nutch-user  

1 - 10 of 871 matches



mergesegs disk space

2009/07/15 Hi, I'm trying to merge about 1.2 million pages, contained in 10 segments, on one machine (using nutch-1.0 mergesegs) with: bin/nutch mergesegs crawl/merge_seg -dir crawl/segments, but there is not enough space on a 500G disk to complete the merge (getting java.io.IOException: No space left -- Tomislav Poljak
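Running out of disk partway through a merge is expensive, so a rough pre-flight check may help. The following is a minimal Python sketch, not part of Nutch: it sums the size of the segments directory and compares that size, multiplied by a hypothetical safety factor, with the free space on the output disk. The factor of 3 and the function names are assumptions for illustration only.

```python
import os
import shutil


def dir_size_bytes(path):
    """Total size in bytes of all regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def enough_space_for_merge(segments_dir, output_parent, factor=3.0):
    """Rough check before a single-machine segment merge.

    ASSUMPTION: `factor` is a hypothetical safety multiplier meant to cover the
    merged copy plus temporary data; it is not a figure published by Nutch.
    Returns (needed_bytes, free_bytes, ok).
    """
    needed = int(dir_size_bytes(segments_dir) * factor)
    free = shutil.disk_usage(output_parent).free
    return needed, free, needed <= free


if __name__ == "__main__":
    # "crawl/segments" matches the -dir argument in the command above.
    needed, free, ok = enough_space_for_merge("crawl/segments", ".")
    print(f"estimated need: {needed / 1e9:.1f} GB, free: {free / 1e9:.1f} GB")
    print("OK to merge" if ok else "probably not enough space")
```

If the check fails, one option worth trying is pointing Hadoop's hadoop.tmp.dir property at a larger disk, since local job data is written under it by default.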

How could I merge two indexes together?

2006/09/08 I fetched two sites and indexed them separately. How could I merge them? Thank you! -- heack

Re: what is indexes in merge-output folder?

2008/12/17 It seems merge-output is generated by: $nutch_dir/nutch merge $index_dir $new_indexes. ianwong wrote: Hello mate, I tried recrawl.sh to do a recrawl job. It did not change the indexes in the index folder, but I saw that some new indexes were generated in the -- ianwong

nutch merge

2005/09/08 When I merge the index, where do I put it? Does it still need to be in the segments folder? I've merged it and tried to start Tomcat from that directory without luck; it returns a blank page after searching. Thanks, -J -- Jay Pound

Is it at all necessary to merge segments in MapRed?

2005/09/28 Well, I was not able to find any info about that... Is it at all necessary to merge segments located in NDFS, and if so, how? Thanks, Gal -- Gal Nitzan

what is indexes in merge-output folder?

2008/12/17 Hello mate, I tried recrawl.sh to do a recrawl job. It did not change the indexes in the index folder, but I saw that some new indexes were generated in the merge-output folder under the index folder. What are the indexes in the merge-output folder? Are they incremental or complete -- ianwong

content of hadoop-site.xml

2009/08/26 Hello, I have run the merge script to merge two crawl dirs, one 1.6G and the other 120MB. But after the merge crashed with a "no space" error, my Mac Pro, which had 50G of free space, would not start. I have been told that OS X got corrupted. I looked inside my nutch-1.0/conf/hadoop-site.xml file and it is empty. Can -- alxsss

Merge Crawls nutch - 0.7.2

2007/03/06 Hi all! In the mergecrawl script provided in the Nutch wiki, I found that Nutch 0.8.x has an additional mergelinkdb option, but 0.7.x has only merge and mergesegs. Is there any way to merge the links of two dbs in Nutch 0.7.2? Or is it necessary? Thanks. -- Nuther

Re: what is indexes in merge-output folder?

2008/12/17 When I try to merge indexes, I get the following: IndexMerger: java.io.IOException: Target c22/index/merge-output already exists at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70 -- ianwong

What is the best choice: nutch/lucene or nutch/solr?

2009/12/04 I have been going through the mailing list and still haven't found an answer. For a project, I need to crawl the web, index it, and merge that content with another site's content, which is stored in a key-value storage system. What is the best approach to merge these two sources of content into a Lucene index -- Mr Hadoop
