2009/07/15
Hi,
I'm trying to merge about 1.2 million pages, contained in 10 segments, on one
machine (using nutch-1.0 mergesegs):
bin/nutch mergesegs crawl/merge_seg -dir crawl/segments
but there is not enough space on the 500 GB disk to complete this merge task
(getting java.io.IOException: No space left -- Tomislav Poljak
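Since 500 GB was not enough for this merge, it can be worth checking free space before starting one. Below is a minimal pre-flight sketch, assuming a POSIX shell with `du` and `df`. The function name `check_merge_space` and its default safety factor of 2 are hypothetical, not part of Nutch, and the factor is a guess rather than a measured requirement of mergesegs.

```shell
# Hypothetical helper, not part of Nutch: succeeds only if the filesystem
# holding the segments has at least FACTOR times their current size free.
check_merge_space() {
  seg_dir="${1:-crawl/segments}"
  factor="${2:-2}"   # assumed safety multiple (a guess); tune for your data

  if [ ! -d "$seg_dir" ]; then
    echo "no such directory: $seg_dir" >&2
    return 2
  fi

  # Current size of the input segments, in KB.
  used_kb=$(du -sk "$seg_dir" | awk '{print $1}')
  need_kb=$((used_kb * factor))

  # Available KB on the filesystem containing seg_dir
  # (-P gives one unwrapped line per filesystem; field 4 is "Available").
  free_kb=$(df -Pk "$seg_dir" | awk 'NR==2 {print $4}')

  if [ "$free_kb" -lt "$need_kb" ]; then
    echo "not enough space: need ~${need_kb} KB, have ${free_kb} KB" >&2
    return 1
  fi
  echo "ok: need ~${need_kb} KB, have ${free_kb} KB"
}
```

Usage: `check_merge_space crawl/segments 2 && bin/nutch mergesegs crawl/merge_seg -dir crawl/segments`. The check only covers the filesystem that holds the segments; if Hadoop's local temporary directory (`hadoop.tmp.dir`) is on another disk, that disk needs room as well.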
2006/09/08
I fetched two sites and indexed them separately. How can I merge them?
Thank you!
-- heack
2008/12/17
It seems merge-output is generated by:
$nutch_dir/nutch merge $index_dir $new_indexes
ianwong wrote:
Hello mate,
I tried recrawl.sh to do a recrawl job. It didn't change the indexes in
the index folder, but I saw some new indexes were generated in the
-- ianwong
2005/09/08
When I merge the index, where do I put it? Does it still need to be in the
segments folder? I've merged it and tried to start Tomcat from that directory
without luck: searching returns a blank page.
Thanks,
-J
-- Jay Pound
2005/09/28
Well,
I was not able to find any info about that...
Is it necessary at all to merge segments located in NDFS, and if it
is, how?
Thanks, Gal
-- Gal Nitzan
2008/12/17
Hello mate,
I tried recrawl.sh to do a recrawl job. It didn't change the indexes in the
index folder, but I saw some new indexes were generated in the merge-output
folder under the index folder.
I want to know what the indexes in the merge-output folder are. Are they
incremental or complete -- ianwong
2009/08/26
Hello,
I ran the merge script to merge two crawl dirs, one 1.6 GB and the other 120 MB. But
my Mac Pro with 50 GB of free space did not start after the merge crashed with a
no-space error. I have been told that OS X got corrupted.
I looked inside my nutch-1.0/conf/hadoop-site.xml file and it is empty. Can -- alxsss
2007/03/06
Hi all!
In the mergecrawl script provided in the Nutch wiki, I found that
nutch 0.8.x has an additional mergelinkdb option, but 0.7.x has only merge and
mergesegs.
Is there any way to merge the links of two dbs in nutch 0.7.2? Or is it necessary?
Thanks.
-- Nuther
2008/12/17
When I try to merge indexes, I get the following:
IndexMerger: java.io.IOException: Target c22/index/merge-output already
exists
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70 -- ianwong
2009/12/04
I have been going through the mailing list and still haven't found an answer.
For a project, I need to crawl the web, index it and merge that content with
another site's content, which is stored in a key-value storage system.
What is the best approach to merge these two sources of content into a Lucene index -- Mr Hadoop