This is what I usually do:

$NUTCH_HOME/bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*
rm -rf crawl/segments/*
mv crawl/MERGEDsegments/* crawl/segments

You might want to replace the second statement (the 'rm') with a 'mv'
statement to back up the old segments instead of deleting them.
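
A minimal sketch of that backup variant, wrapped in a shell function. The function name, the NUTCH_CMD override variable, and the timestamped segments.bak.* directory name are assumptions made for illustration; only the mergesegs call and the crawl/ layout come from the commands above.

```shell
#!/bin/sh
# Sketch only: merge all segments and keep the old ones as a backup.
# NUTCH_CMD and the "segments.bak.<timestamp>" name are hypothetical;
# adjust them to your own setup.

merge_segments_with_backup() {
    crawl_dir=$1
    nutch_cmd=${NUTCH_CMD:-$NUTCH_HOME/bin/nutch}
    backup_dir="$crawl_dir/segments.bak.$(date +%Y%m%d%H%M%S)"

    # Merge every existing segment into a fresh output directory.
    "$nutch_cmd" mergesegs "$crawl_dir/MERGEDsegments" "$crawl_dir"/segments/* || return 1

    # Move the old segments aside instead of deleting them.
    mkdir -p "$backup_dir" || return 1
    mv "$crawl_dir"/segments/* "$backup_dir"/ || return 1

    # Put the merged segment where the old ones used to be.
    mv "$crawl_dir"/MERGEDsegments/* "$crawl_dir"/segments/ || return 1
    rmdir "$crawl_dir/MERGEDsegments"
}
```

It would be called as 'merge_segments_with_backup crawl' from the directory that contains crawl/. Each step returns early on failure, so the old segments are moved only after the merge has succeeded.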

Regards,
Susam Pal
http://susam.in/

On 6/21/07, Phạm Hải Thanh <[EMAIL PROTECTED]> wrote:
Hi all,

After recrawling several times, I have a problem with the merge-output
directory. I dug through the mail archive and found a clue: use a new directory
name for the new merge, e.g., merge-output_new, and then mv merge-output_new to
merge-output.



Can anyone show me exactly how to do this?

Thanks a lot



============================================================================

After refetching the database, I get the following error during index merging.



2007-04-27 15:58:37,787 FATAL indexer.IndexMerger - IndexMerger:
java.io.IOException: Target /usr/local/nutch/nutchdb/index/merge-output already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70)
        at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:49)
        at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:750)
        at org.apache.hadoop.fs.ChecksumFileSystem.completeLocalOutput(ChecksumFileSystem.java:622)
        at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:104)
        at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)

