This is something I usually do:

$NUTCH_HOME/bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*
rm -rf crawl/segments/*
mv crawl/MERGEDsegments/* crawl/segments
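The three commands above can be wrapped in a small shell function. This is a sketch, not part of the original reply. It uses the 'mv' backup variant of the rm -rf step instead of deleting the old segments. The NUTCH override variable, the function name merge_segments and the segments_backup_<timestamp> directory name are all hypothetical choices made for illustration.

```shell
#!/bin/sh
# Merge all segments under <crawl_dir>/segments into one, moving the old
# segments aside with mv instead of deleting them with rm -rf.
# Assumption (hypothetical): NUTCH overrides the launcher; default $NUTCH_HOME/bin/nutch.
merge_segments() {
  nutch="${NUTCH:-$NUTCH_HOME/bin/nutch}"
  crawl_dir="${1:-crawl}"
  merged="$crawl_dir/MERGEDsegments"
  # Hypothetical backup name; any directory outside segments/ will do.
  backup="$crawl_dir/segments_backup_$(date +%Y%m%d%H%M%S)"

  # 1. Merge every existing segment into a separate output directory.
  "$nutch" mergesegs "$merged" "$crawl_dir"/segments/* || return 1

  # 2. Move the old segments aside (the 'mv' variant of the rm -rf step).
  mv "$crawl_dir/segments" "$backup" || return 1

  # 3. Put the merged segment back under segments/ and drop the empty dir.
  mkdir -p "$crawl_dir/segments"
  mv "$merged"/* "$crawl_dir/segments/" || return 1
  rmdir "$merged"
}
```

After sourcing the file, something like `NUTCH_HOME=/usr/local/nutch merge_segments crawl` would run it against the crawl/ layout used above. The backups accumulate, so old ones have to be removed by hand once the merged segment has been checked.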
You might want to replace the second command (rm -rf) with an 'mv' command, so that the old segments are backed up instead of deleted.

Regards,
Susam Pal
http://susam.in/

On 6/21/07, Phạm Hải Thanh <[EMAIL PROTECTED]> wrote:
Hi all,

After recrawling several times, I have a problem with the directory merge-output. I have dug into the mail archive and found a clue: you should use a new directory name for the new merge, e.g., merge-output_new, then mv merge-output_new to merge-output. Can anyone show me exactly how to do this?

Thanks a lot

============================================================================
After refetching the database, I get the following error during index merging:

2007-04-27 15:58:37,787 FATAL indexer.IndexMerger - IndexMerger: java.io.IOException: Target /usr/local/nutch/nutchdb/index/merge-output already exists
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:230)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:70)
    at org.apache.hadoop.fs.LocalFileSystem.copyFromLocalFile(LocalFileSystem.java:49)
    at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:750)
    at org.apache.hadoop.fs.ChecksumFileSystem.completeLocalOutput(ChecksumFileSystem.java:622)
    at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:104)
    at org.apache.nutch.indexer.IndexMerger.run(IndexMerger.java:150)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
    at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:113)
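The workaround from the archive (merge into a new directory such as merge-output_new, then mv it over merge-output) might look like the sketch below. It is an illustration under assumptions, not a tested recipe: it assumes `nutch merge <outputIndex> <indexesDir>` is the IndexMerger invocation in your Nutch version (check the command's usage output first). The function name merge_indexes, the NUTCH override and the <target>_old backup directory are hypothetical.

```shell
#!/bin/sh
# Sketch of the quoted workaround: merge into a fresh directory (<target>_new),
# then move it over the old target, keeping the previous copy as <target>_old.
# Assumptions: `nutch merge <outputIndex> <indexesDir>` runs IndexMerger in your
# Nutch version, and NUTCH (hypothetical) defaults to $NUTCH_HOME/bin/nutch.
merge_indexes() {
  nutch="${NUTCH:-$NUTCH_HOME/bin/nutch}"
  target="$1"         # live merged index, e.g. .../index/merge-output
  indexes_dir="$2"    # directory holding the per-segment indexes
  fresh="${target}_new"

  # Always merge into an output directory that does not exist yet.
  rm -rf "$fresh"
  "$nutch" merge "$fresh" "$indexes_dir" || return 1

  # Swap: keep the previous index as a backup, then move the new one in.
  rm -rf "${target}_old"
  if [ -e "$target" ]; then
    mv "$target" "${target}_old" || return 1
  fi
  mv "$fresh" "$target"
}
```

For the paths in the error above, a call could be `merge_indexes /usr/local/nutch/nutchdb/index/merge-output <your indexes dir>`. The indexes directory depends on how your recrawl script lays out the crawl, so it is left as a placeholder here.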