Hello - I asked this on nutch-user but didn't get a response. I am using nutch-0.8. I would like to fetch a few segments each night, then update one large index. Is it safe to run index on a group of segments, then run index again on a different group of segments, then merge? I haven't found where this procedure is documented. I would like to do something like this:
Assume I have four segments - I'll label them s0 s1 s2 s3 instead of their timestamp names. The first night I would index s0, s1 and rename the index to "A":

    nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/s0 crawl/segments/s1
    mv crawl/indexes/part-00000 crawl/indexes/A

Then on the second night I would index s2, s3 and rename the index to "B":

    nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/s2 crawl/segments/s3
    mv crawl/indexes/part-00000 crawl/indexes/B

Finally I would merge the two:

    nutch merge crawl/index crawl/indexes

Is this safe to do? Is this how you're supposed to crawl nightly? Any docs I'm missing on this? Again, this is all for nutch-0.8, so some of the docs from 0.7 no longer apply.

Thank you

-- Derek Young
