Re: Merging CrawlDBs

2023-02-02 Thread Kamil Mroczek
Thanks for the info, Sebastian. Re: "Why do you want to merge the data structures?" To help inform my crawl strategy, I am trying to see what is possible, and it feels like having the ability to run concurrent crawls might get around any limitations in the software. I am currently seeding a set of

Re: Merging CrawlDBs

2023-02-02 Thread Sebastian Nagel
Hi Kamil,
> I was wondering if this script is advisable to use?
I haven't tried the script itself, but I have tried some of the underlying commands - mergedb, etc.
> merge command ($nutch_dir/nutch merge $index_dir $new_indexes)
Of course, some of the commands are obsolete. A long time ago, Nutch used Lucene

Merging CrawlDBs

2023-02-01 Thread Kamil Mroczek
Hi, I am testing how merging crawls works and found this script: https://cwiki.apache.org/confluence/display/NUTCH/MergeCrawl. I was wondering if this script is advisable to use? I plan to use it for crawls of non-overlapping URLs. I am wary of using it since it is located under "Archive &

Re: Merging crawldbs and linkdbs during incremental crawl

2012-06-12 Thread Ali Safdar Kureishy
Hi, Just checking if anyone could comment on my post below. :) Thanks in advance. Safdar On Mon, Jun 11, 2012 at 8:10 AM, Ali Safdar Kureishy safdar.kurei...@gmail.com wrote: Hi, I'm trying to build an incremental crawler, using the various Nutch crawl tools (generate + fetch/parse +

Merging crawldbs and linkdbs during incremental crawl

2012-06-10 Thread Ali Safdar Kureishy
Hi, I'm trying to build an incremental crawler, using the various Nutch crawl tools (generate + fetch/parse + updatedb etc.). By incremental I mean I want crawled pages to show up quickly in the index (instead of waiting till the end of the crawl). So, I'd like to index as soon as I have fetched