DataStore API doesn't support multiple storage areas for multiple disjoint
crawls
---------------------------------------------------------------------------------
Key: NUTCH-907
URL: https://issues.apache.org/jira/browse/NUTCH-907
Project: Nutch
Issue Type: Bug
Reporter: Andrzej Bialecki
Fix For: 2.0
In Nutch 1.x it was easy to select a set of crawl data (crawldb, page data,
linkdb, etc.) by specifying the path where the data was stored. This enabled
users to run several disjoint crawls with different configs while still using
the same storage medium, just under different paths.

This is no longer possible because there is a 1:1 mapping between a specific
DataStore instance and a set of crawl data.

To support this functionality, the Gora API should be extended so that it can
create stores (and data tables in the underlying storage) that use arbitrary
prefixes to identify a particular crawl dataset. The Nutch API should then be
extended to allow passing this "crawlId" value to select one of possibly many
existing crawl datasets.
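
As a rough illustration of the prefixing idea, the sketch below maps a
"crawlId" plus a base table name to a distinct physical table, so that several
crawls can share one storage backend without seeing each other's rows. The
class CrawlStoreRegistry, its methods, the "_" separator, and the fallback to
an unprefixed name are all assumptions made for this example; none of them are
part of the Gora or Nutch APIs, and an in-memory map stands in for the real
storage.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Gora or Nutch APIs.
class CrawlStoreRegistry {
    // One in-memory "table" per prefixed name, standing in for real storage.
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    // Derive the physical table name from a crawl ID and a base name.
    // A null or empty crawlId falls back to the unprefixed name (assumed default).
    static String tableName(String crawlId, String baseName) {
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    // Return the table for the given crawl, creating it on first use.
    Map<String, String> open(String crawlId, String baseName) {
        return tables.computeIfAbsent(tableName(crawlId, baseName), k -> new HashMap<>());
    }

    public static void main(String[] args) {
        CrawlStoreRegistry registry = new CrawlStoreRegistry();
        registry.open("news", "webpage").put("http://example.com/", "fetched");
        registry.open("blogs", "webpage").put("http://example.org/", "unfetched");
        // The two crawls share one registry but keep separate rows.
        System.out.println(registry.open("news", "webpage").keySet());
        System.out.println(registry.open("blogs", "webpage").keySet());
    }
}
```

With this shape, selecting a crawl dataset reduces to passing a different
crawlId when opening the store, much as a different path selected a dataset in
Nutch 1.x.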