[jira] Created: (NUTCH-907) DataStore API doesn't support multiple storage areas for multiple disjoint crawls

Andrzej Bialecki (JIRA) Wed, 15 Sep 2010 08:00:58 -0700

DataStore API doesn't support multiple storage areas for multiple disjoint 
crawls
---------------------------------------------------------------------------------


                 Key: NUTCH-907
                 URL: https://issues.apache.org/jira/browse/NUTCH-907
             Project: Nutch
          Issue Type: Bug
            Reporter: Andrzej Bialecki 
             Fix For: 2.0


In Nutch 1.x it was possible to easily select a set of crawl data (crawldb, 
page data, linkdb, etc.) by specifying the path where the data was stored. This 
enabled users to run several disjoint crawls with different configs on the 
same storage medium, simply under different paths.

This is no longer possible, because there is a 1:1 mapping between a specific 
DataStore instance and a set of crawl data.

In order to support this functionality the Gora API should be extended so that 
it can create stores (and data tables in the underlying storage) that use 
arbitrary prefixes to identify the particular crawl dataset. Then the Nutch API 
should be extended to allow passing this "crawlId" value to select one of 
possibly many existing crawl datasets.
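The following is a minimal sketch of the proposed idea, assuming a simple naming scheme of `<crawlId>_<baseTable>`. Everything in it is hypothetical and is not part of Gora or Nutch: `CrawlStoreFactory`, `PrefixedStore`, `tableNameFor`, and the base table name `webpage` are all illustrative assumptions. The sketch turns a crawlId into a table-name prefix. Several disjoint crawls can then share one storage backend while each one still sees only its own data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a DataStore: an in-memory table with a name.
final class PrefixedStore {
    private final String tableName;
    private final Map<String, String> rows = new LinkedHashMap<>();

    PrefixedStore(String tableName) { this.tableName = tableName; }

    String getTableName() { return tableName; }
    void put(String key, String value) { rows.put(key, value); }
    String get(String key) { return rows.get(key); }
}

// Hypothetical factory that selects one of many crawl datasets by crawlId.
public final class CrawlStoreFactory {
    // Assumed base table name, chosen for illustration only.
    static final String BASE_TABLE = "webpage";

    // One store per prefixed table name, all on the same "storage medium".
    private final Map<String, PrefixedStore> stores = new LinkedHashMap<>();

    // A null or empty crawlId keeps the unprefixed (single-crawl) table name.
    static String tableNameFor(String crawlId) {
        if (crawlId == null || crawlId.isEmpty()) {
            return BASE_TABLE;
        }
        return crawlId + "_" + BASE_TABLE;
    }

    // Returns the existing store for this crawlId, creating it on first use.
    PrefixedStore getStore(String crawlId) {
        return stores.computeIfAbsent(tableNameFor(crawlId), PrefixedStore::new);
    }

    public static void main(String[] args) {
        CrawlStoreFactory factory = new CrawlStoreFactory();
        PrefixedStore a = factory.getStore("crawlA");
        PrefixedStore b = factory.getStore("crawlB");
        a.put("http://example.com/", "fetched");
        System.out.println(a.getTableName());              // crawlA_webpage
        System.out.println(b.getTableName());              // crawlB_webpage
        System.out.println(b.get("http://example.com/"));  // null: crawls are disjoint
    }
}
```

Keeping the unprefixed name for an empty crawlId is one possible way to preserve today's single-dataset behaviour. Other schemes, such as a separate namespace per crawl, would fit the same API shape.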

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
