[ https://issues.apache.org/jira/browse/NUTCH-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki reassigned NUTCH-907:
---------------------------------------

    Assignee: Andrzej Bialecki

> DataStore API doesn't support multiple storage areas for multiple disjoint crawls
> ---------------------------------------------------------------------------------
>
>                 Key: NUTCH-907
>                 URL: https://issues.apache.org/jira/browse/NUTCH-907
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 2.0
>
>         Attachments: NUTCH-907.patch, NUTCH-907.v2.patch
>
>
> In Nutch 1.x it was possible to easily select a set of crawl data (crawldb,
> page data, linkdb, etc) by specifying a path where the data was stored. This
> enabled users to run several disjoint crawls with different configs, but
> still using the same storage medium, just under different paths.
> This is not possible now because there is a 1:1 mapping between a specific
> DataStore instance and a set of crawl data.
> In order to support this functionality the Gora API should be extended so
> that it can create stores (and data tables in the underlying storage) that
> use arbitrary prefixes to identify the particular crawl dataset. Then the
> Nutch API should be extended to allow passing this "crawlId" value to select
> one of possibly many existing crawl datasets.
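
The prefixing idea in the description can be illustrated with a small Java sketch. It is not taken from the attached patches and is not the Gora or Nutch API: the class `CrawlTableNames`, the method `resolve`, the `_` separator, and the base table name `webpage` are all hypothetical, chosen only to show how a "crawlId" value might map one storage backend onto several disjoint table names.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of Gora or Nutch. */
public final class CrawlTableNames {
    private CrawlTableNames() {}

    /**
     * Builds the table name for one crawl dataset.
     * A null or empty crawlId selects the default, unprefixed table.
     * The "_" separator is an assumption made for this sketch.
     */
    public static String resolve(String baseName, String crawlId) {
        Objects.requireNonNull(baseName, "baseName");
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    public static void main(String[] args) {
        // Two disjoint crawls sharing the same storage backend.
        System.out.println(resolve("webpage", "news"));  // news_webpage
        System.out.println(resolve("webpage", "blogs")); // blogs_webpage
        // No crawlId: fall back to the single default table.
        System.out.println(resolve("webpage", ""));      // webpage
    }
}
```

Under these assumptions, each crawl would open its store against its own resolved table name, which plays the role that separate storage paths played in Nutch 1.x.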