crawlId not supported by all Tools
----------------------------------

                 Key: NUTCH-1290
                 URL: https://issues.apache.org/jira/browse/NUTCH-1290
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: nutchgora
            Reporter: Mathijs Homminga
            Priority: Minor
             Fix For: nutchgora


See also: https://issues.apache.org/jira/browse/NUTCH-907

The StorageUtils class exposes a createDataStore method which uses the default 
schema for a persistent class specified in the Gora configuration. 
This method ignores Nutch's storage.schema property and the notion of a crawlId.

Two tools use this method instead of the createWebStore method (which does 
support the storage.schema property and a crawlId):

o.a.n.indexer.IndexerReducer (IndexerJob)
o.a.n.util.domain.DomainStatistics
 
I propose that these two tools start using the createWebStore method and that 
we remove the createDataStore method from StorageUtils.
Both tools should also accept the crawlId command line parameter.
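
To illustrate the difference between the two lookups, here is a minimal, 
self-contained sketch. It does not use the real StorageUtils or Gora APIs: the 
class SchemaNameSketch, its two methods, the "storage.crawl.id" key and the 
"<crawlId>_<schema>" naming convention are assumptions made for illustration 
only. Only the storage.schema property name is taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaNameSketch {
    // Property name from the issue text.
    static final String SCHEMA_KEY = "storage.schema";
    // Assumed key for the crawl id; hypothetical for this sketch.
    static final String CRAWL_ID_KEY = "storage.crawl.id";

    // Models the createDataStore behaviour: always the Gora default schema,
    // regardless of what the Nutch configuration says.
    static String defaultSchema(Map<String, String> conf, String goraDefault) {
        return goraDefault;
    }

    // Models the createWebStore behaviour: honour storage.schema and, if a
    // crawl id is set, prefix the schema name with it (assumed convention).
    static String crawlAwareSchema(Map<String, String> conf, String goraDefault) {
        String schema = conf.getOrDefault(SCHEMA_KEY, goraDefault);
        String crawlId = conf.getOrDefault(CRAWL_ID_KEY, "");
        return crawlId.isEmpty() ? schema : crawlId + "_" + schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCHEMA_KEY, "pages");
        conf.put(CRAWL_ID_KEY, "news");

        // The two lookups disagree, so a tool using the first one would read
        // a different table than the one the other tools wrote to.
        System.out.println(defaultSchema(conf, "webpage"));
        System.out.println(crawlAwareSchema(conf, "webpage"));
    }
}
```

Under these assumptions, a tool that resolves its store the first way would 
not see the data written by tools that resolve it the second way, which is the 
mismatch described for IndexerReducer and DomainStatistics.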


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
