[
https://issues.apache.org/jira/browse/NUTCH-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrzej Bialecki updated NUTCH-844:
------------------------------------
Attachment: NUTCH-844.patch
Updated patch. This also addresses an issue in PluginRepository, which uses
Configuration as a key in its internal cache. The problem is that
Configuration doesn't override hashCode(), so two distinct Configuration
objects never match as keys and the cache is ineffective in situations like this:
{code}
Configuration conf = NutchConfiguration.create();
PluginRepository repo1 = PluginRepository.get(conf);
JobConf job = new NutchJob(conf);
PluginRepository repo2 = PluginRepository.get(job);
// repo2 is a new instance, but should be the same instance!
{code}
The new code sets a UUID property on the Configuration, so the cache can tell
that a copy is still the same logical configuration. There's a new unit test
that checks this works properly when using NutchConfiguration.create(), and
shows that the lookup fails without the UUID.
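To make the idea concrete, here is a minimal, self-contained sketch of the two caching strategies. It is not the code from the patch: Conf and Repo are hypothetical stand-ins for Configuration and PluginRepository, and the property name "sketch.conf.uuid" is an assumption made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class UuidCacheSketch {
    // Hypothetical property name; the key used by the actual patch may differ.
    static final String UUID_KEY = "sketch.conf.uuid";

    /** Stand-in for Configuration: no equals()/hashCode() override, so identity semantics. */
    static class Conf {
        final Properties props = new Properties();

        /** Creates a fresh configuration tagged with a new UUID. */
        static Conf create() {
            Conf c = new Conf();
            c.props.setProperty(UUID_KEY, UUID.randomUUID().toString());
            return c;
        }

        /** Copies all properties, including the UUID, into a new object. */
        static Conf copyOf(Conf other) {
            Conf c = new Conf();
            c.props.putAll(other.props);
            return c;
        }
    }

    /** Stand-in for the cached repository object. */
    static class Repo {}

    static final Map<Conf, Repo> byIdentity = new HashMap<>();
    static final Map<String, Repo> byUuid = new HashMap<>();

    /** Cache keyed by the configuration object itself: a copy always misses. */
    static Repo getByIdentity(Conf conf) {
        return byIdentity.computeIfAbsent(conf, k -> new Repo());
    }

    /** Cache keyed by the UUID property: a copy carrying the same UUID hits. */
    static Repo getByUuid(Conf conf) {
        String uuid = conf.props.getProperty(UUID_KEY);
        if (uuid == null) {
            // No UUID set: fall back to the identity-keyed cache.
            return getByIdentity(conf);
        }
        return byUuid.computeIfAbsent(uuid, k -> new Repo());
    }

    public static void main(String[] args) {
        Conf conf = Conf.create();
        Conf job = Conf.copyOf(conf); // plays the role of new NutchJob(conf)
        System.out.println(getByIdentity(conf) == getByIdentity(job)); // false
        System.out.println(getByUuid(conf) == getByUuid(job));         // true
    }
}
```

The sketch relies on the copy carrying over every property of the original, which is what makes the UUID survive the wrapping step.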
> Improve NutchConfiguration
> --------------------------
>
> Key: NUTCH-844
> URL: https://issues.apache.org/jira/browse/NUTCH-844
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.0
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: 2.0
>
> Attachments: conf.patch, NUTCH-844.patch
>
>
> This patch removes the servlet dependency from NutchConfiguration, and modifies
> the API to allow bootstrapping from Properties. This is important for
> use cases where Nutch is embedded in a larger application.
> Also, while I'm at it, remove the support for an alternative "crawl"
> configuration when running the Crawl tool, which has always been a source of
> confusion.