Jérôme Charron wrote:
Andrzej,
How do you choose which NutchConf to use?
It is provided as an argument to all constructors.
Here is a short discussion I had with Doug about a kind of dynamic NutchConf
inside the same JVM:
"... By looking at the mailing lists archives it seems that having some
behavior depend on the document's URL is a recurring problem (for instance,
boosting documents that match a URL pattern - the NUTCH-16 issue, and many
other topics).
So, our idea is to provide a "dynamic" Nutch configuration
(one that overrides the default, as nutch-site does) based on documents
matching URL patterns. The idea is as follows:
Well, it's a neat idea, but it's not necessarily what I was proposing.
My proposal could be the first step to implement this.
1. The default configuration is, as usual, the nutch-default.xml file
2. An XML file can map URL regexps to other configuration
files (that override nutch-default):
<nutch:conf>
<url regexp="http://www.mydomain1.com/*">
<!-- A set of nutch properties that override the nutch-default for this
domain -->
<property>
<name>property1</name>
<value>value1</value>
</property>
....
</url>
....
</nutch:conf>"
What do you think about this?
Yes, if you can specify different configs for every run, or even for
every invocation, it's certainly possible.
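
To make the per-URL override idea concrete, here is a minimal Java sketch of how the lookup could work. It is not Nutch code: UrlScopedConf, addOverride() and resolve() are hypothetical names, plain java.util.Properties stands in for NutchConf, and the sketch assumes the regexp is matched against the start of the URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): layers URL-specific overrides on top of defaults.
class UrlScopedConf {
    private final Properties defaults;
    // Insertion-ordered, so rules are applied in the order they were declared.
    private final Map<Pattern, Properties> overrides = new LinkedHashMap<>();

    UrlScopedConf(Properties defaults) {
        this.defaults = defaults;
    }

    // Registers a set of properties that applies to URLs matching the given regexp.
    void addOverride(String urlRegexp, Properties props) {
        overrides.put(Pattern.compile(urlRegexp), props);
    }

    // Copies the defaults, then applies every override whose pattern matches
    // the beginning of the URL. Later rules win over earlier ones.
    Properties resolve(String url) {
        Properties effective = new Properties();
        effective.putAll(defaults);
        for (Map.Entry<Pattern, Properties> rule : overrides.entrySet()) {
            if (rule.getKey().matcher(url).lookingAt()) {
                effective.putAll(rule.getValue());
            }
        }
        return effective;
    }
}
```

With a rule registered for "http://www\\.mydomain1\\.com/.*", resolve() returns the overridden property1 for pages on that host and the nutch-default value for every other URL. Note that the dots are escaped here; whether the mapping file would take real regexps or simpler glob patterns is left open in the proposal.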
Looking deeper, this is messier than I thought... Some changes would
be required to the plugin instantiation mechanisms, e.g.:
Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
PluginRepository.getExtensionPoint(String) ->
getExtensionPoint(String, NutchConf)
etc, etc...
The way this would work would be similar to the mechanism described
above: if plugin instances are not created yet, they would be created
once (based on the current NutchConf argument), and then cached in this
NutchConf instance.
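
A rough Java sketch of that create-once-then-cache behavior follows. SketchConf and SketchExtension are stand-ins for NutchConf and Extension, and the getOrCreate() cache method is an assumption made for illustration; nothing here reflects the actual Nutch classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for NutchConf. The per-instance object cache is an assumption of this sketch.
class SketchConf {
    private final Map<String, Object> objectCache = new HashMap<>();

    // Returns the cached object for the key, creating it with the factory on first use.
    synchronized Object getOrCreate(String key, Function<SketchConf, Object> factory) {
        Object cached = objectCache.get(key);
        if (cached == null) {
            cached = factory.apply(this);
            objectCache.put(key, cached);
        }
        return cached;
    }
}

// Stand-in for a plugin Extension descriptor.
class SketchExtension {
    private final String id;
    private final Function<SketchConf, Object> factory;

    SketchExtension(String id, Function<SketchConf, Object> factory) {
        this.id = id;
        this.factory = factory;
    }

    // The instance is created at most once per configuration and then
    // reused for every later call made with that same configuration.
    Object getExtensionInstance(SketchConf conf) {
        return conf.getOrCreate(id, factory);
    }
}
```

Two calls with the same SketchConf return the same plugin object, while a second SketchConf gets its own instance, so differently configured runs in one JVM do not share plugin state.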
And also the plugin implementations would have to extend
NutchConfigured, taking NutchConf as the argument to their constructors
- because now Extension.getExtensionInstance would pass the current
NutchConf instance to their constructors.
That's exactly what I had in mind while speaking about a dynamic NutchConf
with Doug.
For me it's a +1
The only thing I don't really like is having to extend NutchConfigured, but it
is the safest way to implement it.
Well, it's a form of enforcing a contract for the constructors. There is
no other way to do it in Java - you can't specify the required
constructors in an interface. OTOH you have the NutchConfigurable
interface, which we could use instead, but then you have to remember to
call setConf() before you do anything else...
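
The two options can be compared side by side in a small Java sketch. All names are hypothetical stand-ins (DemoConf for NutchConf, DemoConfigured for NutchConfigured, DemoConfigurable for NutchConfigurable), and DemoPluginFactory is only an illustration of what the instantiating code might do.

```java
import java.lang.reflect.Constructor;

// Stand-in for NutchConf.
class DemoConf { }

// Option A: a base class with no no-arg constructor, so every subclass
// constructor has to hand a configuration to super(...).
abstract class DemoConfigured {
    private final DemoConf conf;
    protected DemoConfigured(DemoConf conf) { this.conf = conf; }
    public DemoConf getConf() { return conf; }
}

// Option B: an interface. It cannot require a constructor, only a setter.
interface DemoConfigurable {
    void setConf(DemoConf conf);
    DemoConf getConf();
}

class ConstructorPlugin extends DemoConfigured {
    public ConstructorPlugin(DemoConf conf) { super(conf); }
}

class SetterPlugin implements DemoConfigurable {
    private DemoConf conf; // stays null until setConf() is called
    public SetterPlugin() { }
    public void setConf(DemoConf conf) { this.conf = conf; }
    public DemoConf getConf() { return conf; }
}

class DemoPluginFactory {
    // Creates a plugin and makes sure it receives the configuration either way.
    static Object create(Class<?> type, DemoConf conf) {
        try {
            if (DemoConfigurable.class.isAssignableFrom(type)) {
                Object instance = type.getDeclaredConstructor().newInstance();
                ((DemoConfigurable) instance).setConf(conf); // must happen before any use
                return instance;
            }
            // The (DemoConf) constructor is looked up reflectively, so a plugin
            // that lacks it is only detected here, at run time.
            Constructor<?> ctor = type.getDeclaredConstructor(DemoConf.class);
            return ctor.newInstance(conf);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }
}
```

If the factory is the only place where plugins are instantiated, the "remember to call setConf()" burden moves from plugin users into that one method, which may make the interface route less fragile than it first sounds.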
I'll work on this to see where it leads.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com