Andrzej,
well I'm not ready with digging into the problem but want to ask some
more questions.
BTW I counted 195 places that use NutchConf.get(), so this will be a
bigger patch. :)
As I mentioned I would love to go the inversion of control way, so
not using nutchConf in the constructor but make classes implementing
the Configurable interface. This for example would be sensefully for
all classed realizing a extension.
But there are also classes where this makes no sense. For example I
would suggest to change the PluginRegestry from a singleton to a
'normal' object, in this case I guess it make sense to use the
nutchConf in the constructor, since the configuration here only need
to know the
include and exclude regex for the plugins.
So:
Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
This makes sense, here we can check if the class implements the
configurable interface and if so instantiate the object and set the
configuration.
ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
We don't need NutchConf here since if I understand it correct this is
only needed to identify the activated plugins and this is done until
regestry instantiation that in this case take a NutchConf as parameter.
PluginRepository.getExtensionPoint(String) -> getExtensionPoint
(String, NutchConf)
We don't need it here as well, since we use NutchConf until regestry
instantiation.
The other case would be that we have to build up the plugin
dependency graph for each method call.
Would you agree to have a several plugin regestries with may be
different NutchConf's but instantiate extensions with nutchConf but
not query ExtentsionPoints etc?
etc, etc...
The way this would work would be similar to the mechanism described
above: if plugin instances are not created yet, they would be
created once (based on the current NutchConf argument), and then
cached in this NutchConf instance.
I guess this is difficult.
First we have the plugin class instances, most or may all plugins I
know do not have a plugin class implementation, second we have the
extensions classes that at least do not need to implement a specific
interface from the plugin regestry point of view (only such things
like index filter interface etc.)
Caching plugin class instances makes sense since actually there is
only one plugin class instance per plugin in the jvm. However there
will be many instances for each extension class, since e.g. the
parser or protocoll runs multithreaded.
And also the plugin implementations would have to extend
NutchConfigured, taking NutchConf as the argument to their
constructors - because now the Extension.getExtensionInstance would
pass the current NutchConf instance to their contructors.
In general my point of view is that:
In case we touch this issue anyway I would love to do a radical
solution, since i have a other understanding of handling parameters
than collect them in a kind of map and make the map general accessible.
Instead of giving any object access to the configuration object and
handle properties like a bazar I would prefer handle configuration
only in the first object in the stack, that would be in our case for
example the indexing tool.
Than the indexing tool instantiate the plugin registry only with the
required properties that would be part of the constructor, e.g.
pluginFolders, include, exlude reg ex and autoactivation flag.
Later the extension instances can be also get some more values
injected, but has in general no access to the configuration object.
This would first of all make things better testable but also allows
much much more flexibility to run several different fetchers in one
jvm.
Anyway this would be may a imporvement suggestion from me for nutch
2.0 or 3.0 for now we would be some steps forward just changing
NutchConf access to non static style.
I hopefully found some time until next days to do some experiments
and will come back with some more details.
However we should found a general agreement about the way we go,
since changing code in 195 places and lines that depends on it for
just nothing is not that funny.
Stefan