Andrzej,
well I'm not ready with digging into the problem but want to ask some more questions. BTW I counted 195 places that use NutchConf.get(), so this will be a bigger patch. :)

As I mentioned I would love to go the inversion of control way, so not using nutchConf in the constructor but make classes implementing the Configurable interface. This for example would be sensefully for all classed realizing a extension. But there are also classes where this makes no sense. For example I would suggest to change the PluginRegestry from a singleton to a 'normal' object, in this case I guess it make sense to use the nutchConf in the constructor, since the configuration here only need to know the
include and exclude regex for the plugins.
So:
   Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
This makes sense, here we can check if the class implements the configurable interface and if so instantiate the object and set the configuration.

   ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
We don't need NutchConf here since if I understand it correct this is only needed to identify the activated plugins and this is done until regestry instantiation that in this case take a NutchConf as parameter.

PluginRepository.getExtensionPoint(String) -> getExtensionPoint (String, NutchConf)
We don't need it here as well, since we use NutchConf until regestry instantiation. The other case would be that we have to build up the plugin dependency graph for each method call. Would you agree to have a several plugin regestries with may be different NutchConf's but instantiate extensions with nutchConf but not query ExtentsionPoints etc?

etc, etc...

The way this would work would be similar to the mechanism described above: if plugin instances are not created yet, they would be created once (based on the current NutchConf argument), and then cached in this NutchConf instance.
I guess this is difficult.
First we have the plugin class instances, most or may all plugins I know do not have a plugin class implementation, second we have the extensions classes that at least do not need to implement a specific interface from the plugin regestry point of view (only such things like index filter interface etc.) Caching plugin class instances makes sense since actually there is only one plugin class instance per plugin in the jvm. However there will be many instances for each extension class, since e.g. the parser or protocoll runs multithreaded.


And also the plugin implementations would have to extend NutchConfigured, taking NutchConf as the argument to their constructors - because now the Extension.getExtensionInstance would pass the current NutchConf instance to their contructors.

In general my point of view is that:
In case we touch this issue anyway I would love to do a radical solution, since i have a other understanding of handling parameters than collect them in a kind of map and make the map general accessible. Instead of giving any object access to the configuration object and handle properties like a bazar I would prefer handle configuration only in the first object in the stack, that would be in our case for example the indexing tool. Than the indexing tool instantiate the plugin registry only with the required properties that would be part of the constructor, e.g. pluginFolders, include, exlude reg ex and autoactivation flag. Later the extension instances can be also get some more values injected, but has in general no access to the configuration object. This would first of all make things better testable but also allows much much more flexibility to run several different fetchers in one jvm. Anyway this would be may a imporvement suggestion from me for nutch 2.0 or 3.0 for now we would be some steps forward just changing NutchConf access to non static style.


I hopefully found some time until next days to do some experiments and will come back with some more details.

However we should found a general agreement about the way we go, since changing code in 195 places and lines that depends on it for just nothing is not that funny.

Stefan


Reply via email to