I don't fully agree with this. In most such cases you already have a NutchConf instance in the method or class context, so it makes sense to use it in the constructor. You could add these constructors with all parameters listed explicitly, but I'd expect that the constructors taking a NutchConf would be used most frequently.

My idea is to make it possible to use low-level classes outside of Nutch as well. It may be a philosophical question, but in the case of the map file writer you pass a complete hashmap with a bunch of properties to the object, while the object only reads one int from that hashmap. Personally, I don't like using a hashmap to 'transport' just one value.

So my suggestion looks like:
new MapFile.Reader(parameterA, nutchConf.getInt("parameterKey", 0));
If I understand you correctly, you prefer:
new MapFile.Reader(parameterA, nutchConf);
...
public Reader(...) {
        this.parameter = nutchConf.getInt("parameterKey", 0);
}

As mentioned, this is more a question of coding philosophy and not important to me; my only aim was to decouple things as much as possible, since we are touching this code anyway.
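For illustration, a minimal Java sketch of how both styles could coexist: a core constructor that takes only the single value it needs (usable outside Nutch), plus a convenience overload for callers that already hold a configuration. `Conf` is a hypothetical stand-in for NutchConf, and `LowLevelReader` and the key `reader.buffer.size` are hypothetical names; this is not the actual MapFile API.

```java
import java.util.Properties;

// Hypothetical stand-in for NutchConf: only the getInt(key, default) lookup discussed above.
class Conf {
    private final Properties props = new Properties();

    void set(String key, String value) { props.setProperty(key, value); }

    int getInt(String key, int defaultValue) {
        String v = props.getProperty(key);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }
}

// Hypothetical low-level reader whose core constructor needs no configuration object.
class LowLevelReader {
    private final String name;
    private final int bufferSize;

    // Decoupled constructor: callers pass exactly the value the class uses.
    LowLevelReader(String name, int bufferSize) {
        this.name = name;
        this.bufferSize = bufferSize;
    }

    // Convenience overload: extracts the single value from Conf and delegates.
    LowLevelReader(String name, Conf conf) {
        this(name, conf.getInt("reader.buffer.size", 4096));
    }

    String name() { return name; }
    int bufferSize() { return bufferSize; }
}

class ReaderDemo {
    public static void main(String[] args) {
        LowLevelReader standalone = new LowLevelReader("index", 1024);
        Conf conf = new Conf();
        conf.set("reader.buffer.size", "8192");
        LowLevelReader configured = new LowLevelReader("index", conf);
        System.out.println(standalone.bufferSize() + " " + configured.bufferSize()); // 1024 8192
    }
}
```

With this layout the class itself only depends on an int, and the NutchConf-based constructor stays a thin adapter on top of it.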

+ Getting an Extension also requires a NutchConf, which is injected if the Extension object (e.g. a Parser) implements a Configurable interface.
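A minimal Java sketch of that injection step, assuming a hypothetical `Configurable` interface with `setConf`/`getConf`, a hypothetical `Conf` stand-in for NutchConf, and a hypothetical `ExtensionFactory`; the real plugin-loading code may differ. The factory instantiates the extension class reflectively and hands over the configuration only when the instance asks for it.

```java
import java.util.Properties;

// Hypothetical stand-in for NutchConf.
class Conf {
    private final Properties props = new Properties();
    void set(String key, String value) { props.setProperty(key, value); }
    String get(String key, String defaultValue) { return props.getProperty(key, defaultValue); }
}

// Hypothetical interface marking extensions that want the configuration injected.
interface Configurable {
    void setConf(Conf conf);
    Conf getConf();
}

// Hypothetical extension point with one plain and one configurable implementation.
interface Parser {
    String parse(String content);
}

class PlainParser implements Parser {
    public String parse(String content) { return content.trim(); }
}

class ConfigurableParser implements Parser, Configurable {
    private Conf conf;
    public void setConf(Conf conf) { this.conf = conf; }
    public Conf getConf() { return conf; }
    // Reads a hypothetical "parser.prefix" property from the injected Conf.
    public String parse(String content) { return conf.get("parser.prefix", "") + content.trim(); }
}

class ExtensionFactory {
    // Creates the extension and injects conf only if the instance implements Configurable.
    static <T> T newInstance(Class<? extends T> clazz, Conf conf) {
        try {
            T instance = clazz.getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz.getName(), e);
        }
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("parser.prefix", "[html] ");
        Parser plain = ExtensionFactory.newInstance(PlainParser.class, conf);
        Parser configured = ExtensionFactory.newInstance(ConfigurableParser.class, conf);
        System.out.println(plain.parse("  text "));      // text
        System.out.println(configured.parse("  text ")); // [html] text
    }
}
```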


Yes. If you remember our discussion, I'd also like to follow a pattern where such instances are cached inside this NutchConf instance, if appropriate (i.e. if they are reusable and multi-threaded).
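One way the proposed pattern could look, as a minimal Java sketch: a hypothetical `Conf` stand-in holding an object cache, where each instance is created once per key and then shared. NutchConf is not assumed to have a method like `getCached`; that name and `WhitespaceTokenizer` are hypothetical. The pattern is only safe for objects that are stateless or otherwise thread-safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical configuration stand-in with a per-instance object cache.
class Conf {
    private final Map<String, Object> objectCache = new ConcurrentHashMap<>();

    // Returns the instance cached under key, creating it with factory on first use.
    // computeIfAbsent on ConcurrentHashMap ensures one instance per key across threads.
    <T> T getCached(String key, Class<T> type, Supplier<? extends T> factory) {
        return type.cast(objectCache.computeIfAbsent(key, k -> factory.get()));
    }
}

// Hypothetical stateless helper, hence safe to share between threads.
class WhitespaceTokenizer {
    String[] tokenize(String s) { return s.trim().split("\\s+"); }
}

class CacheDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        WhitespaceTokenizer a = conf.getCached("tokenizer", WhitespaceTokenizer.class, WhitespaceTokenizer::new);
        WhitespaceTokenizer b = conf.getCached("tokenizer", WhitespaceTokenizer.class, WhitespaceTokenizer::new);
        System.out.println(a == b); // true: created once, then reused
    }
}
```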


I'm afraid I still don't clearly understand your idea here. As discussed, from my point of view it makes no sense to cache any objects in a NutchConf. In particular, extension implementations such as parsers are multi-threaded and exist as many times as we have threads. Caching would make more sense inside the plugin registry, but that may be difficult, since you can run into trouble with resource life-cycle management. PluginClass instances are already cached and work like a kind of singleton for each existing plugin registry. I also see some trouble with this caching mechanism because NutchConf can be serialized. I don't actually know where serialization is used, but I guess distributed MapReduce will use it heavily.
So the cached objects would need to be Serializable as well.
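To make the serialization concern concrete, a minimal Java sketch using standard Java serialization: a hypothetical serializable `Conf` that also caches objects can be written out until a non-Serializable object lands in its cache, after which `writeObject` fails with `NotSerializableException`. `Conf`, `CachedParser`, and the key names are hypothetical; this only illustrates the Java serialization rule, not how NutchConf is actually serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical serializable configuration that also holds an object cache.
class Conf implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, String> props = new HashMap<>();
    private final HashMap<String, Object> cache = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }
    void cache(String key, Object value) { cache.put(key, value); }
}

// Hypothetical cached object that does not implement Serializable.
class CachedParser {}

class SerializationDemo {
    // Returns true if o can be written with standard Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object reachable from o is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("some.key", "value");
        System.out.println(canSerialize(conf)); // true
        conf.cache("parser", new CachedParser());
        System.out.println(canSerialize(conf)); // false: the cached object is not Serializable
    }
}
```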

Stefan
