If you want to be able to reconfigure a Nutch component at runtime, you
need to remove any configuration from the constructor and provide a method
that lets you get/set the configuration for the component. The problem with
keeping the entire configuration in a single component is displaying and
filtering the configuration information so that the user knows which
component is being configured.

Eclipse has a very good pattern for handling configuration for each of the
components. Basically each component is responsible for its own
configuration, and the tool just provides the framework to allow the
configuration to be displayed, updated, and stored.

The drawback of that approach here is that you don't really have a GUI, or
at least you have to be able to run without one.

I think that, at the very least, removing the configuration information from
the constructor is the first step. You can still have a properties object
set the configuration. Then we can discuss the relative merits of
displaying, changing, and storing the configuration. (For example, how a
user is supposed to know which component is affected by which property.)
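
To make that first step concrete, here is a minimal sketch in Java. It is
only an illustration under assumptions: the interface name Reconfigurable,
the class ExampleComponent, and the key "example.threads" are all made up
for this example and are not taken from the Nutch code base. The point is
just that the constructor takes no configuration and a properties object
can be applied (and re-applied) later.

```java
import java.util.Properties;

// Hypothetical interface: a component that is configured after construction.
interface Reconfigurable {
    void setConf(Properties conf);
    Properties getConf();
}

// Hypothetical component; the name and the property key are illustrative only.
class ExampleComponent implements Reconfigurable {
    private Properties conf = new Properties();
    private int threads = 1; // default used until a configuration is applied

    // No configuration is passed to the constructor.
    ExampleComponent() {}

    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        // Re-read every setting, so calling setConf again reconfigures the component.
        this.threads = Integer.parseInt(conf.getProperty("example.threads", "1"));
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    int getThreads() {
        return threads;
    }
}

class ReconfigureDemo {
    public static void main(String[] args) {
        ExampleComponent component = new ExampleComponent();

        Properties conf = new Properties();
        conf.setProperty("example.threads", "10");
        component.setConf(conf); // configure at runtime, not at construction

        System.out.println(component.getThreads());
    }
}
```

Because each component reads only its own keys in setConf, this also gives
a natural place to answer the question of which property affects which
component.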

Thanks,

Steve Betts
[EMAIL PROTECTED]
937-477-1797


-----Original Message-----
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 04, 2006 12:22 PM
To: [email protected]
Subject: Re: no static NutchConf

>
> I don't fully agree with this. In most such cases, you already have
> a NutchConf instance in the method or class context, so it makes
> sense to use it in the constructor. You could add these constructors
> with all parameters iterated, but I'd expect that the constructors
> using NutchConf would be used most frequently.

My idea is to be able to use the low-level pieces outside of Nutch as well.
It is perhaps a philosophical question: in the case of the map file writer
you pass a complete hashmap with a bunch of properties to the object,
but the object reads only one int from this hashmap. I personally
don't like using a hashmap to 'transport' just one value.

So my suggestion looks like:
new MapFile.Reader(parameterA, nutchConf.getInt("parameterKey", 0));
If I understand you correctly, you prefer:
new MapFile.Reader(parameterA, nutchConf);
...
public Reader(...) {
        this.parameter = nutchConf.getInt("parameterKey",0);
}

As mentioned, this is more a question of coding philosophy and is not
important to me; my only idea was to decouple things as much as
possible if we are touching the code anyway.
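
The two styles do not have to exclude each other. The following is a rough
sketch, not the actual Nutch classes: SimpleConf is a hypothetical stand-in
for NutchConf (only the getInt(key, default) call mirrors the snippet
above), and ExampleReader is a made-up class, not MapFile.Reader. The
constructor taking plain values keeps the class usable outside of Nutch,
and the constructor taking the configuration simply delegates to it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf, reduced to what this sketch needs.
class SimpleConf {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        String value = values.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }
}

// Hypothetical reader class used only to compare the two constructor styles.
class ExampleReader {
    private final String name;
    private final int parameter;

    // Decoupled style: no dependency on any configuration class.
    ExampleReader(String name, int parameter) {
        this.name = name;
        this.parameter = parameter;
    }

    // Convenience style: looks up the one value it needs and delegates.
    ExampleReader(String name, SimpleConf conf) {
        this(name, conf.getInt("parameterKey", 0));
    }

    String getName() {
        return name;
    }

    int getParameter() {
        return parameter;
    }
}

class ConstructorStyleDemo {
    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("parameterKey", "7");

        ExampleReader viaValue = new ExampleReader("a", conf.getInt("parameterKey", 0));
        ExampleReader viaConf = new ExampleReader("a", conf);

        // Both styles end up with the same parameter value.
        System.out.println(viaValue.getParameter() == viaConf.getParameter());
    }
}
```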

>> + Getting an Extension also requires a NutchConf, which is injected
>> in case the Extension object (e.g. a Parser) implements a
>> Configurable interface.
>
>
> Yes. If you remember our discussion, I'd like also to follow a
> pattern where such instances are cached inside this NutchConf
> instance, if appropriate (i.e. if they are reusable and multi-
> threaded).


I'm afraid I still don't clearly understand your idea here. As
discussed, from my point of view it makes no sense to cache any
objects in a NutchConf.
Extension implementations like parsers in particular are used from
multiple threads, and there are as many instances as we have threads.
Caching would make more sense behind the scenes of the plugin registry,
but it may be difficult since you can run into trouble with resource life
cycle management. PluginClass instances are already cached and work like
a kind of singleton for each existing plugin registry.
I also see some trouble with this caching mechanism since
NutchConf can be serialized. Actually I have no idea where this
mechanism is used, but I guess distributed map reduce will use it
heavily.
So the cached objects would need to be Serializable as well.
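
To illustrate the serialization concern, here is a small sketch. It assumes
plain java.io serialization as a stand-in for whatever mechanism NutchConf
really uses, and CachingConf, CachedParser and SerializationDemo are
hypothetical names made up for the example. As soon as a non-serializable
object is put into the cache, the whole configuration object can no longer
be serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical cached object; note that it does NOT implement Serializable.
class CachedParser {
}

// Hypothetical configuration class that also caches arbitrary objects.
class CachingConf implements Serializable {
    private static final long serialVersionUID = 1L;

    final HashMap<String, String> properties = new HashMap<>();
    // The cache is an ordinary field, so it is serialized together with the conf.
    final HashMap<String, Object> cache = new HashMap<>();
}

class SerializationDemo {
    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object object) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CachingConf conf = new CachingConf();
        conf.properties.put("parameterKey", "7");
        System.out.println(canSerialize(conf)); // only properties so far

        conf.cache.put("parser", new CachedParser());
        System.out.println(canSerialize(conf)); // cache now holds a non-serializable object
    }
}
```

Marking the cache field transient would avoid the failure, but then the
cached objects are simply dropped on the receiving side, which is another
reason I would rather keep such a cache out of the configuration object.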

Stefan

