Jérôme Charron wrote:

Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?

Running many different tasks in parallel, each using different config,
inside the same JVM.

OK, I understand this, Andrzej, but it is not really what I call a use case.
It is more a feature that you describe here.
In fact, what I mean is that I don't understand in which cases it will be
useful. And I don't understand how a particular
NutchConf will be selected for a particular task...

Use case: executing multiple tasks on a single tasktracker node, with drastically different configurations for each task.

Example: what happens now if you try to run more than one fetcher at the same time, where the fetcher parameters differ (or the sets of activated plugins differ)? You can't - the local tasks on each tasktracker will use whatever local config is there. What happens if you change the config on the node that submits the job? The changes won't be propagated to the tasktracker nodes, because tasktrackers use their local configuration (through the singleton NutchConf.get()) instead of receiving a serialized copy of the config from the originating node... etc.

NutchConf instances will be created when you create a JobConf. They will then have to be serialized when the jobtracker sends job descriptors to the tasktrackers on the mapred nodes, and deserialized there, so that each tasktracker instantiates its local tasks using a copy of the original NutchConf instance.
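
To make the idea concrete, here is a rough sketch of a config object that travels with the job instead of being read from a local singleton. It is only an illustration under assumptions: TaskConfig, its methods, and the key example.fetcher.threads are hypothetical names invented for this example, not the actual NutchConf or JobConf API, and java.util.Properties merely stands in for whatever wire format would really be used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ConfigSketch {

    /** Hypothetical stand-in for a non-singleton config; NOT the real NutchConf. */
    static final class TaskConfig {
        private final Properties props = new Properties();

        void set(String key, String value) {
            props.setProperty(key, value);
        }

        String get(String key, String defaultValue) {
            return props.getProperty(key, defaultValue);
        }

        /** Serialize on the submitting node, so the values travel with the job. */
        byte[] toBytes() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        }

        /** Deserialize on the worker node into an independent copy. */
        static TaskConfig fromBytes(byte[] data) throws IOException {
            TaskConfig copy = new TaskConfig();
            copy.props.load(new ByteArrayInputStream(data));
            return copy;
        }
    }

    public static void main(String[] args) throws IOException {
        // Two jobs with different settings, created in the same JVM.
        TaskConfig fetchA = new TaskConfig();
        fetchA.set("example.fetcher.threads", "10");   // hypothetical key
        TaskConfig fetchB = new TaskConfig();
        fetchB.set("example.fetcher.threads", "100");  // hypothetical key

        // Simulate shipping each config to a worker and rebuilding it there.
        TaskConfig taskA = TaskConfig.fromBytes(fetchA.toBytes());
        TaskConfig taskB = TaskConfig.fromBytes(fetchB.toBytes());

        // Each task sees the values of its own job, not a shared local config.
        System.out.println(taskA.get("example.fetcher.threads", "?")); // prints 10
        System.out.println(taskB.get("example.fetcher.threads", "?")); // prints 100
    }
}
```

The point of the sketch is that each task holds its own copy, so two fetchers with different parameters can coexist on one node, and a change made on the submitting node reaches the workers because it is part of what gets shipped.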

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

