On 5/30/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Doğacan Güney wrote:
> My patch is just a draft to see if we can create a better caching
> mechanism. There are definitely some rough edges there:)
One important piece of information: in future versions of Hadoop the method
Configuration.setObject() is deprecated and will then be removed, so we
have to grow our own caching mechanism anyway - either use a singleton
cache, or change nearly all APIs to pass around a user/job/task context.
So, we will face this problem pretty soon, with the next upgrade of Hadoop.
Hmm, well, that sucks, but this is not really a problem for
PluginRepository: PluginRepository already has its own cache
mechanism.
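
To make the "singleton cache" option concrete, here is a minimal sketch of one way it could look once Configuration.setObject() is gone. Everything in it is an assumption for illustration: Conf is a hypothetical stand-in for a Hadoop Configuration, and ObjectCacheSketch is not an existing Nutch or Hadoop class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class ObjectCacheSketch {
    /** Hypothetical stand-in for a Hadoop Configuration (identity-based equals/hashCode). */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // One cache per Conf instance. Weak keys let a cache be collected
    // once its Conf is no longer referenced anywhere else.
    private static final Map<Conf, ObjectCacheSketch> CACHES = new WeakHashMap<>();

    private final Map<String, Object> objects = new HashMap<>();

    private ObjectCacheSketch() {}

    /** Returns the cache that belongs to this Conf, creating it on first use. */
    static synchronized ObjectCacheSketch get(Conf conf) {
        return CACHES.computeIfAbsent(conf, c -> new ObjectCacheSketch());
    }

    synchronized Object getObject(String key) { return objects.get(key); }

    synchronized void setObject(String key, Object value) { objects.put(key, value); }

    public static void main(String[] args) {
        Conf conf = new Conf();
        ObjectCacheSketch.get(conf).setObject("plugins", new Object());
        // The same Conf instance finds the cached object again.
        System.out.println(ObjectCacheSketch.get(conf).getObject("plugins") != null); // true
    }
}
```

Callers would replace conf.setObject(key, value) with ObjectCacheSketch.get(conf).setObject(key, value), so no API signatures need to change.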
> You are right about per-plugin parameters but I think it will be very
> difficult to keep PluginProperty class in sync with plugin parameters.
> I mean, if a plugin defines a new parameter, we have to remember to
> update PluginProperty. Perhaps we can force plugins to define the
> configuration options they will use in, say, their plugin.xml file, but
> that will be very error-prone too. I don't want to compare entire
> configuration objects, because changing irrelevant options, like
> fetcher.store.content shouldn't force loading plugins again, though it
> seems it may be inevitable....
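
As a rough illustration of the trade-off described above, the sketch below builds a cache key from only a declared subset of options, so that changing an unrelated option such as fetcher.store.content does not invalidate cached plugins. The class, the plain Map standing in for a configuration object, and the choice of plugin.includes as a "relevant" key are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class RelevantKeysSketch {
    /**
     * Builds a cache key from the relevant options only.
     * Keys are sorted so the result does not depend on declaration order.
     */
    static String cacheKey(Map<String, String> conf, List<String> relevantKeys) {
        StringBuilder sb = new StringBuilder();
        for (String key : new TreeSet<>(relevantKeys)) {
            sb.append(key).append('=').append(conf.get(key)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the plugin system declares which options it depends on.
        List<String> relevant = Arrays.asList("plugin.includes");

        Map<String, String> a = new HashMap<>();
        a.put("plugin.includes", "index-basic");
        a.put("fetcher.store.content", "true");

        // Same plugin options, different unrelated option.
        Map<String, String> b = new HashMap<>(a);
        b.put("fetcher.store.content", "false");

        // Different plugin options.
        Map<String, String> c = new HashMap<>(a);
        c.put("plugin.includes", "index-more");

        System.out.println(cacheKey(a, relevant).equals(cacheKey(b, relevant))); // true
        System.out.println(cacheKey(a, relevant).equals(cacheKey(c, relevant))); // false
    }
}
```

The weakness is the one already noted: the list of relevant keys has to be kept in sync by hand whenever a plugin starts reading a new option.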
Let me see if I understand this ... In my opinion this is a non-issue.
Child tasks are started in separate JVMs, so the only "context"
information that they have is what they can read from job.xml (which is
a superset of all properties from config files + job-specific data +
task-specific data). This context is currently instantiated as a
Configuration object, and we (ab)use it also as a local per-JVM cache
for plugin instances and other objects.
Once we instantiate the plugins, they exist unchanged throughout the
lifecycle of JVM (== lifecycle of a single task), so we don't have to
worry about having different sets of plugins with different parameters
for different jobs (or even tasks).
In other words, it seems to me that there is no such situation in which
we have to reload plugins within the same JVM, but with different
parameters.
The problem is that someone might get a little too smart. For example, one may
write a new job that has two IndexingFilters but creates each from
a completely different configuration object, then filters some
documents with the first filter and others with the second. I agree
that this is a bit of a reach, but it is possible.
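
The scenario can be sketched as follows. This is only an illustration: Conf and Filter are hypothetical stand-ins (not the real Configuration or IndexingFilter types), and filter.field is a made-up option. It contrasts a single JVM-wide instance with a cache keyed by the configuration object.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class TwoConfigsSketch {
    /** Hypothetical stand-in for a configuration object. */
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();
        Conf set(String key, String value) { props.put(key, value); return this; }
        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for a filter configured at creation time. */
    static final class Filter {
        final String field;
        Filter(String field) { this.field = field; }
    }

    // JVM-wide singleton: whichever configuration arrives first wins.
    private static Filter global;

    static synchronized Filter fromGlobal(Conf conf) {
        if (global == null) {
            global = new Filter(conf.get("filter.field"));
        }
        return global;
    }

    // Cache keyed by configuration identity: one filter per Conf object.
    private static final Map<Conf, Filter> PER_CONF = new IdentityHashMap<>();

    static synchronized Filter fromPerConf(Conf conf) {
        return PER_CONF.computeIfAbsent(conf, c -> new Filter(c.get("filter.field")));
    }

    public static void main(String[] args) {
        Conf first = new Conf().set("filter.field", "title");
        Conf second = new Conf().set("filter.field", "anchor");

        // The second configuration is silently ignored.
        System.out.println(fromGlobal(first).field + " " + fromGlobal(second).field);   // title title
        // Each configuration gets its own filter.
        System.out.println(fromPerConf(first).field + " " + fromPerConf(second).field); // title anchor
    }
}
```

If plugins really are fixed for the lifetime of a task JVM, both variants behave the same; they only diverge in the "too smart" case.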
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
--
Doğacan Güney