Enrico Triolo wrote:
Using a profiler (specifically, the NetBeans profiler), I found out that
for each submitted url a new LanguageIdentifier instance is created,
and never released. With the memory inspector tool I can see as many
instances of LanguageIdentifier and NGramProfile$NGramEntry as the
number of fetched pages, each of them occupying about 180 KB. Forcing
garbage collection doesn't release much memory.

Yes, this looks like a bug. A single instance of LanguageIdentifier per task should be cached in the job "context" (i.e. Configuration instance), to avoid too many instantiations.
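
A minimal sketch of that caching idea, for illustration only. JobContext and ExpensiveIdentifier are hypothetical stand-ins for the Configuration instance and for LanguageIdentifier, and IdentifierCache is an invented helper, not an existing Nutch or Hadoop class. It assumes the cached object does not hold a strong reference back to its context; otherwise the weak keys would never be cleared.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for the job "context" (a Configuration instance in the real code).
final class JobContext {}

// Hypothetical stand-in for an object that is expensive to build, such as LanguageIdentifier.
final class ExpensiveIdentifier {
    static int created = 0; // counts constructions, for demonstration only

    ExpensiveIdentifier(JobContext conf) {
        created++; // imagine loading every n-gram profile here
    }
}

// Invented helper: keeps one identifier per context instead of one per URL.
final class IdentifierCache {
    // Weak keys let an entry be collected once its context is no longer referenced.
    private static final Map<JobContext, ExpensiveIdentifier> CACHE = new WeakHashMap<>();

    private IdentifierCache() {}

    // Returns the identifier already built for this context, creating it on first use.
    static synchronized ExpensiveIdentifier get(JobContext conf) {
        return CACHE.computeIfAbsent(conf, ExpensiveIdentifier::new);
    }
}
```

With this arrangement, code that handles each URL would call IdentifierCache.get(conf) rather than constructing a new identifier, so the number of live instances follows the number of contexts, not the number of fetched pages.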


Since I was still having some strange results with the profiler, I
added a println message in the getInstance method to monitor
singleton creation. It turns out that the singleton is
re-instantiated each time!
I can't really understand why this is happening; maybe it is something
related to Hadoop internals?

I remember a similar situation I had, where instance variables were not initialized after the object was created with Class.newInstance(). A VM bug? I'm not sure... I didn't track it down at the time; I simply moved the variable initialization to setConf(), which solved my problem.
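
A rough sketch of that workaround, under stated assumptions: Settings, Configurable, ProfilePlugin and PluginFactory are hypothetical names invented for this example (Configurable here only imitates the shape of a setConf()-style interface). It shows the pattern only, namely doing all field initialization in setConf() after reflective creation; it does not explain why the original initializers were skipped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object.
final class Settings {
    final int minLength;

    Settings(int minLength) {
        this.minLength = minLength;
    }
}

// Hypothetical stand-in for a setConf()-style interface.
interface Configurable {
    void setConf(Settings conf);
}

class ProfilePlugin implements Configurable {
    // Deliberately left unset here; all state is built in setConf().
    private Map<String, Integer> profiles;
    private int minLength;

    public ProfilePlugin() {
        // A no-argument constructor is needed for reflective creation.
    }

    @Override
    public void setConf(Settings conf) {
        this.minLength = conf.minLength;
        this.profiles = new HashMap<>();
        this.profiles.put("en", 0); // placeholder entry, for illustration only
    }

    boolean isReady() {
        return profiles != null;
    }

    int getMinLength() {
        return minLength;
    }
}

// Invented helper: creates the object reflectively, then configures it.
final class PluginFactory {
    private PluginFactory() {}

    static <T extends Configurable> T create(Class<T> cls, Settings conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            obj.setConf(conf); // initialization happens here, not in field initializers
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + cls.getName(), e);
        }
    }
}
```

The object is usable only after setConf() has run, so a factory like the one above should be the single place where such instances are created.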

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

