[
https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Padiasek updated NUTCH-1746:
---------------------------------
Attachment: ObjectCache.patch
Indeed, after further investigation I found that the problem lies in ObjectCache or,
strictly speaking, in how it is used. It turns out that ObjectCache.get()
is called with multiple copies of Configuration, which results in multiple
copies of the filters being created.
I was able to avoid the OutOfMemoryError in all mappers by changing ObjectCache to use
Configuration.toString() as the CACHE key instead of the Configuration object itself.
Changing CACHE into a single ObjectCache instance (shared by all Configuration
objects) also works, but in that case the weak references are eliminated and the
CACHE is never cleared. For that reason the first approach might be better.
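The effect described above can be illustrated with a minimal sketch. This is not the attached patch and does not use the real Nutch or Hadoop classes: FakeConf is a hypothetical stand-in for Configuration (assumed here to compare by object identity, because it does not override equals/hashCode), and ExpensiveFilter is a hypothetical stand-in for the filter objects. It only shows why a cache keyed by the object builds one entry per copy, while a cache keyed by toString() shares one entry between copies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class CacheKeyDemo {

    /** Hypothetical stand-in for Configuration; equals/hashCode are NOT overridden. */
    public static final class FakeConf {
        private final String resources;

        public FakeConf(String resources) {
            this.resources = resources;
        }

        /** Copy constructor: same settings, but a distinct object. */
        public FakeConf(FakeConf other) {
            this.resources = other.resources;
        }

        @Override
        public String toString() {
            return "Configuration: " + resources;
        }
    }

    /** Hypothetical stand-in for a memory-heavy set of filters. */
    public static final class ExpensiveFilter { }

    /** Cache keyed by the object itself: every distinct copy misses the cache. */
    public static int filtersBuiltWithObjectKey(FakeConf... confs) {
        Map<FakeConf, ExpensiveFilter> cache = new WeakHashMap<>();
        int built = 0;
        for (FakeConf conf : confs) {
            if (cache.get(conf) == null) {
                cache.put(conf, new ExpensiveFilter());
                built++;
            }
        }
        return built;
    }

    /** Cache keyed by toString(): copies with identical text share one entry. */
    public static int filtersBuiltWithStringKey(FakeConf... confs) {
        Map<String, ExpensiveFilter> cache = new HashMap<>();
        int built = 0;
        for (FakeConf conf : confs) {
            String key = conf.toString();
            if (cache.get(key) == null) {
                cache.put(key, new ExpensiveFilter());
                built++;
            }
        }
        return built;
    }

    public static void main(String[] args) {
        FakeConf base = new FakeConf("nutch-default.xml, nutch-site.xml");
        FakeConf copy = new FakeConf(base);
        System.out.println(filtersBuiltWithObjectKey(base, copy, base)); // prints 2
        System.out.println(filtersBuiltWithStringKey(base, copy, base)); // prints 1
    }
}
```

Note that a string key only helps if toString() differs whenever two configurations really differ. Whether that holds for the real Configuration class is an assumption that should be checked. The string-keyed map in this sketch is a plain HashMap, so it does not model the weak-reference cleanup discussed above.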
Further investigation might reveal why multiple Configuration instances are being
passed to ObjectCache, but for the time being I am using a modified ObjectCache
(patch attached).
> OutOfMemoryError in Mappers
> ---------------------------
>
> Key: NUTCH-1746
> URL: https://issues.apache.org/jira/browse/NUTCH-1746
> Project: Nutch
> Issue Type: Bug
> Components: generator, injector
> Affects Versions: 1.7
> Environment: Nutch running in local mode with 4M+ domains in
> domain-urlfilter.txt
> Reporter: Greg Padiasek
> Attachments: Generator.patch, Injector.patch, ObjectCache.patch,
> domain-urlfilter-aa, domain-urlfilter-ab, domain-urlfilter-ac
>
>
> Initially I found that Generator was throwing an OutOfMemoryError no
> matter how much RAM I allocated to the JVM. I fixed the problem by moving
> URLFilters, URLNormalizers and ScoringFilters to the top-level class as
> singletons and re-using them in all Generator mapper instances.
> Then I found the same problem in Injector and applied an analogous fix.
> Now it seems that this issue may be common to all Nutch Mapper
> implementations.
> I was wondering whether it would be possible to integrate this kind of change
> into the upstream code base and potentially update all vulnerable Mapper
> classes.