[ https://issues.apache.org/jira/browse/NUTCH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-314.
----------------------------------------

    Resolution: Won't Fix

close of legacy issue

> Multiple language identifier instances
> --------------------------------------
>
>                 Key: NUTCH-314
>                 URL: https://issues.apache.org/jira/browse/NUTCH-314
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: OS: Linux RHEL 4
>                      JDK: 1.5_07
>            Reporter: Enrico Triolo
>
> In my application I often need to perform the inject -> generate -> .. ->
> index loop multiple times, since users can 'suggest' new web pages to be
> crawled and indexed.
> I also need to enable the language identifier plugin.
> Everything seems to work correctly, but after some time I get an
> OutOfMemoryError. Elapsed time is not the real trigger: I noticed that
> the problem arises when the user submits many URLs (~100). As I said, for
> each submitted URL a new loop is performed (similar to the one in the
> Crawl.main method).
> Using a profiler (specifically, the NetBeans profiler) I found out that for
> each submitted URL a new LanguageIdentifier instance is created and never
> released. With the memory inspector tool I can see as many instances of
> LanguageIdentifier and NGramProfile$NGramEntry as the number of fetched
> pages, each of them occupying about 180 KB. Forcing garbage collection
> doesn't release much memory.
> Maybe we should cache its instance in the conf, as we do for many other
> objects in Nutch.
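
The caching idea suggested at the end of the report could look roughly like the sketch below. This is only an illustration under stated assumptions, not the Nutch code: `Conf` and this `LanguageIdentifier` are hypothetical stand-ins written for the example, and the `getObject`/`setObject` pair is an assumed per-configuration object store. The point is that repeated crawl loops sharing one configuration reuse a single identifier instead of building a new one per URL.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a configuration that can hold cached objects. */
class Conf {
    private final Map<String, Object> objects = new HashMap<>();

    synchronized Object getObject(String key) {
        return objects.get(key);
    }

    synchronized void setObject(String key, Object value) {
        objects.put(key, value);
    }
}

/** Hypothetical stand-in for an identifier that is expensive to build. */
class LanguageIdentifier {
    /** Counts constructions so the effect of caching is visible. */
    static int instancesCreated = 0;

    private static final String CACHE_KEY = LanguageIdentifier.class.getName();

    private LanguageIdentifier(Conf conf) {
        // A real implementation would load its n-gram profiles here.
        instancesCreated++;
    }

    /** Returns the instance cached in conf, creating it on first use only. */
    static LanguageIdentifier get(Conf conf) {
        synchronized (conf) {
            LanguageIdentifier cached =
                (LanguageIdentifier) conf.getObject(CACHE_KEY);
            if (cached == null) {
                cached = new LanguageIdentifier(conf);
                conf.setObject(CACHE_KEY, cached);
            }
            return cached;
        }
    }
}

class CacheDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        // Simulate ~100 submitted URLs, each running its own loop.
        for (int i = 0; i < 100; i++) {
            LanguageIdentifier.get(conf);
        }
        System.out.println(LanguageIdentifier.instancesCreated); // prints 1
    }
}
```

Note that this only helps if every loop reuses the same configuration object; if each submitted URL creates a fresh configuration, each one would still get its own identifier.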