[ http://issues.apache.org/jira/browse/NUTCH-60?page=all ]
Jerome Charron updated NUTCH-60:
--------------------------------
Attachment: NUTCH-60-050605.patch
This patch keeps the improvements of the previous one (configuration) and
provides some optimizations that reduce the processing time by 20% to 70%,
depending on the configuration (size of data to process), with an average gain
of 25%.
I will provide more detailed benchmark results on the Wiki as soon as
possible (http://wiki.apache.org/nutch/LanguageIdentifierBenchs).
> Bad language identifier plugin performances
> -------------------------------------------
>
> Key: NUTCH-60
> URL: http://issues.apache.org/jira/browse/NUTCH-60
> Project: Nutch
> Type: Improvement
> Components: indexer
> Reporter: Jerome Charron
> Priority: Minor
> Attachments: NUTCH-60-050526.patch, NUTCH-60-050605.patch
>
> As reported by Stefan Groschupf
> (http://www.mail-archive.com/[email protected]/msg04090.html)
> the language identifier plugin consumes a lot of processing time.
> Some optimizations and/or configuration options are required.