Chris A. Mattmann created TIKA-1549:
---------------------------------------

             Summary: Two times speed increase of language profile distance calculation
                 Key: TIKA-1549
                 URL: https://issues.apache.org/jira/browse/TIKA-1549
             Project: Tika
          Issue Type: Bug
          Components: languageidentifier
            Reporter: Toke Eskildsen
            Assignee: Chris A. Mattmann
             Fix For: 1.8


The distance calculation for language profiles builds a new Set of Strings on 
each call and performs a hash table lookup for every String in that Set. This 
patch instead creates and caches compact structures that are iterated 
sequentially. The result is a twofold speed-up (see the performance test in 
LanguageIdentifierTest).
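
To illustrate the idea, here is a minimal sketch in Java. It is not the 
TIKA-1549 patch and not Tika's LanguageProfile class. ProfileSketch and its 
methods are hypothetical names, and the Euclidean distance over relative 
n-gram frequencies is an assumption made for the example. distanceWithSet 
shows the per-call Set plus hash lookups. distanceWithCache shows one possible 
"compact structure": sorted parallel arrays that are cached and then merged in 
a single sequential pass.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not Tika's LanguageProfile or the actual patch.
final class ProfileSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private long total = 0;

    // Cached compact view: n-grams in sorted order with parallel relative frequencies.
    private String[] sortedNgrams;
    private double[] frequencies;

    void add(String ngram, int count) {
        counts.merge(ngram, count, Integer::sum);
        total += count;
        sortedNgrams = null; // invalidate the cache; it is rebuilt lazily
    }

    // Baseline: allocates a union Set on every call and does two hash lookups per n-gram.
    double distanceWithSet(ProfileSketch that) {
        Set<String> union = new HashSet<>(counts.keySet());
        union.addAll(that.counts.keySet());
        double thisTotal = Math.max(total, 1L);
        double thatTotal = Math.max(that.total, 1L);
        double sum = 0.0;
        for (String ngram : union) {
            double d = counts.getOrDefault(ngram, 0) / thisTotal
                     - that.counts.getOrDefault(ngram, 0) / thatTotal;
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Builds the sorted arrays once; later calls reuse them until add() is called again.
    private void ensureCache() {
        if (sortedNgrams != null) {
            return;
        }
        TreeMap<String, Integer> sorted = new TreeMap<>(counts);
        sortedNgrams = sorted.keySet().toArray(new String[0]);
        frequencies = new double[sortedNgrams.length];
        double denominator = Math.max(total, 1L);
        int i = 0;
        for (int c : sorted.values()) {
            frequencies[i++] = c / denominator;
        }
    }

    // Cached variant: merges the two sorted arrays sequentially, with no per-call Set.
    double distanceWithCache(ProfileSketch that) {
        ensureCache();
        that.ensureCache();
        int i = 0;
        int j = 0;
        double sum = 0.0;
        while (i < sortedNgrams.length && j < that.sortedNgrams.length) {
            int cmp = sortedNgrams[i].compareTo(that.sortedNgrams[j]);
            double d;
            if (cmp == 0) {
                d = frequencies[i++] - that.frequencies[j++]; // n-gram in both profiles
            } else if (cmp < 0) {
                d = frequencies[i++];                         // only in this profile
            } else {
                d = that.frequencies[j++];                    // only in the other profile
            }
            sum += d * d;
        }
        while (i < sortedNgrams.length) {
            double d = frequencies[i++];
            sum += d * d;
        }
        while (j < that.sortedNgrams.length) {
            double d = that.frequencies[j++];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Both methods should return the same distance up to floating-point rounding. 
The cached variant trades a one-time sort per profile for allocation-free, 
sequential comparisons afterwards. The sketch is not thread-safe, and any 
actual speed-up would have to be measured.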



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
