Thomas Champagne created CODEC-174:
--------------------------------------
Summary: Improve performance of Beider Morse encoder
Key: CODEC-174
URL: https://issues.apache.org/jira/browse/CODEC-174
Project: Commons Codec
Issue Type: Improvement
Affects Versions: 1.7, 1.6
Reporter: Thomas Champagne
I use the Beider-Morse encoder with Solr. When Solr indexes a large number of
documents using this encoder, the import time is multiplied by 30. So I have
decided to optimize the current implementation in commons-codec.
So far, I have created two patches. The first patch removes a "performance
hack" involving a subsequence cache. This cache does not improve performance,
and removing it saves a few milliseconds.
The second patch changes the in-memory storage of the rules from a List to a
Map. With it, you can look up a rule directly by the beginning of its pattern.
This patch divides the encoding time by 2.
I will try to find more improvements. If you have any ideas, please let me know.
--
This message was sent by Atlassian JIRA
(v6.1#6144)