[ 
https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818330#comment-13818330
 ] 

Gary Gregory commented on CODEC-174:
------------------------------------

For the other Clirr reports, does this breakage warrant a 2.0 label or can we 
file the changes under the fact that these classes are internal to the codec? 
Should these classes be Javadoc'd as package private? The changes seem inherent 
in the implementation of the performance improvements.

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, 
> CODEC-174-delete-subsequence-cache-and-use-String.patch, 
> CODEC-174-delete-subsequence-cache.patch, 
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, CODEC_174_cleanup.patch, 
> TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use Beider Morse encoder with Solr. When it indexes a lot of documents 
> using this encoder, the import time is multiplied by 30. So, I have decided 
> to optimize the current implementation in the commons-codec.
> Currently, I have created two patch. The first patch delete a "performance 
> hack" about a subsequence cache. This cache doesn't optimize performance and 
> after deleting it, you can win some milliseconds.
> The second patch changes the storage of the rules in memory using a Map 
> instead of List. With it, you can access to a rule directly with the 
> beginning of pattern. This patch divide the encoding time by 2.
> I will try to find more improvement. If you have any idea, please tell me it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to