[jira] [Updated] (CODEC-174) Improve performance of Beider Morse encoder

Thomas Champagne (JIRA) Fri, 08 Nov 2013 07:49:41 -0800

     [ 
https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Champagne updated CODEC-174:
-----------------------------------

    Attachment: CODEC-174-delete-subsequence-cache-and-use-String.patch

Updated the delete-subsequence-cache patch.
Results before the patch:
{quote}
Time for encoding 80 000 times the input 'Angelo': 12 666 millis.
Time for encoding 80 000 times the input 'Angelo': 12 825 millis.
Time for encoding 80 000 times the input 'Angelo': 12 776 millis.
Time for encoding 80 000 times the input 'Angelo': 12 874 millis.
{quote}

Results after the patch:
{quote}
Time for encoding 80 000 times the input 'Angelo': 11 903 millis.
Time for encoding 80 000 times the input 'Angelo': 11 889 millis.
Time for encoding 80 000 times the input 'Angelo': 11 700 millis.
Time for encoding 80 000 times the input 'Angelo': 11 821 millis.
{quote}

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, 
> CODEC-174-delete-subsequence-cache-and-use-String.patch, 
> CODEC-174-delete-subsequence-cache.patch, 
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, TestCacheSubSequence.java, 
> test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When Solr indexes a large number of 
> documents with this encoder, the import takes 30 times as long. So I decided 
> to optimize the current implementation in commons-codec.
> So far I have created two patches. The first patch deletes a "performance 
> hack" built around a subsequence cache. This cache does not improve 
> performance, and removing it saves a few milliseconds.
> The second patch changes the in-memory storage of the rules from a List 
> to a Map. With it, a rule can be looked up directly by the 
> beginning of its pattern. This patch halves the encoding time.
> I will try to find more improvements. If you have any ideas, please let me know.
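
The Map-based rule storage described in the issue can be illustrated with a minimal sketch. This is not the code from the attached patch: the class RuleLookupSketch, the Rule type, and the choice of the pattern's first character as the Map key are all assumptions made for illustration. It only shows why indexing rules by the beginning of their pattern avoids scanning the whole List at every input position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the commons-codec implementation.
public class RuleLookupSketch {

    // Simplified stand-in for a phonetic rule: a literal pattern and its output.
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    // List storage: every rule is tested at every input position.
    static List<Rule> matchFromList(List<Rule> rules, String input, int pos) {
        List<Rule> matches = new ArrayList<>();
        for (Rule rule : rules) {
            if (input.startsWith(rule.pattern, pos)) {
                matches.add(rule);
            }
        }
        return matches;
    }

    // Map storage: rules are grouped by the first character of their pattern.
    // Assumes every pattern is non-empty.
    static Map<Character, List<Rule>> indexByFirstChar(List<Rule> rules) {
        Map<Character, List<Rule>> index = new HashMap<>();
        for (Rule rule : rules) {
            index.computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    // Only rules whose pattern starts with the current input character are tested.
    static List<Rule> matchFromMap(Map<Character, List<Rule>> index, String input, int pos) {
        List<Rule> candidates = index.getOrDefault(input.charAt(pos), Collections.emptyList());
        List<Rule> matches = new ArrayList<>();
        for (Rule rule : candidates) {
            if (input.startsWith(rule.pattern, pos)) {
                matches.add(rule);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("an", "A"));
        rules.add(new Rule("ng", "N"));
        rules.add(new Rule("a", "a"));
        Map<Character, List<Rule>> index = indexByFirstChar(rules);

        // Both lookups find the same rules; the Map version inspects fewer candidates.
        System.out.println(matchFromList(rules, "angelo", 0).size());
        System.out.println(matchFromMap(index, "angelo", 0).size());
    }
}
```

Both lookups return the same matches in the same order. The difference is that the Map version examines only the rules that share the current character, which matters when the rule set is large and the lookup runs at every position of every encoded word.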



--
This message was sent by Atlassian JIRA
(v6.1#6144)
