[jira] [Updated] (CODEC-174) Improve performance of Beider Morse encoder

Thomas Champagne (JIRA) Tue, 12 Nov 2013 01:15:43 -0800

     [ 
https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Champagne updated CODEC-174:
-----------------------------------

    Attachment: CODEC-174-refactor-join-method-in-Phoneme.patch

Another patch, which refactors the join() method in the Rule.Phoneme class. The 
apply() method in PhonemeBuilder now tests whether the restricted languages are 
non-empty before joining phonemes. The patch does not remove the join() method, 
even though it is no longer used.
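
The idea can be sketched as follows. This is a simplified illustration, not the 
patch itself: JoinSketch, its Phoneme class, and the restrict() helper are 
hypothetical stand-ins that model a language set as a plain Set<String>, and 
the sketch ignores details of the real PhonemeBuilder such as the limit on the 
number of phonemes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class JoinSketch {

    /** Hypothetical stand-in for Rule.Phoneme: a text plus the languages it applies to. */
    static final class Phoneme {
        final String text;
        final Set<String> languages;

        Phoneme(String text, Set<String> languages) {
            this.text = text;
            this.languages = languages;
        }
    }

    /** Returns the languages common to both sets (hypothetical helper). */
    static Set<String> restrict(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common;
    }

    /** Combines every left phoneme with every right phoneme that shares a language. */
    static List<Phoneme> apply(List<Phoneme> lefts, List<Phoneme> rights) {
        List<Phoneme> joined = new ArrayList<>();
        for (Phoneme left : lefts) {
            for (Phoneme right : rights) {
                // Check the language restriction first ...
                Set<String> common = restrict(left.languages, right.languages);
                if (!common.isEmpty()) {
                    // ... and only then pay for the concatenation and the new object.
                    joined.add(new Phoneme(left.text + right.text, common));
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Phoneme> lefts = Arrays.asList(
                new Phoneme("a", new HashSet<>(Arrays.asList("en", "fr"))),
                new Phoneme("b", new HashSet<>(Arrays.asList("de"))));
        List<Phoneme> rights = Arrays.asList(
                new Phoneme("x", new HashSet<>(Arrays.asList("fr"))));
        for (Phoneme p : apply(lefts, rights)) {
            System.out.println(p.text + " " + p.languages);
        }
    }
}
```

Pairs with no language in common never allocate a joined phoneme, which is 
where the saving would come from.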

Results before the patch:
{quote}
Time for encoding 80 000 times the input 'Angelo': 12 533 millis.
Time for encoding 80 000 times the input 'Angelo': 12 603 millis.
Time for encoding 80 000 times the input 'Angelo': 12 818 millis.
Time for encoding 80 000 times the input 'Angelo': 12 642 millis.
{quote}

Results after the patch:
{quote}
Time for encoding 80 000 times the input 'Angelo': 11 682 millis.
Time for encoding 80 000 times the input 'Angelo': 11 758 millis.
Time for encoding 80 000 times the input 'Angelo': 11 825 millis.
Time for encoding 80 000 times the input 'Angelo': 11 733 millis.
{quote}
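
The four runs above average about 12 649 ms before the patch and about 
11 750 ms after it, a reduction of roughly 7%. A minimal sketch of how such 
figures could be measured and compared is shown below. EncodeTiming and its 
methods are hypothetical names, and the encoder is passed in as a generic 
function (the stand-in used in main() is not the Beider Morse encoder), so this 
is not the benchmark that produced the numbers above.

```java
import java.util.function.UnaryOperator;

class EncodeTiming {

    // Accumulates result lengths so the encode calls cannot be optimized away.
    static volatile long sink;

    /** Encodes the input repeatedly and returns the elapsed time in milliseconds. */
    static long timeMillis(UnaryOperator<String> encoder, String input, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += encoder.apply(input).length();
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000L;
        sink = total;
        return elapsed;
    }

    /** Arithmetic mean of the given timings. */
    static double mean(long... values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    /** Relative reduction of the mean timing, in percent. */
    static double reductionPercent(long[] before, long[] after) {
        double b = mean(before);
        double a = mean(after);
        return 100.0 * (b - a) / b;
    }

    public static void main(String[] args) {
        // Timings copied from the results quoted in this message.
        long[] before = {12533, 12603, 12818, 12642};
        long[] after = {11682, 11758, 11825, 11733};
        System.out.println("Reduction in percent: " + reductionPercent(before, after));

        // Stand-in encoder for illustration only.
        UnaryOperator<String> standIn = s -> s.toUpperCase();
        System.out.println("Elapsed millis: " + timeMillis(standIn, "Angelo", 80_000));
    }
}
```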

> Improve performance of Beider Morse encoder
> -------------------------------------------
>
>                 Key: CODEC-174
>                 URL: https://issues.apache.org/jira/browse/CODEC-174
>             Project: Commons Codec
>          Issue Type: Improvement
>    Affects Versions: 1.6, 1.7
>            Reporter: Thomas Champagne
>              Labels: patch, performance
>         Attachments: CODEC-174-change-rules-storage-to-Map.patch, 
> CODEC-174-delete-subsequence-cache-and-use-String.patch, 
> CODEC-174-delete-subsequence-cache.patch, 
> CODEC-174-refactor-join-method-in-Phoneme.patch, 
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, CODEC_174_cleanup.patch, 
> TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When Solr indexes a lot of 
> documents using this encoder, the import time is multiplied by 30. So I have 
> decided to optimize the current implementation in commons-codec.
> So far, I have created two patches. The first patch deletes a "performance 
> hack" involving a subsequence cache. This cache does not improve performance, 
> and deleting it saves a few milliseconds.
> The second patch changes the in-memory storage of the rules to a Map 
> instead of a List. With it, a rule can be looked up directly by the 
> beginning of its pattern. This patch divides the encoding time by 2.
> I will try to find more improvements. If you have any ideas, please let me 
> know.
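
The Map-based rule storage described in the issue text can be sketched as 
follows. This is an illustration under stated assumptions, not the attached 
patch: RuleIndexSketch and its Rule class are hypothetical, patterns are 
treated as plain non-empty strings, and the first character of the pattern is 
assumed to be the Map key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RuleIndexSketch {

    /** Hypothetical, simplified rule: a literal pattern and the phoneme it produces. */
    static final class Rule {
        final String pattern;
        final String phoneme;

        Rule(String pattern, String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    /** Groups rules by the first character of their pattern, keeping the original order. */
    static Map<Character, List<Rule>> index(List<Rule> rules) {
        Map<Character, List<Rule>> byFirstChar = new HashMap<>();
        for (Rule rule : rules) {
            byFirstChar
                    .computeIfAbsent(rule.pattern.charAt(0), k -> new ArrayList<>())
                    .add(rule);
        }
        return byFirstChar;
    }

    /** Returns the first rule whose pattern matches the input at pos, or null if none does. */
    static Rule firstMatch(Map<Character, List<Rule>> index, String input, int pos) {
        // Only rules starting with the current character are examined,
        // instead of scanning the whole rule list.
        List<Rule> candidates = index.get(input.charAt(pos));
        if (candidates == null) {
            return null;
        }
        for (Rule rule : candidates) {
            if (input.startsWith(rule.pattern, pos)) {
                return rule;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Character, List<Rule>> index = index(Arrays.asList(
                new Rule("an", "A"), new Rule("ge", "G"), new Rule("lo", "L")));
        Rule match = firstMatch(index, "angelo", 0);
        System.out.println(match == null ? "no match" : match.pattern + " -> " + match.phoneme);
    }
}
```

With a List, every position in the input has to be tested against every rule. 
With the index, only the rules that share the first character are tested.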



--
This message was sent by Atlassian JIRA
(v6.1#6144)
