[ https://issues.apache.org/jira/browse/CODEC-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817877#comment-13817877 ]
Sebb commented on CODEC-174:
----------------------------
One of the changes is the following:
Return type of method 'public org.apache.commons.codec.language.bm.Rule$Phoneme
append(java.lang.CharSequence)' has been changed to void
This can be fixed by changing the method to the following:
{code}
public Phoneme append(final CharSequence str) {
    this.phonemeText.append(str);
    return this;
}
{code}
Note that Phoneme used to be thread-safe - it no longer is.
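As a minimal sketch of why returning {{this}} restores the old return type, here is a self-contained stand-in. PhonemeSketch and getPhonemeText() are hypothetical names used only for illustration, and the StringBuilder field is an assumption; only the phonemeText field name and the append(CharSequence) signature come from the snippet above. It is not the real Rule.Phoneme class.

```java
// Hypothetical stand-in for Rule.Phoneme; not the actual commons-codec class.
final class PhonemeSketch {

    // Assumption: the text is held in a mutable StringBuilder.
    private final StringBuilder phonemeText;

    PhonemeSketch(final CharSequence initial) {
        this.phonemeText = new StringBuilder(initial);
    }

    // Returns this instead of void, so code written against the old
    // return type (including chained calls) keeps working.
    PhonemeSketch append(final CharSequence str) {
        this.phonemeText.append(str);
        return this;
    }

    CharSequence getPhonemeText() {
        return this.phonemeText;
    }

    public static void main(final String[] args) {
        final PhonemeSketch p = new PhonemeSketch("b").append("a").append("r");
        System.out.println(p.getPhonemeText()); // prints "bar"
    }
}
```

In this sketch append mutates the instance in place, so an instance shared between threads would need external synchronization.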
> Improve performance of Beider Morse encoder
> -------------------------------------------
>
> Key: CODEC-174
> URL: https://issues.apache.org/jira/browse/CODEC-174
> Project: Commons Codec
> Issue Type: Improvement
> Affects Versions: 1.6, 1.7
> Reporter: Thomas Champagne
> Labels: patch, performance
> Attachments: CODEC-174-change-rules-storage-to-Map.patch,
> CODEC-174-delete-subsequence-cache-and-use-String.patch,
> CODEC-174-delete-subsequence-cache.patch,
> CODEC-174-reuse-set-in-PhonemeBuilder.patch, CODEC_174_cleanup.patch,
> TestCacheSubSequence.java, test-commons-codec-test-bm.zip
>
>
> I use the Beider Morse encoder with Solr. When Solr indexes a lot of documents
> using this encoder, the import time is multiplied by 30. So I have decided
> to optimize the current implementation in commons-codec.
> So far I have created two patches. The first patch deletes a "performance
> hack", a subsequence cache. This cache does not improve performance, and
> deleting it saves a few milliseconds.
> The second patch changes the in-memory storage of the rules to a Map
> instead of a List. With it, a rule can be looked up directly by the
> beginning of its pattern. This patch halves the encoding time.
> I will try to find more improvements. If you have any ideas, please tell me.
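The Map-based rule storage described in the issue can be sketched roughly as follows. This is an illustration of the general idea only, not the attached patch: RuleIndexSketch, SimpleRule, add and candidates are hypothetical names, and keying on the first character of the pattern is an assumption about what "the beginning of pattern" means.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical index of rules keyed by the first character of each pattern.
final class RuleIndexSketch {

    // Hypothetical minimal rule: a literal pattern and a replacement phoneme.
    static final class SimpleRule {
        final String pattern;
        final String phoneme;

        SimpleRule(final String pattern, final String phoneme) {
            this.pattern = pattern;
            this.phoneme = phoneme;
        }
    }

    private final Map<String, List<SimpleRule>> byFirstChar = new HashMap<>();

    // Files the rule under the first character of its (non-empty) pattern.
    void add(final SimpleRule rule) {
        final String key = rule.pattern.substring(0, 1);
        byFirstChar.computeIfAbsent(key, k -> new ArrayList<>()).add(rule);
    }

    // Returns only the rules whose pattern starts with the character at pos,
    // instead of scanning one List that holds every rule.
    List<SimpleRule> candidates(final CharSequence input, final int pos) {
        final String key = String.valueOf(input.charAt(pos));
        return byFirstChar.getOrDefault(key, Collections.<SimpleRule>emptyList());
    }
}
```

With a List, every rule has to be tested at every input position. With the index, only the rules filed under the current character are tested, which is the kind of saving the second patch is aiming for.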