[ https://issues.apache.org/jira/browse/CODEC-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224610#comment-13224610 ]

Matthew Pocock commented on CODEC-132:
--------------------------------------

Hi,

Limiting the size of the set of intermediate phonemes considered is probably a
good thing for this kind of random-string testing, and it may well have no
discernible negative impact in normal use. The rules are not really intended to
apply to random strings, and words from natural languages (names in particular)
are very much not random.
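To make the trade-off concrete, here is a minimal Java sketch (Java 9+ for `List.of`). It assumes a simplified model in which each input position offers a fixed list of alternative phonemes. It is not the CODEC-132 patch or Commons Codec's actual rule engine, and the names `CappedPhonemeExpansion`, `expand`, `uncappedCount` and the `maxPhonemes` cap are hypothetical. It shows how an uncapped phoneme set grows as the product of per-position choices, while a capped set stays bounded.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not the Commons Codec PhoneticEngine code.
// Each input position is modelled as a list of alternative phonemes, and
// maxPhonemes is an assumed parameter that bounds the working set.
public class CappedPhonemeExpansion {

    // Builds combinations of alternatives left to right, but never lets the
    // working set grow beyond maxPhonemes entries; extra combinations are
    // silently dropped, which is the cost of capping.
    public static Set<String> expand(List<List<String>> alternatives, int maxPhonemes) {
        if (maxPhonemes < 1) {
            throw new IllegalArgumentException("maxPhonemes must be >= 1");
        }
        Set<String> current = new LinkedHashSet<>();
        current.add("");
        for (List<String> options : alternatives) {
            Set<String> next = new LinkedHashSet<>();
            outer:
            for (String prefix : current) {
                for (String option : options) {
                    next.add(prefix + option);
                    if (next.size() >= maxPhonemes) {
                        break outer; // cap reached: skip the remaining combinations
                    }
                }
            }
            current = next;
        }
        return current;
    }

    // Size an uncapped expansion would reach: the product of the option counts.
    // Math.multiplyExact throws ArithmeticException instead of silently overflowing.
    public static long uncappedCount(List<List<String>> alternatives) {
        long count = 1;
        for (List<String> options : alternatives) {
            count = Math.multiplyExact(count, (long) options.size());
        }
        return count;
    }

    public static void main(String[] args) {
        // 30 positions with 3 alternatives each, echoing the length-30 test input.
        List<List<String>> alternatives = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            alternatives.add(List.of("a", "e", "o"));
        }
        System.out.println("uncapped: " + uncappedCount(alternatives)); // 205891132094649
        System.out.println("capped:   " + expand(alternatives, 20).size()); // 20
    }
}
```

Capping trades completeness for bounded memory. Combinations past the cap are simply discarded, so incomplete results on unusual inputs would be the symptom to watch for.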

I've not run a corpus of real names through this code to estimate the normal 
range of this phoneme set size. If we start seeing incomplete or strange 
results after this change, perhaps it would be worth doing.
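A corpus check along those lines could look roughly like the sketch below (Java 16+ for records). It is hypothetical: `PhonemeSetSizeSurvey`, `survey` and `Summary` are illustrative names, and the encoder is passed in as a plain `Function<String, Set<String>>` rather than any real Commons Codec API. It reports the min, median and max set sizes and how many names reached a given cap, which is where truncation could affect results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical survey helper: measures how many phonemes an encoder produces
// per name so the typical range can be compared against a cap.
public class PhonemeSetSizeSurvey {

    // min, median and max set sizes, plus how many names reached the cap.
    public record Summary(int min, int median, int max, int atOrAboveCap) {}

    public static Summary survey(List<String> names,
                                 Function<String, Set<String>> encoder,
                                 int cap) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("corpus must not be empty");
        }
        List<Integer> sizes = new ArrayList<>();
        int atCap = 0;
        for (String name : names) {
            int size = encoder.apply(name).size();
            sizes.add(size);
            if (size >= cap) {
                atCap++; // a capped encoder may have truncated this name's set
            }
        }
        Collections.sort(sizes);
        // For an even-sized corpus this picks the upper of the two middle values.
        int median = sizes.get(sizes.size() / 2);
        return new Summary(sizes.get(0), median, sizes.get(sizes.size() - 1), atCap);
    }

    public static void main(String[] args) {
        // Toy stand-in encoder (assumption): the set of distinct characters in the name.
        Function<String, Set<String>> toy = name -> new HashSet<>(Arrays.asList(name.split("")));
        System.out.println(survey(List.of("anna", "bob", "karl"), toy, 4));
        // prints Summary[min=2, median=2, max=4, atOrAboveCap=1]
    }
}
```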

Matthew


> BeiderMorseEncoder OOM issues
> -----------------------------
>
>                 Key: CODEC-132
>                 URL: https://issues.apache.org/jira/browse/CODEC-132
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Robert Muir
>         Attachments: CODEC-132.patch, CODEC-132_test.patch
>
>
> In Lucene/Solr, we integrated this encoder into the latest release.
> Our tests use a variety of random strings, and we have recent Jenkins failures
> from some input streams (of length <= 10) that use huge amounts of memory
> (e.g. > 64MB), resulting in OOM.
> I've created a test case (length is 30 here) that will OOM with -Xmx256M.
> I haven't dug much into what's causing it, but I suspect there might be a bug
> involving certain punctuation characters: we didn't see this happening until
> we beefed up our random string generation to start producing "html-like"
> strings.

--
This message is automatically generated by JIRA.
