[ https://issues.apache.org/jira/browse/TEXT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Hadadi closed TEXT-153.
----------------------------

> LookupTranslator performance optimization
> -----------------------------------------
>
>                 Key: TEXT-153
>                 URL: https://issues.apache.org/jira/browse/TEXT-153
>             Project: Commons Text
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Amir Hadadi
>            Priority: Minor
>             Fix For: 1.7
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> While profiling an application that uses
> StringEscapeUtils::escapeEcmaScript with Java Mission Control, I noticed that
> a lot of time is spent in LookupTranslator::translate, at the
> prefixSet::contains check.
> I suggest taking advantage of the fact that prefixSet contains only
> characters and replacing it with a BitSet.
> In my benchmarks, translate is 4-5x faster in the non-escaped case when the
> HashSet is replaced with a BitSet.
> BitSet memory consumption for characters is capped at 8 KB and depends on the
> largest prefix character. For example, for ECMAScript the largest escaped
> prefix character is "\", which has Unicode code point 92, so the BitSet uses
> a long array of length 2 to represent all the needed characters.
>
> Link to pull request: https://github.com/apache/commons-text/pull/108
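
For readers who want to see the idea in isolation, here is a minimal sketch of
a BitSet-based prefix check. It is not the code from the pull request: the
class name, the helper methods, and the sample keys are hypothetical and chosen
only for illustration. It assumes that each lookup key contributes its first
character as a prefix, and it shows why a largest prefix of "\" (code point 92)
needs only two longs.

```java
import java.util.BitSet;

// Illustrative sketch only; names are hypothetical, not the Commons Text API.
class PrefixBitSetSketch {

    // Sets one bit per first character of each lookup key.
    static BitSet buildPrefixSet(String... keys) {
        BitSet prefixSet = new BitSet();
        for (String key : keys) {
            if (!key.isEmpty()) {
                prefixSet.set(key.charAt(0));
            }
        }
        return prefixSet;
    }

    // Cheap pre-check: can any lookup key start with this character?
    // Unlike HashSet<Character>::contains, this needs no boxing or hashing.
    static boolean mayStartEscape(BitSet prefixSet, char c) {
        return prefixSet.get(c);
    }

    public static void main(String[] args) {
        // Sample keys (assumed for illustration); '\' is the largest at 92.
        BitSet prefixSet = buildPrefixSet("'", "\"", "\\", "/");

        System.out.println(mayStartEscape(prefixSet, '\\')); // true
        System.out.println(mayStartEscape(prefixSet, 'a'));  // false

        // Bits 0..127 fit in two 64-bit words.
        System.out.println(prefixSet.toLongArray().length);  // 2
    }
}
```

In the worst case, where the largest prefix is Character.MAX_VALUE, the same
structure needs 1024 longs, which is the 8 KB cap mentioned in the description.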



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
