[ https://issues.apache.org/jira/browse/TEXT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amir Hadadi closed TEXT-153.
----------------------------
> LookupTranslator performance optimization
> -----------------------------------------
>
> Key: TEXT-153
> URL: https://issues.apache.org/jira/browse/TEXT-153
> Project: Commons Text
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Amir Hadadi
> Priority: Minor
> Fix For: 1.7
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When using Java Mission Control to profile an application that uses
> StringEscapeUtils::escapeEcmaScript, I noticed that a lot of time is spent in
> LookupTranslator::translate at the prefixSet::contains check.
> I suggest taking advantage of the fact that prefixSet contains only
> characters, and replacing it with a BitSet.
> In my benchmarks, translate is 4-5x faster in the non-escaped case when the
> HashSet is replaced with a BitSet.
> BitSet memory consumption for characters is capped at 8 KB (65,536 bits) and
> depends on the largest prefix character. For example, for ECMAScript the
> largest escaped prefix character is "\", which has Unicode code point 92, so
> the BitSet uses a long array of length 2 to represent all the needed
> characters.
>
> Link to pull request: https://github.com/apache/commons-text/pull/108
>
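For readers skimming the description above, here is a minimal sketch of the idea, not the code from the pull request. The class name PrefixBitSetSketch, the helper buildPrefixSet, and the sample keys are hypothetical; the real LookupTranslator fields and escape tables may differ. It shows a BitSet indexed by the first character of each lookup key, plus the memory arithmetic from the description.

```java
import java.util.BitSet;

// Illustrative sketch only; names are hypothetical, not Commons Text API.
final class PrefixBitSetSketch {

    // Record the first character of every lookup key as a set bit.
    static BitSet buildPrefixSet(String... keys) {
        BitSet prefixSet = new BitSet();
        for (String key : keys) {
            prefixSet.set(key.charAt(0)); // char widens to an int bit index
        }
        return prefixSet;
    }

    public static void main(String[] args) {
        // Assumed sample of escaped characters; the largest is '\' (code 92).
        BitSet prefixSet = buildPrefixSet("\\", "\"", "'", "/");

        // Membership test: a bit lookup, with no boxing and no hashing.
        System.out.println(prefixSet.get('\\')); // true
        System.out.println(prefixSet.get('a'));  // false

        // Bits 0..92 fit in two 64-bit words.
        System.out.println(prefixSet.toLongArray().length); // 2

        // Worst case: the largest char value needs 65,536 bits = 8,192 bytes.
        BitSet full = new BitSet();
        full.set(Character.MAX_VALUE);
        System.out.println(full.toLongArray().length * Long.BYTES); // 8192
    }
}
```

A translate-style loop would check prefixSet.get(input.charAt(index)) before attempting any map lookup, so input with no escapable characters never touches the lookup table.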
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)