[ 
https://issues.apache.org/jira/browse/LANG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361697#comment-14361697
 ] 

Fabian Lange commented on LANG-935:
-----------------------------------

For example:
{code}
    // Escaped once up front so the benchmark measures only unescapeHtml4.
    private static final String ESCAPE_HTML4 = StringEscapeUtils.escapeHtml4(
            "a string with some amounts of html special chars äüö'&>< ");

    @Benchmark
    public String testMethod() {
        return StringEscapeUtils.unescapeHtml4(ESCAPE_HTML4);
    }
{code}

This performs the reverse translation of HTML entities, all of which start
with &, so it matches your edge case. It still results in this (current code
first, then my patch):

{code}
Result: 53780.401 ±(99.9%) 1764.483 ops/s [Average]
  Statistics: (min, avg, max) = (53417.388, 53780.401, 54581.485), stdev = 458.231
  Confidence interval (99.9%): [52015.918, 55544.885]
{code}

{code}
Result: 220500.387 ±(99.9%) 22247.764 ops/s [Average]
  Statistics: (min, avg, max) = (213460.824, 220500.387, 228476.282), stdev = 5777.674
  Confidence interval (99.9%): [198252.624, 242748.151]
{code}

That's roughly a 4x improvement (about 53,780 vs. 220,500 ops/s).
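
As a quick sanity check of that figure, here is a minimal sketch that divides
the two averages and checks whether the 99.9% confidence intervals overlap.
The class and method names (SpeedupCheck, speedup, intervalsOverlap) are made
up for illustration; only the numbers come from the results above.

```java
import java.util.Locale;

public class SpeedupCheck {

    // Ratio of patched throughput to baseline throughput (both in ops/s).
    static double speedup(double baselineOps, double patchedOps) {
        return patchedOps / baselineOps;
    }

    // True if the closed intervals [lo1, hi1] and [lo2, hi2] share any point.
    static boolean intervalsOverlap(double lo1, double hi1, double lo2, double hi2) {
        return lo1 <= hi2 && lo2 <= hi1;
    }

    public static void main(String[] args) {
        double baselineAvg = 53780.401;   // first result block
        double patchedAvg = 220500.387;   // second result block

        // prints "speedup: 4.10x"
        System.out.println(String.format(Locale.ROOT, "speedup: %.2fx",
                speedup(baselineAvg, patchedAvg)));

        // prints "intervals overlap: false"
        System.out.println("intervals overlap: " + intervalsOverlap(
                52015.918, 55544.885, 198252.624, 242748.151));
    }
}
```

The intervals are far apart, so the difference is unlikely to be measurement
noise from this particular run.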

Only a very limited number of single-character replacement edge cases show no
clear winner (in some of them the new code is indeed slightly slower).


So I guess it's time to make a decision, right?

Imagine it were the other way around: my code is master and the current code
is the patch. Would you be willing to massively slow down most of the
real-world use cases for the translators for the sake of a few
single-character edge cases?

> Possible performance improvement on string escape functions
> -----------------------------------------------------------
>
>                 Key: LANG-935
>                 URL: https://issues.apache.org/jira/browse/LANG-935
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.text.translate.*
>    Affects Versions: 3.1
>            Reporter: Peter Wall
>            Priority: Minor
>              Labels: performance
>             Fix For: Patch Needed
>
>         Attachments: tempproject1.zip
>
>
> The escape functions for HTML etc. use the same code and the same 
> initialisation tables for the escape and unescape functions, and while this 
> is an elegant approach it leads to a number of deficiencies:
> 1. The code is very much less efficient than it could be
> 2. A new output string is created even when no conversion is required
> 3. No mapping is provided for characters that do not have a specific 
> representation (for example, in HTML, 0x101 should become &#257;)
> The proposal is to use a new mapping technique to address these issues
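
Points 2 and 3 of the description above can be illustrated with a minimal
sketch. This is neither the Commons Lang implementation nor the mapping
technique from the attachment. EscapeSketch, escapeHtml and replacementFor
are hypothetical names, and turning every non-ASCII code point into a numeric
reference is a simplifying assumption (the real escapeHtml4 uses named
entities where they exist).

```java
public class EscapeSketch {

    // Returns the input instance itself when nothing needs escaping (point 2),
    // and falls back to a numeric character reference for characters without
    // a dedicated mapping (point 3).
    static String escapeHtml(String input) {
        StringBuilder out = null; // allocated lazily, on the first replacement
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            String replacement = replacementFor(cp);
            if (replacement != null) {
                if (out == null) {
                    out = new StringBuilder(input.length() + 16);
                    out.append(input, 0, i); // copy the untouched prefix once
                }
                out.append(replacement);
            } else if (out != null) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out == null ? input : out.toString();
    }

    // Hypothetical mapping: four named entities, numeric references otherwise.
    static String replacementFor(int cp) {
        switch (cp) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default:
                // Assumption: all non-ASCII code points become &#N; (decimal).
                return cp > 0x7F ? "&#" + cp + ";" : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("a<b"));      // prints "a&lt;b"
        System.out.println(escapeHtml("\u0101"));   // prints "&#257;"
        String plain = "plain";
        System.out.println(escapeHtml(plain) == plain); // prints "true"
    }
}
```

Whether skipping the copy pays off depends on how often inputs contain no
escapable characters at all, which is something a benchmark has to show.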



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
