seanfabs opened a new pull request, #239:
URL: https://github.com/apache/commons-email/pull/239

   This PR makes several fixes and improvements to the regexes:
   1. The regex could backtrack excessively in the `\\s*[^>]*?\\s+` part. 
The `\\s*`, `[^>]*?`, and `\\s+` parts can all match whitespace, so when a long 
run of whitespace is encountered the engine tries every way of splitting that 
run between them while looking for a match. This is a well-documented 
phenomenon: https://blog.codinghorror.com/regex-performance/
   2. The `<script>` tag matching did not match multiple scripts.
   3. Fix the edge case where a tag whose name merely starts with `img` or 
`script`, e.g. `<imgx ...>`, could match.
   4. Use a greedy quantifier for the src URL, `[^\"']+`, which gives a slight 
performance boost. We always want to capture the whole URL anyway.
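
   The edge case in item 3 can be checked with a small sketch like the one 
below. The two patterns are copied from the benchmark section of this 
description; the class name `ImgSrcRegexDemo`, the helper `firstSrc`, and the 
sample HTML strings are hypothetical and exist only for illustration, they are 
not part of commons-email.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of commons-email.
final class ImgSrcRegexDemo {

    // Old pattern, as quoted in the benchmark section.
    static final Pattern OLD = Pattern.compile(
        "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+?)([\"'])");

    // New pattern: the lookahead (?=\s) requires whitespace right after "img".
    static final Pattern NEW = Pattern.compile(
        "(<[Ii][Mm][Gg](?=\\s)[^>]*?\\s[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+)([\"'])");

    /** Returns the first captured src value (group 2), or null if nothing matches. */
    static String firstSrc(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String normal = "<img alt=\"logo\" src=\"logo.png\">";
        String lookalike = "<imgx src=\"a.png\">";

        System.out.println(firstSrc(OLD, normal));     // logo.png
        System.out.println(firstSrc(NEW, normal));     // logo.png
        System.out.println(firstSrc(OLD, lookalike));  // a.png (the unwanted match)
        System.out.println(firstSrc(NEW, lookalike));  // null
    }
}
```

   Both patterns agree on an ordinary `<img>` tag; only the old one accepts 
the `<imgx ...>` lookalike.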
   
   # Benchmark results
   
   VM version: JDK 17.0.10, OpenJDK 64-Bit Server VM, 17.0.10+8-b1207.12
     
   Result for old regex 
`(<[Ii][Mm][Gg]\s*[^>]*?\s+[Ss][Rr][Cc]\s*=\s*["'])([^"']+?)(["'])`
   
     32.339 ±(99.9%) 4.394 ops/s [Average]
     (min, avg, max) = (20.590, 32.339, 40.601), stdev = 5.866
     CI (99.9%): [27.945, 36.733] (assumes normal distribution)
   
   Result for candidate new regex (non-greedy URL) 
`(<[Ii][Mm][Gg](?=\s)[^>]*?\s[Ss][Rr][Cc]\s*=\s*["'])([^"']+?)(["'])`
   
     1244.602 ±(99.9%) 225.819 ops/s [Average]
     (min, avg, max) = (518.410, 1244.602, 1512.089), stdev = 301.462
     CI (99.9%): [1018.783, 1470.420] (assumes normal distribution)
   
   Result for new regex 
`(<[Ii][Mm][Gg](?=\s)[^>]*?\s[Ss][Rr][Cc]\s*=\s*["'])([^"']+)(["'])`
   
     1480.002 ±(99.9%) 160.408 ops/s [Average]
     (min, avg, max) = (986.366, 1480.002, 1651.095), stdev = 214.140
     CI (99.9%): [1319.594, 1640.409] (assumes normal distribution)
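
   For anyone who wants a quick feel for the difference without a benchmark 
harness, here is a rough timing sketch. It is not the benchmark that produced 
the numbers above: the input (an `<img` followed by a run of spaces and no 
`src`) is an assumed worst case, the class and method names are hypothetical, 
and a single `System.nanoTime()` measurement is far cruder than a proper 
benchmark run.

```java
import java.util.regex.Pattern;

// Hypothetical timing sketch, not the benchmark used for the results above.
final class ImgRegexTimingSketch {

    // Old and new patterns, as quoted in the benchmark section.
    static final Pattern OLD = Pattern.compile(
        "(<[Ii][Mm][Gg]\\s*[^>]*?\\s+[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+?)([\"'])");
    static final Pattern NEW = Pattern.compile(
        "(<[Ii][Mm][Gg](?=\\s)[^>]*?\\s[Ss][Rr][Cc]\\s*=\\s*[\"'])([^\"']+)([\"'])");

    /** Builds "<img" followed by the given number of spaces and no src attribute. */
    static String whitespaceHeavyTag(int spaces) {
        StringBuilder sb = new StringBuilder("<img");
        for (int i = 0; i < spaces; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Runs find() once and returns the elapsed time in nanoseconds. */
    static long timeFindNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).find();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // 300 spaces keeps the old pattern's backtracking noticeable but short.
        String input = whitespaceHeavyTag(300);
        System.out.println("old: " + timeFindNanos(OLD, input) + " ns");
        System.out.println("new: " + timeFindNanos(NEW, input) + " ns");
    }
}
```

   Neither pattern matches this input, so the time measured is purely the 
cost of the failed search, which is where the overlapping whitespace 
quantifiers in the old pattern hurt most.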


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

