zhuliquan commented on PR #12270:
URL: https://github.com/apache/datafusion/pull/12270#issuecomment-2543947569

   > I wonder if we can see improvements on queries in benchmarks with scalar 
regexes, e.g. clickbench?
   
   Hello @Dandandan, I have written a benchmark for testing scalar regex matching in 
PR #13789. I got the diff below (before: without pre-compiled pattern; after: with 
pre-compiled pattern):
   ```text
   test email address pattern
                           time:   [14.047 ms 14.155 ms 14.264 ms]
                           change: [+37.888% +41.826% +45.676%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   test ip pattern         time:   [3.5913 ms 3.6025 ms 3.6139 ms]
                           change: [-45.614% -45.332% -45.065%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   test phone number pattern
                           time:   [12.893 ms 13.067 ms 13.303 ms]
                           change: [-51.353% -50.433% -49.409%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
   
   test html tag pattern   time:   [13.158 ms 13.491 ms 13.865 ms]
                           change: [+26.127% +29.636% +33.599%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     4 (4.00%) high mild
     8 (8.00%) high severe
   
   test url pattern        time:   [12.467 ms 12.594 ms 12.726 ms]
                           change: [+38.072% +42.490% +45.651%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
   
   test date pattern       time:   [12.429 ms 12.523 ms 12.629 ms]
                           change: [-37.932% -37.049% -36.163%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   I'm very confused that some cases (ip, phone number, date) have improved while others (email address, html tag, url) have regressed.
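
For readers unfamiliar with the two variants being compared, here is a minimal sketch of the idea in Python (not the DataFusion Rust code, and not the benchmark from PR #13789). The pattern, the sample values, and both function names are hypothetical and chosen only for illustration. `re.purge()` is called in the "before" variant because Python's `re` module caches compiled patterns, which would otherwise hide the per-row compilation cost.

```python
import re

# Hypothetical pattern and data, not the inputs used in PR #13789.
IP_PATTERN = r"^(\d{1,3}\.){3}\d{1,3}$"
VALUES = ["192.168.0.1", "not-an-ip", "10.0.0.255", ""]


def match_compile_per_row(pattern, values):
    """'Before' variant: build the regex again for every value."""
    results = []
    for value in values:
        # Clear Python's internal pattern cache so each row really recompiles.
        re.purge()
        results.append(re.compile(pattern).search(value) is not None)
    return results


def match_precompiled(pattern, values):
    """'After' variant: compile once, then reuse the compiled regex."""
    compiled = re.compile(pattern)
    return [compiled.search(value) is not None for value in values]
```

Both functions should return the same booleans for the same input, so comparing their outputs is a cheap sanity check before comparing their timings.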
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

