dhegberg commented on PR #13228:
URL: https://github.com/apache/datafusion/pull/13228#issuecomment-2547418265
Updated to move the null-parsing regex into a config option.
The benchmark comparison when using the regex does show some regression (Criterion reports a change of about +3.6%):
Before:
```
Running benches/csv_load.rs (/Users/dhegberg/workplace/datafusion/target/release/deps/csv_load-0ca64cec5e99a8c3)
Gnuplot not found, using plotters backend
Generated test dataset with 69642 rows
Benchmarking load csv testing/default csv read options
Benchmarking load csv testing/default csv read options: Warming up for 3.0000 s
Benchmarking load csv testing/default csv read options: Collecting 100 samples in estimated 20.457 s (1200 iterations)
Benchmarking load csv testing/default csv read options: Analyzing
load csv testing/default csv read options
time: [20.305 ms 20.536 ms 20.763 ms]
mean [20.305 ms 20.763 ms] std. dev. [1.0513 ms 1.2800 ms]
median [20.127 ms 21.042 ms] med. abs. dev. [1.0398 ms 1.6551 ms]
```
After:
```
Gnuplot not found, using plotters backend
Generated test dataset with 69642 rows
Benchmarking load csv testing/default csv read options
Benchmarking load csv testing/default csv read options: Warming up for 3.0000 s
Benchmarking load csv testing/default csv read options: Collecting 100 samples in estimated 21.606 s (1200 iterations)
Benchmarking load csv testing/default csv read options: Analyzing
load csv testing/default csv read options
time: [21.583 ms 21.856 ms 22.166 ms]
change: [+1.9609% +3.6130% +5.3682%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
mean [21.583 ms 22.166 ms] std. dev. [988.76 µs 2.0538 ms]
median [21.438 ms 21.965 ms] med. abs. dev. [776.50 µs 1.3261 ms]
```
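As a rough cross-check, the two `time:` point estimates pasted above can be compared directly. The sketch below is a naive calculation on those two numbers only; the helper name `relative_change_pct` is made up for illustration. Criterion's own `change:` line is computed against whatever baseline it had saved on disk, so it need not match this figure.

```rust
/// Relative change of `after` with respect to `before`, in percent.
/// (Hypothetical helper, not part of DataFusion or Criterion.)
fn relative_change_pct(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Middle value of each `time:` line from the output above, in milliseconds.
    let before_ms = 20.536;
    let after_ms = 21.856;

    let pct = relative_change_pct(before_ms, after_ms);
    println!(
        "absolute: {:.3} ms, relative: {:+.2}%",
        after_ms - before_ms,
        pct
    );
}
```

On these inputs the difference is about 1.3 ms per load of the 69642-row dataset, or roughly 6% on the pasted point estimates.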