[GitHub] [arrow] n3world commented on pull request #10662: ARROW-13252: [C++] Add offset to CSV error reporting

GitBox Mon, 13 Sep 2021 19:15:57 -0700


n3world commented on pull request #10662:
URL: https://github.com/apache/arrow/pull/10662#issuecomment-918737367



   > Sorry for the wait for feedback. Have you run the parsing benchmarks in 
`parser_benchmark.cc`? Does capturing the offset have any noticeable effect on 
performance?
   
   I was getting wide variation between runs, but these are the best 
numbers I got for master and this branch.
   master:
   
   ------------------------------------------------------------------------------------------------
   Benchmark                        Time             CPU   Iterations UserCounters...
   ------------------------------------------------------------------------------------------------
   ChunkCSVQuotedBlock         168985 ns       168981 ns         4117 bytes_per_second=959.423M/s
   ChunkCSVEscapedBlock        155557 ns       155555 ns         4460 bytes_per_second=980.924M/s
   ChunkCSVNoNewlinesBlock        147 ns          147 ns      4741333 bytes_per_second=0/s
   ParseCSVQuotedBlock         263473 ns       263469 ns         2646 bytes_per_second=615.346M/s
   ParseCSVEscapedBlock        209135 ns       209132 ns         3362 bytes_per_second=729.624M/s
   ParseCSVFlightsExample     2175256 ns      2175243 ns          320 bytes_per_second=446.642M/s
   ParseCSVVehiclesExample   15967256 ns     15967111 ns           44 bytes_per_second=718.222M/s
   ParseCSVStocksExample      3463566 ns      3463298 ns          203 bytes_per_second=605.926M/s
   
   This branch:
   
   ------------------------------------------------------------------------------------------------
   Benchmark                        Time             CPU   Iterations UserCounters...
   ------------------------------------------------------------------------------------------------
   ChunkCSVQuotedBlock         169009 ns       169006 ns         4093 bytes_per_second=959.283M/s
   ChunkCSVEscapedBlock        156445 ns       156443 ns         4467 bytes_per_second=975.356M/s
   ChunkCSVNoNewlinesBlock        149 ns          149 ns      4749759 bytes_per_second=0/s
   ParseCSVQuotedBlock         369561 ns       369551 ns         1882 bytes_per_second=438.707M/s
   ParseCSVEscapedBlock        367681 ns       367671 ns         1867 bytes_per_second=415.012M/s
   ParseCSVFlightsExample     2538161 ns      2538102 ns          278 bytes_per_second=382.788M/s
   ParseCSVVehiclesExample   16641194 ns     16639585 ns           42 bytes_per_second=689.196M/s
   ParseCSVStocksExample      3119450 ns      3119364 ns          224 bytes_per_second=672.734M/s
   
   No significant difference that I can see
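
   For anyone comparing the two runs line by line, here is a minimal Python
   sketch that computes the relative change in the Time column for each
   benchmark. It is not part of the PR or of Arrow's benchmark tooling; the
   names `master`, `branch` and `pct_change` are made up for illustration, and
   the numbers are copied from the two tables in this comment. Given the
   run-to-run variation mentioned earlier, a single pair of runs is only a
   rough guide.

```python
# Wall-clock times in nanoseconds, copied from the "Time" column of the
# two benchmark tables in this comment (one run each, so expect noise).
master = {
    "ChunkCSVQuotedBlock": 168985,
    "ChunkCSVEscapedBlock": 155557,
    "ChunkCSVNoNewlinesBlock": 147,
    "ParseCSVQuotedBlock": 263473,
    "ParseCSVEscapedBlock": 209135,
    "ParseCSVFlightsExample": 2175256,
    "ParseCSVVehiclesExample": 15967256,
    "ParseCSVStocksExample": 3463566,
}
branch = {
    "ChunkCSVQuotedBlock": 169009,
    "ChunkCSVEscapedBlock": 156445,
    "ChunkCSVNoNewlinesBlock": 149,
    "ParseCSVQuotedBlock": 369561,
    "ParseCSVEscapedBlock": 367681,
    "ParseCSVFlightsExample": 2538161,
    "ParseCSVVehiclesExample": 16641194,
    "ParseCSVStocksExample": 3119450,
}


def pct_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical helper mapping: benchmark name -> percent change in time.
changes = {name: pct_change(master[name], branch[name]) for name in master}

for name, delta in changes.items():
    # A positive value means the branch run took longer than the master run.
    print(f"{name:<26}{delta:+7.1f}%")
```

   Positive values mean the branch run was slower. Repeating each benchmark
   (for example with Google Benchmark's `--benchmark_repetitions` option) and
   comparing means would help separate a real change from noise.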


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
