n3world edited a comment on pull request #10662:
URL: https://github.com/apache/arrow/pull/10662#issuecomment-918737367
> Sorry for the wait for feedback. Have you run the parsing benchmarks in `parser_benchmark.cc`? Does capturing the offset have any noticeable effect on performance?

I was getting some wide variations between runs, but these are the best numbers I got for master and this branch.

master:
```
----------------------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------
ChunkCSVQuotedBlock         168985 ns       168981 ns         4117 bytes_per_second=959.423M/s
ChunkCSVEscapedBlock        155557 ns       155555 ns         4460 bytes_per_second=980.924M/s
ChunkCSVNoNewlinesBlock        147 ns          147 ns      4741333 bytes_per_second=0/s
ParseCSVQuotedBlock         263473 ns       263469 ns         2646 bytes_per_second=615.346M/s
ParseCSVEscapedBlock        209135 ns       209132 ns         3362 bytes_per_second=729.624M/s
ParseCSVFlightsExample     2175256 ns      2175243 ns          320 bytes_per_second=446.642M/s
ParseCSVVehiclesExample   15967256 ns     15967111 ns           44 bytes_per_second=718.222M/s
ParseCSVStocksExample      3463566 ns      3463298 ns          203 bytes_per_second=605.926M/s
```

This branch:
```
----------------------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------
ChunkCSVQuotedBlock         169009 ns       169006 ns         4093 bytes_per_second=959.283M/s
ChunkCSVEscapedBlock        156445 ns       156443 ns         4467 bytes_per_second=975.356M/s
ChunkCSVNoNewlinesBlock        149 ns          149 ns      4749759 bytes_per_second=0/s
ParseCSVQuotedBlock         369561 ns       369551 ns         1882 bytes_per_second=438.707M/s
ParseCSVEscapedBlock        367681 ns       367671 ns         1867 bytes_per_second=415.012M/s
ParseCSVFlightsExample     2538161 ns      2538102 ns          278 bytes_per_second=382.788M/s
ParseCSVVehiclesExample   16641194 ns     16639585 ns           42 bytes_per_second=689.196M/s
ParseCSVStocksExample      3119450 ns      3119364 ns          224 bytes_per_second=672.734M/s
```

No significant difference that I can see

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
