n3world edited a comment on pull request #10662: URL: https://github.com/apache/arrow/pull/10662#issuecomment-942799576
I think I figured out the issue. It was the overhead of checking the buffer bounds on every append. I changed it back to not checking bounds for the fixed-size buffer, and the numbers look much better:

```
----------------------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------
ChunkCSVQuotedBlock         174105 ns       174102 ns         4012 bytes_per_second=931.204M/s
ChunkCSVEscapedBlock        143420 ns       143417 ns         4863 bytes_per_second=1063.95M/s
ChunkCSVNoNewlinesBlock        179 ns          179 ns      4276522 bytes_per_second=0/s
ParseCSVQuotedBlock         286099 ns       286084 ns         2434 bytes_per_second=566.703M/s
ParseCSVEscapedBlock        267409 ns       267403 ns         2612 bytes_per_second=570.629M/s
ParseCSVFlightsExample     2238658 ns      2238612 ns          310 bytes_per_second=433.999M/s
ParseCSVVehiclesExample   16297590 ns     16297154 ns           43 bytes_per_second=703.677M/s
ParseCSVStocksExample      3236563 ns      3236484 ns          216 bytes_per_second=648.39M/s
```

Those numbers are about what I am getting now for master too.
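To illustrate the trade-off described above, here is a minimal sketch of a checked versus an unchecked append on a fixed-capacity buffer. The `FixedBuffer` class, its method names, and `AppendBatch` are hypothetical and made up for this illustration; this is not the code from the PR. It only shows the general idea of validating capacity once per batch rather than on every append.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity buffer (illustration only, not the PR's class).
class FixedBuffer {
 public:
  explicit FixedBuffer(std::size_t capacity) : data_(capacity), size_(0) {}

  // Checked append: pays one bounds-check branch on every call.
  bool Append(std::uint32_t value) {
    if (size_ >= data_.size()) {
      return false;  // buffer full, value not stored
    }
    data_[size_++] = value;
    return true;
  }

  // Unchecked append: the caller must guarantee size() < capacity().
  void UnsafeAppend(std::uint32_t value) {
    assert(size_ < data_.size());  // debug-only check, compiled out with NDEBUG
    data_[size_++] = value;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return data_.size(); }
  std::uint32_t operator[](std::size_t i) const { return data_[i]; }

 private:
  std::vector<std::uint32_t> data_;
  std::size_t size_;
};

// Check the remaining capacity once for the whole batch, then append
// every value without a per-item bounds check.
bool AppendBatch(FixedBuffer* buf, const std::vector<std::uint32_t>& values) {
  if (values.size() > buf->capacity() - buf->size()) {
    return false;  // batch does not fit, nothing is appended
  }
  for (std::uint32_t v : values) {
    buf->UnsafeAppend(v);
  }
  return true;
}
```

In a hot loop that appends one value per parsed field, the per-call branch in `Append` can add up, whereas the `AppendBatch` pattern keeps a single check outside the loop. Whether that matters in practice depends on the workload, so it is worth confirming with benchmarks like the ones above.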
