maartenbreddels commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645546979


   I've added my own UTF-8 encode/decode for now. With lookup tables I now get:
   ```
   Utf8Lower_median    18414820 ns     18408392 ns            3 bytes_per_second=1078.04M/s items_per_second=56.9618M/s
   Utf8Upper_median    17004210 ns     17003407 ns            3 bytes_per_second=1.13976G/s items_per_second=61.6686M/s
   ```
   
   which is faster than the ASCII-only version implemented previously (that got `items_per_second=53M/s`).
   
   Benchmark results vary a lot between runs, in the range `items_per_second=55-66M/s`.
   
   Using utf8proc's encode/decode (inlined), this drops to `18M/s`. I have to look into why that is the case, since they do a bit more sanity checking. Ideally, some of this goes upstream.
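
   To illustrate the lookup-table approach described above, here is a minimal sketch. It is not the code from this PR: the names `DecodeUtf8`, `EncodeUtf8`, `MakeUpperTable` and `Utf8UpperSketch` are hypothetical, the table only covers code points below 0x100 (Latin-1), and the decoder assumes valid UTF-8 and does no sanity checking.

   ```cpp
   #include <array>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <string>

   // Hypothetical helper: decode one code point starting at s[i] and advance i.
   // Assumes the input is valid UTF-8 (no validation at all).
   inline uint32_t DecodeUtf8(const std::string& s, size_t& i) {
     auto next = [&]() { return static_cast<uint32_t>(static_cast<uint8_t>(s[i++])); };
     uint32_t b0 = next();
     if (b0 < 0x80) return b0;                               // 1 byte
     if (b0 < 0xE0) {                                        // 2 bytes
       uint32_t cp = (b0 & 0x1F) << 6;
       return cp | (next() & 0x3F);
     }
     if (b0 < 0xF0) {                                        // 3 bytes
       uint32_t cp = (b0 & 0x0F) << 12;
       cp |= (next() & 0x3F) << 6;
       return cp | (next() & 0x3F);
     }
     uint32_t cp = (b0 & 0x07) << 18;                        // 4 bytes
     cp |= (next() & 0x3F) << 12;
     cp |= (next() & 0x3F) << 6;
     return cp | (next() & 0x3F);
   }

   // Hypothetical helper: append the UTF-8 encoding of cp to out.
   inline void EncodeUtf8(uint32_t cp, std::string& out) {
     if (cp < 0x80) {
       out.push_back(static_cast<char>(cp));
     } else if (cp < 0x800) {
       out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
       out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
     } else if (cp < 0x10000) {
       out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
       out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
       out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
     } else {
       out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
       out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
       out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
       out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
     }
   }

   // Upper-case table for code points < 0x100 only (assumption of this sketch).
   inline std::array<uint32_t, 256> MakeUpperTable() {
     std::array<uint32_t, 256> t{};
     for (uint32_t cp = 0; cp < 256; ++cp) t[cp] = cp;       // identity by default
     for (uint32_t cp = 'a'; cp <= 'z'; ++cp) t[cp] = cp - 0x20;
     for (uint32_t cp = 0xE0; cp <= 0xFE; ++cp) {
       if (cp != 0xF7) t[cp] = cp - 0x20;                    // U+00F7 is the division sign
     }
     t[0xB5] = 0x39C;                                        // micro sign -> Greek capital mu
     t[0xFF] = 0x178;                                        // y with diaeresis
     return t;
   }

   // Decode, map through the table, re-encode. Code points outside the
   // table are passed through unchanged.
   inline std::string Utf8UpperSketch(const std::string& in) {
     static const std::array<uint32_t, 256> table = MakeUpperTable();
     std::string out;
     out.reserve(in.size());
     size_t i = 0;
     while (i < in.size()) {
       uint32_t cp = DecodeUtf8(in, i);
       if (cp < table.size()) cp = table[cp];
       EncodeUtf8(cp, out);
     }
     return out;
   }
   ```

   Note that the output can have a different byte length than the input (for example U+00B5 stays at two bytes, but other mappings in a fuller table can grow or shrink), so a real kernel has to account for that when sizing the output buffer.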
   
   



