mapleFU commented on PR #34323: URL: https://github.com/apache/arrow/pull/34323#issuecomment-1447880461
My guesses:
1. Plain encoder: when strings are short and the batch is small, memory bandwidth cannot be fully utilized. Once memory bandwidth is saturated, the speed is about 3 GB/s. As strings grow larger, the bottleneck may shift to `Resize`.
2. Plain decoder: the speed is nearly independent of the byte size; it scales well.
3. DeltaLengthByteArray encoder: slower when strings are short, but seems fine when strings are long, perhaps thanks to some optimizations.
4. DeltaLengthByteArray decoder: very slow; I'll fix it in another patch using zero-copy.
5. Dict decoder: because it needs to decode the dictionary page, it seems a dictionary lookup plus `memcpy` is still involved. Maybe I can optimize it later.

@wgtmac @rok
