mapleFU commented on PR #34323: URL: https://github.com/apache/arrow/pull/34323#issuecomment-1447880461
My guesses:
1. Plain encoder: when strings are short and the batch is small, memory bandwidth cannot be fully utilized. Once memory bandwidth is saturated, the speed is about 3 GB/s. As strings grow larger, the bottleneck may shift to `Resize`.
2. Plain decoder: the speed is nearly independent of the byte size; it scales well.
3. DeltaLengthByteArray encoder: slower when strings are short, but seems fine when strings are long, perhaps thanks to some optimizations.
4. DeltaLengthByteArray decoder: very slow; I'll fix it in another patch using zero-copy.
5. Dict decoder: because it needs to decode the dictionary page, it seems a dictionary lookup plus `memcpy` is still involved. Maybe I can optimize it later.

@wgtmac @rok
