mapleFU commented on PR #37641:
URL: https://github.com/apache/arrow/pull/37641#issuecomment-1721724928
Compared with Plain and Delta-length decoding, Delta is slower, because
Plain and Delta length do not need to copy any data. Their speed is stable
across different inputs.
```
BM_PlainDecodingByteArray/max-string-length:8/batch-size:512
1279 ns 1279 ns
548306 bytes_per_second=2.97226G/s items_per_second=400.298M/s
BM_PlainDecodingByteArray/max-string-length:64/batch-size:512
1279 ns 1278 ns
543322 bytes_per_second=13.5389G/s items_per_second=400.49M/s
BM_PlainDecodingByteArray/max-string-length:1024/batch-size:512
1278 ns 1278 ns
547482 bytes_per_second=198.153G/s items_per_second=400.56M/s
```
```
BM_DeltaLengthDecodingByteArray/max-string-length:8/batch-size:512
1566 ns 1566 ns
449207 bytes_per_second=2.42814G/s items_per_second=327.017M/s
BM_DeltaLengthDecodingByteArray/max-string-length:64/batch-size:512
1576 ns 1576 ns
450990 bytes_per_second=10.9846G/s items_per_second=324.93M/s
BM_DeltaLengthDecodingByteArray/max-string-length:1024/batch-size:512
1554 ns 1554 ns
447002 bytes_per_second=162.973G/s items_per_second=329.444M/s
```
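To make the copy-versus-no-copy point concrete, here is a minimal Python
sketch. It assumes "Delta" means Parquet's DELTA_BYTE_ARRAY encoding (each
value is a prefix of the previous value plus a suffix) and that Plain uses a
4-byte little-endian length before each value. The function names are
hypothetical and are not Arrow APIs; the real decoders are C++ and also
decode the length streams, which this skips.

```python
# Illustrative sketch only: hypothetical helpers, not Arrow's implementation.

def decode_plain(buf):
    """PLAIN BYTE_ARRAY: 4-byte little-endian length, then the value bytes.

    Values are returned as views into the page buffer, so no bytes are copied.
    """
    view = memoryview(buf)
    out, pos = [], 0
    while pos < len(view):
        n = int.from_bytes(view[pos:pos + 4], "little")
        pos += 4
        out.append(view[pos:pos + n])  # slicing a memoryview does not copy
        pos += n
    return out


def decode_delta_byte_array(prefix_lengths, suffixes):
    """DELTA_BYTE_ARRAY: value i = previous value[:prefix_lengths[i]] + suffix i.

    Every value has to be rebuilt, which means copying prefix and suffix bytes.
    """
    out, prev = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        prev = prev[:n] + suffix  # allocates and fills a new bytes object
        out.append(prev)
    return out


page = b"".join(len(s).to_bytes(4, "little") + s for s in [b"abcd", b"abcf"])
plain_values = decode_plain(page)
delta_values = decode_delta_byte_array([0, 3], [b"abcd", b"f"])
```

Both calls yield the same two strings, but `plain_values` only points into
`page`, while `delta_values` holds freshly built copies.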
Compared with Dict: Dict requires setting the dictionary, and parsing the
dictionary incurs a memcpy, so in the benchmark they're similar. However, in
real use of Dict, a ColumnChunk sets its dictionary only once, so Dictionary
is still faster than DELTA.
Currently DELTA doesn't have a speed advantage. However, if the data shares
prefixes, like `"abcd, abcf, ..."`, DELTA can trade some decoding speed for a
smaller encoded size.
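A rough sketch of why prefix-sharing data helps DELTA on space, again
assuming "DELTA" is Parquet's DELTA_BYTE_ARRAY. The helper names and the
sample values are made up for illustration, and the byte counts ignore the
cost of storing the prefix and suffix lengths, so treat the numbers as an
upper bound on the saving rather than a measurement.

```python
# Illustrative sketch only: hypothetical helpers, not Arrow's encoder.

def common_prefix_len(a, b):
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_delta_byte_array(values):
    """Split each value into (shared prefix length, remaining suffix)."""
    prefix_lengths, suffixes, prev = [], [], b""
    for v in values:
        n = common_prefix_len(prev, v)
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


values = [b"abcd", b"abcf", b"abcfg"]
prefix_lengths, suffixes = encode_delta_byte_array(values)

raw_bytes = sum(len(v) for v in values)       # value bytes stored as-is
suffix_bytes = sum(len(s) for s in suffixes)  # value bytes DELTA must store
```

For these three values the suffixes are `abcd`, `f`, and `g`, so only 6 of
the 13 value bytes need to be stored; data without shared prefixes would see
no such reduction.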