mapleFU commented on PR #37641:
URL: https://github.com/apache/arrow/pull/37641#issuecomment-1721724928

   Compared with Plain and Delta length decoding, Delta is slower, because 
Plain and Delta length decoding do not need to copy any data. Their speed is 
stable across different inputs.
   
   ```
   BM_PlainDecodingByteArray/max-string-length:8/batch-size:512       1279 ns    1279 ns    548306 bytes_per_second=2.97226G/s items_per_second=400.298M/s
   BM_PlainDecodingByteArray/max-string-length:64/batch-size:512      1279 ns    1278 ns    543322 bytes_per_second=13.5389G/s items_per_second=400.49M/s
   BM_PlainDecodingByteArray/max-string-length:1024/batch-size:512    1278 ns    1278 ns    547482 bytes_per_second=198.153G/s items_per_second=400.56M/s
   ```
   
   ```
   BM_DeltaLengthDecodingByteArray/max-string-length:8/batch-size:512       1566 ns    1566 ns    449207 bytes_per_second=2.42814G/s items_per_second=327.017M/s
   BM_DeltaLengthDecodingByteArray/max-string-length:64/batch-size:512      1576 ns    1576 ns    450990 bytes_per_second=10.9846G/s items_per_second=324.93M/s
   BM_DeltaLengthDecodingByteArray/max-string-length:1024/batch-size:512    1554 ns    1554 ns    447002 bytes_per_second=162.973G/s items_per_second=329.444M/s
   ```
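
The no-copy behaviour of Plain decoding can be illustrated with a small sketch. This is not the Arrow C++ decoder; it only assumes the PLAIN `BYTE_ARRAY` layout (a 4-byte little-endian length followed by the value bytes). The helper names `encode_plain` and `decode_plain_views` are hypothetical.

```python
import struct


def encode_plain(values):
    """Hypothetical helper: PLAIN BYTE_ARRAY layout, i.e. a 4-byte
    little-endian length followed by the raw bytes of each value."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def decode_plain_views(buf):
    """Hypothetical helper: return one zero-copy slice of `buf` per value."""
    view = memoryview(buf)
    out, pos = [], 0
    while pos < len(view):
        (n,) = struct.unpack_from("<I", view, pos)
        pos += 4
        # Slicing a memoryview references the original buffer; no bytes move.
        out.append(view[pos:pos + n])
        pos += n
    return out


page = encode_plain([b"abcd", b"abcf", b"xyz"])
views = decode_plain_views(page)
```

Each decoded value is only an (offset, length) pair over the page buffer, so the work per value does not depend on the string length, which would be consistent with the flat timings above.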
   
   Compared with Dict, Dict requires setting a dictionary, and parsing the 
dictionary incurs a memcpy, so in the benchmark they are similar. In real use, 
however, a ColumnChunk sets the dictionary only once, so Dictionary is still 
faster than DELTA.
   
   Currently DELTA has no speed advantage. However, if the data shares 
prefixes, like `"abcd, abcf, ..."`, DELTA can trade some decoding speed for a 
smaller encoded size.
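
A minimal sketch of the prefix idea behind DELTA, using the `abcd`/`abcf` style of data mentioned above. It is a simplification, not the Arrow implementation: the real encoding also bit-packs the prefix lengths and suffix lengths, while this sketch keeps them as plain Python lists. The function names `delta_encode` and `delta_decode` are hypothetical.

```python
def delta_encode(values):
    """Hypothetical helper: split each value into (shared prefix length, suffix)
    relative to the previous value."""
    prefix_lengths, suffixes = [], []
    prev = b""
    for v in values:
        n = 0
        limit = min(len(prev), len(v))
        while n < limit and prev[n] == v[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        prev = v
    return prefix_lengths, suffixes


def delta_decode(prefix_lengths, suffixes):
    """Hypothetical helper: rebuild each value from the previous one."""
    out, prev = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        # Rebuilding the value concatenates prefix and suffix into a new
        # buffer; this is the copy that Plain and Delta length avoid.
        cur = prev[:n] + suffix
        out.append(cur)
        prev = cur
    return out


values = [b"abcd", b"abcf", b"abcfg", b"xyz"]
prefix_lengths, suffixes = delta_encode(values)
decoded = delta_decode(prefix_lengths, suffixes)

raw_bytes = sum(len(v) for v in values)       # 16
stored_bytes = sum(len(s) for s in suffixes)  # 9
```

For this input only 9 suffix bytes are stored instead of 16 value bytes, at the cost of reconstructing every value on decode.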
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]