pitrou commented on issue #44084:
URL: https://github.com/apache/arrow/issues/44084#issuecomment-2368491413

   I made some initial experiments on this and came to the following conclusions:
   1. The performance is a mixed bag, with some non-negligible speedups on small input sizes (32k rows in the sort benchmarks) but also apparent slowdowns on larger inputs (8M rows). This is probably a combination of
      1) allocation cost, since 16 bytes per input row are allocated for an `int64_t` pair
      2) increased memory footprint and decreased cache efficiency, both because of the enlarged indices and the temporary memory area
   2. Therefore, further exploration should go towards
      1) compressing resolved indices to make them fit in 64 bits (e.g. 20 bits of `chunk_index`, 44 bits of `index_in_chunk`)
      2) transforming the logical indices to physical _in place_ before merging the chunks, and transforming them back to logical in place after merging (a sketch of both ideas follows below)
   
   I might dedicate some time to this.
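
   For illustration, here is a minimal C++ sketch of what points 2.1 and 2.2 could look like. All names (`PackResolved`, `LogicalToPhysicalInPlace`, `chunk_offsets`, ...) are hypothetical, not the actual Arrow API, and the linear chunk lookup stands in for the real binary-search-based chunk resolution:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit packed layout from point 2.1:
// high 20 bits = chunk_index, low 44 bits = index_in_chunk.
constexpr int kIndexInChunkBits = 44;
constexpr uint64_t kIndexInChunkMask = (uint64_t{1} << kIndexInChunkBits) - 1;

inline uint64_t PackResolved(uint64_t chunk_index, uint64_t index_in_chunk) {
  assert(chunk_index < (uint64_t{1} << 20));
  assert(index_in_chunk <= kIndexInChunkMask);
  return (chunk_index << kIndexInChunkBits) | index_in_chunk;
}

inline uint64_t ChunkIndex(uint64_t packed) { return packed >> kIndexInChunkBits; }
inline uint64_t IndexInChunk(uint64_t packed) { return packed & kIndexInChunkMask; }

// Point 2.2: rewrite logical indices as packed physical indices in place,
// avoiding a separate 16-bytes-per-row temporary buffer. `chunk_offsets`
// holds the absolute start offset of each chunk plus a trailing sentinel
// equal to the total length.
void LogicalToPhysicalInPlace(std::vector<uint64_t>& indices,
                              const std::vector<uint64_t>& chunk_offsets) {
  for (auto& idx : indices) {
    // Linear scan for brevity only; a real version would reuse the
    // existing chunk resolution machinery.
    uint64_t chunk = 0;
    while (idx >= chunk_offsets[chunk + 1]) ++chunk;
    idx = PackResolved(chunk, idx - chunk_offsets[chunk]);
  }
}

void PhysicalToLogicalInPlace(std::vector<uint64_t>& indices,
                              const std::vector<uint64_t>& chunk_offsets) {
  for (auto& idx : indices) {
    idx = chunk_offsets[ChunkIndex(idx)] + IndexInChunk(idx);
  }
}
```

   A nice property of this layout is that, since chunks are laid out consecutively, the packed values compare in the same order as the original logical indices, so the round-trip should not disturb the merge while keeping the footprint at 8 bytes per row instead of 16.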

