pitrou commented on PR #41700:
URL: https://github.com/apache/arrow/pull/41700#issuecomment-2535452180

   By building on this (arguably simplified) analysis:
   
   > we're trading the concatenation of the chunked values (essentially 
allocating a new values array) against the resolution of many chunked indices 
(essentially allocating two new indices arrays). This is only beneficial if the 
value width is quite large (say a 256-byte FSB) or the number of indices is 
much smaller than the number of values.
   
   and assuming the following known values:
   * `n_values`: length of the values input
   * `n_indices`: length of the indices input (governing the output length)
   * `value_width`: byte width of the individual values
   
   Then a simple heuristic could be to concatenate iff `n_indices * 16 > 
n_values * value_width` (the 16 approximating the two 8-byte index arrays 
allocated per resolved index). This wouldn't account for the larger 
computational cost associated with chunked indexing, but at least it would 
disable the chunked resolution approach when it clearly doesn't make sense.
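   As a sketch, the heuristic could look like this (the function name and the 
constant are illustrative, not part of the Arrow codebase; the 16 stands in 
for the two ~8-byte index arrays allocated per resolved index):
   
   ```cpp
   #include <cstdint>
   
   // Approximate bytes allocated per index by chunked resolution:
   // two new index arrays of roughly 8 bytes per element each.
   constexpr int64_t kBytesPerResolvedIndex = 16;
   
   // Hypothetical helper: return true when concatenating the chunked
   // values (allocating ~n_values * value_width bytes) is expected to
   // be cheaper than resolving the chunked indices
   // (allocating ~n_indices * 16 bytes).
   bool ShouldConcatenateValues(int64_t n_values, int64_t n_indices,
                                int64_t value_width) {
     return n_indices * kBytesPerResolvedIndex > n_values * value_width;
   }
   ```
   
   For a 256-byte FSB this only concatenates once the indices vastly outnumber 
the values, which matches the analysis above: wide values favor chunked 
resolution.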
   
   (btw, a moderate improvement could probably be achieved by using 
`CompressedChunkLocation`)
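   The idea behind that improvement is to store one packed 64-bit word per 
resolved index instead of two separate entries, roughly halving the 
allocation charged at 16 bytes per index above. A sketch along these lines 
(the type name and bit split here are assumptions for illustration, not the 
actual Arrow `CompressedChunkLocation` layout):
   
   ```cpp
   #include <cstdint>
   
   // Illustrative packed chunk location: chunk index in the high bits,
   // index within the chunk in the low bits of a single uint64_t.
   struct PackedChunkLocation {
     uint64_t packed;
   
     static PackedChunkLocation Make(uint64_t chunk_index,
                                     uint64_t index_in_chunk) {
       // Assumed split: high 24 bits for the chunk index,
       // low 40 bits for the index within the chunk.
       return {chunk_index << 40 |
               (index_in_chunk & ((uint64_t{1} << 40) - 1))};
     }
     uint64_t chunk_index() const { return packed >> 40; }
     uint64_t index_in_chunk() const {
       return packed & ((uint64_t{1} << 40) - 1);
     }
   };
   ```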


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
