Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890735910
> I wonder what happens if we make it more like at least 1 million or 1MiB so
> the effect on cache-friendliness is smaller?
We could optimize a growing strategy for the first allocated Vec if memory
usage / overhead of first block is a concern.
> I think we should try to minimize the impact of this on low-cardinality
> cases (e.g. make sure they fit in one array, minimize the overhead of blocks)...
If I understand correctly, do you mean a strategy like this:
- We make the block size large enough
- For the first block, we still perform `resizing` at first
- But once it grows large enough, we switch to the `blocked approach`?
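
To make sure we are talking about the same thing, here is a minimal sketch of how I read that strategy. `HybridBlocks`, `block_size`, and all method names are hypothetical and are not taken from the PR or from DataFusion's API; the real accumulators would store Arrow buffers rather than a plain `Vec<T>`.

```rust
/// Hypothetical container illustrating the hybrid strategy (not DataFusion code).
pub struct HybridBlocks<T> {
    /// Maximum number of values stored in any single block.
    block_size: usize,
    /// Always holds at least one block.
    blocks: Vec<Vec<T>>,
}

impl<T> HybridBlocks<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        // The first block starts empty and grows by ordinary `Vec` resizing,
        // so low-cardinality inputs never pay for a full pre-allocated block.
        Self { block_size, blocks: vec![Vec::new()] }
    }

    pub fn push(&mut self, value: T) {
        let block_size = self.block_size;
        let last = self.blocks.last_mut().expect("at least one block exists");
        if last.len() < block_size {
            // First block: may reallocate while growing.
            // Later blocks: pre-allocated below, so this push never reallocates.
            last.push(value);
            return;
        }
        // The current block is full: switch to (or continue) the blocked
        // approach by allocating a new fixed-capacity block.
        let mut next = Vec::with_capacity(block_size);
        next.push(value);
        self.blocks.push(next);
    }

    pub fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }

    pub fn num_blocks(&self) -> usize {
        self.blocks.len()
    }

    /// Every block except the last is full, so a flat index maps to
    /// (index / block_size, index % block_size).
    pub fn get(&self, index: usize) -> Option<&T> {
        self.blocks
            .get(index / self.block_size)?
            .get(index % self.block_size)
    }
}
```

With a large `block_size`, a low-cardinality input stays entirely in the first block and behaves like the current single-`Vec` path; only inputs that outgrow it pay the per-block indexing cost.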
> Yeah it is quite efficient, although problematic for large inputs
Agreed. It also leads to high memory usage, because we only release memory
after all the batches have been returned (we currently hold one single large
batch and only return slices of it).
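
As a rough illustration of the memory point, the sketch below shows what emitting block by block could look like: each emitted block is moved out to the caller, so its memory can be freed as soon as the caller drops it, instead of staying alive until the last slice is returned. `BlockedOutput`, `emit_next`, and `retained_len` are hypothetical names for this example only, and plain `Vec<T>` stands in for Arrow arrays.

```rust
use std::collections::VecDeque;

/// Hypothetical output buffer that hands out whole blocks (not DataFusion code).
pub struct BlockedOutput<T> {
    blocks: VecDeque<Vec<T>>,
}

impl<T> BlockedOutput<T> {
    /// Split `values` into blocks of at most `block_size` elements.
    pub fn from_values(values: Vec<T>, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        let mut blocks = VecDeque::new();
        let mut iter = values.into_iter().peekable();
        while iter.peek().is_some() {
            blocks.push_back(iter.by_ref().take(block_size).collect());
        }
        Self { blocks }
    }

    /// Move the next block out. Ownership transfers to the caller, so this
    /// buffer no longer keeps that block's memory alive.
    pub fn emit_next(&mut self) -> Option<Vec<T>> {
        self.blocks.pop_front()
    }

    /// Number of values still held by this buffer.
    pub fn retained_len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }
}
```

In contrast, when the output is one large batch and each emitted batch is a slice of it, the underlying allocation has to stay alive until the final slice has been returned.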
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]