Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2890735910
> I wonder what happens if we make it more like at least 1 million or 1MiB so the effect on cache-friendliness is smaller? We could optimize a growing strategy for the first allocated Vec if memory usage / overhead of first block is a concern.

> I think we should try to minimize the impact of this on low-cardinality cases (e.g. make sure they fit in one array, minimize the overhead of blocks)...

If I understand correctly, does this mean a strategy like the following?
- We make the block size large enough.
- For the first block, we still perform `resizing` at first.
- Once it has grown large enough, we switch to the `blocked approach`.

> Yeah it is quite efficient, although problematic for large inputs

Agreed. It also leads to high memory usage, because we only release the memory after all the batches have been returned (we currently hold the single underlying batch and only return slices of it).
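To make the proposed strategy concrete, here is a minimal sketch of what I have in mind. `BlockedVec` and all of its methods are hypothetical names used only for illustration; this is not the code in this PR. The first block starts empty and grows through ordinary `Vec` resizing, and once it holds `block_size` elements, every further block is pre-allocated at `block_size`.

```rust
/// Hypothetical container illustrating the strategy discussed above.
/// It is an assumption for discussion, not DataFusion's implementation.
pub struct BlockedVec<T> {
    /// All blocks except the last one hold exactly `block_size` elements.
    blocks: Vec<Vec<T>>,
    block_size: usize,
}

impl<T> BlockedVec<T> {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        // The first block starts with no allocation, so low-cardinality
        // inputs only pay for what they use (normal amortized resizing).
        Self {
            blocks: vec![Vec::new()],
            block_size,
        }
    }

    pub fn push(&mut self, value: T) {
        let is_full = self
            .blocks
            .last()
            .map_or(true, |block| block.len() == self.block_size);
        if is_full {
            // Switch to the blocked approach: later blocks are allocated
            // once at full size and never resized or copied.
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        self.blocks
            .last_mut()
            .expect("there is always at least one block")
            .push(value);
    }

    pub fn len(&self) -> usize {
        let last_len = self.blocks.last().map_or(0, |block| block.len());
        (self.blocks.len() - 1) * self.block_size + last_len
    }

    pub fn num_blocks(&self) -> usize {
        self.blocks.len()
    }

    /// Maps a flat index to (block, offset within block).
    pub fn get(&self, index: usize) -> Option<&T> {
        let block = index / self.block_size;
        let offset = index % self.block_size;
        self.blocks.get(block)?.get(offset)
    }

    /// Yields blocks one at a time, so each block's memory can be
    /// released as soon as the consumer drops it.
    pub fn into_blocks(self) -> impl Iterator<Item = Vec<T>> {
        self.blocks.into_iter()
    }
}
```

With this layout a low-cardinality input stays in a single contiguous block, while a large input is emitted block by block, so memory can be freed as each block is consumed instead of only after the last slice is returned.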