On Thu, 4 Jun 2020 17:48:16 +0200
Rémi Dettai <rdet...@gmail.com> wrote:
> When creating large arrays, Arrow uses realloc quite intensively.
> 
> I have an example where I read a gzipped parquet column (strings) that
> expands from 8MB to 100+MB when loaded into Arrow. Of course, jemalloc
> cannot anticipate this and every reallocate call above 1MB (the most
> critical ones) ends up being a copy.

Ideally, we should be able to presize the array to a good enough
estimate. I don't know if Parquet gives us enough information for that,
though.
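To put rough numbers on why presizing matters, here is a minimal back-of-the-envelope model. It is not Arrow's actual builder code. It assumes a buffer that doubles its capacity each time it fills, and it takes the worst case described above at face value: every reallocation to a capacity above 1MB copies the old contents, and smaller ones are free. The function name `bytes_copied`, the 2x growth factor, the 64 KiB starting capacity and the round 100MB final size are all illustrative assumptions.

```python
# Hypothetical cost model (not Arrow's implementation): a buffer that grows
# geometrically, where every reallocation whose new capacity exceeds
# copy_threshold is assumed to copy the full old contents.

MB = 1024 * 1024


def bytes_copied(final_size, initial_capacity, copy_threshold=1 * MB, growth=2):
    """Total bytes copied while filling a buffer up to final_size bytes.

    Assumptions: the buffer is full each time it grows, so growing moves
    `capacity` bytes; reallocations to <= copy_threshold are treated as free.
    """
    capacity = initial_capacity
    copied = 0
    while capacity < final_size:
        new_capacity = max(int(capacity * growth), 1)  # guard against a zero start
        if new_capacity > copy_threshold:
            copied += capacity  # old contents moved into the new allocation
        capacity = new_capacity
    return copied


if __name__ == "__main__":
    # Grow on demand from an assumed 64 KiB start: 1+2+4+...+64 MB get copied.
    print(bytes_copied(100 * MB, 64 * 1024) // MB)  # 127
    # Presized with an estimate at or above the final size: no copies.
    print(bytes_copied(100 * MB, 100 * MB))         # 0
    # Presized with a slight underestimate: one large copy remains.
    print(bytes_copied(100 * MB, 96 * MB) // MB)    # 96
```

Under these assumptions, growing on demand copies more bytes than the final column holds. An estimate that falls slightly short still pays for one large copy, so in this model erring on the high side is cheaper than erring low.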

Regards

Antoine.

