I looked into the details of why the decoder could not estimate the target Arrow array size for my Parquet column. It's because I am decoding from Parquet-Dictionary to Arrow-Plain (which is the default when loading Parquet). In this case the size prediction is impossible :-(
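To illustrate why the size cannot be predicted, here is a minimal sketch. It assumes a variable-length (e.g. string) column and uses a simplified model of dictionary decoding, not the actual Parquet or Arrow code. The helper `decode_dictionary` and the sample values are hypothetical.

```python
def decode_dictionary(dictionary, indices):
    """Expand dictionary indices into a plain (offsets, data) layout.

    Simplified, hypothetical model of a dictionary -> plain decode for a
    variable-length column; not the real Arrow/Parquet implementation.
    """
    offsets = [0]
    data = bytearray()
    for i in indices:
        data += dictionary[i]      # copy the referenced dictionary entry
        offsets.append(len(data))  # record where the next value starts
    return offsets, bytes(data)


dictionary = [b"a", b"a-much-longer-value"]

# Two index streams with the same number of values (and the same
# encoded width per index), referencing different dictionary entries.
indices_short = [0, 0, 0, 0]
indices_long = [1, 1, 1, 1]

_, plain_short = decode_dictionary(dictionary, indices_short)
_, plain_long = decode_dictionary(dictionary, indices_long)

# The decoded sizes differ although the encoded inputs look alike,
# so the output size is only known after reading every index.
print(len(plain_short), len(plain_long))
```

In this model the plain buffer size depends on which entries the indices reference, so it cannot be derived from the value count or the encoded page size alone.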
> This would actually be the most interesting thing. In general, getting
> access to the pages mapped into RAM would improve a lot more situations,
> not just reallocation. For example, when you take a small slice of a large
> array and only pass this on, but don't keep an explicit reference to the
> array, you will still indirectly hold on to the larger memory size. Having
> an allocator that understands the mapping between pages and memory blocks
> would allow us to free the pages that are not part of the view.

Not sure I'm following you on this one. From my understanding, the subject here is mremap, which lets you keep your physical memory but change the virtual address range that points to it. According to this (https://stackoverflow.com/questions/11621606/faster-way-to-move-memory-page-than-mremap), it is mainly efficient for growing large allocations.

On Fri, Jun 5, 2020 at 16:25, Uwe L. Korn <uw...@xhochy.com> wrote:

>
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> > Hi Antoine !
> > > I would indeed have expected jemalloc to do that (remap the pages)
> > I have no idea about the performance gain this would provide (if any).
> > Could be interesting to explore.
>
> This would actually be the most interesting thing. In general, getting
> access to the pages mapped into RAM would improve a lot more situations,
> not just reallocation. For example, when you take a small slice of a large
> array and only pass this on, but don't keep an explicit reference to the
> array, you will still indirectly hold on to the larger memory size. Having
> an allocator that understands the mapping between pages and memory blocks
> would allow us to free the pages that are not part of the view.
>
> Also, yes: For CSV and JSON, we don't have size estimates beforehand.
> There this would be a great performance improvement.
>
> Best
> Uwe
>