timothydijamco commented on issue #45287: URL: https://github.com/apache/arrow/issues/45287#issuecomment-2616422607
Awesome, thanks. Could you point me to how you were able to tell that 250MB was spent on column chunk metadata using the memory pool statistics debugging? With `PrintStats()` I was only getting high-level summary statistics.

I think I may be seeing what you're describing about column chunk metadata in the output I get when running `valgrind --tool=massif` on the C++ repro I posted above and visualizing it with `massif-visualizer`. Here's what the memory usage graph looks like when running the two scans in one script:

<img width="1314" alt="Image" src="https://github.com/user-attachments/assets/5f3e600d-dce6-4524-baad-568c3201300d" />

At the peak (middle of the graph), the top three memory consumers appear to be:

* 342.9 MiB: a "`parquet::schema::node` to 'schema field'" map
  * <img width="1148" alt="Image" src="https://github.com/user-attachments/assets/c95b98d6-9632-4b27-a3d5-3fdf4034dda6" />
* 155.4 MiB: a "name to index" map
  * <img width="1143" alt="Image" src="https://github.com/user-attachments/assets/6fd5c862-fa23-4d9b-9367-91b5f7d1a7c9" />
* 109.9 MiB: a vector of `parquet::format::ColumnChunk`s
  * <img width="1139" alt="Image" src="https://github.com/user-attachments/assets/9980b3ab-fd91-4274-b4b3-b0ffb437fb00" />
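For anyone who wants to reproduce the numbers above, the massif workflow is roughly the following sketch. The binary name `./parquet-repro` is a placeholder for however the C++ repro above was compiled; the commands are guarded so they only run when valgrind and the binary are actually present:

```shell
# Hypothetical binary name; substitute your compiled C++ repro
BIN=${BIN:-./parquet-repro}

if command -v valgrind >/dev/null 2>&1 && [ -x "$BIN" ]; then
  # Heap-profile the run; massif writes its snapshots to the named file
  valgrind --tool=massif --massif-out-file=massif.out "$BIN"

  # Plain-text breakdown of each snapshot (ms_print ships with valgrind);
  # the peak snapshot's allocation call tree is where the per-structure
  # totals in the screenshots come from
  ms_print massif.out

  # For the interactive graphs shown above:
  #   massif-visualizer massif.out
fi
```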
