[GitHub] [arrow] jorisvandenbossche commented on issue #34274: [Python] High RAM usage when reading parquet.

via GitHub Tue, 28 Feb 2023 02:27:59 -0800


jorisvandenbossche commented on issue #34274:
URL: https://github.com/apache/arrow/issues/34274#issuecomment-1447933460


   > I used `dataframe.info()`, which said the memory usage is `10G+`
   
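   If you have object dtype columns (for example string columns), this can be a 
large underestimate: by default pandas counts only the pointers stored in an 
object column, not the Python objects they refer to. You can pass 
`dataframe.info(memory_usage="deep")` to get the full memory usage for the 
pandas.DataFrame.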
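
   As a rough illustration of the gap, here is a minimal sketch on made-up 
data (the column names and sizes are hypothetical, not taken from the issue). 
The string column is forced to object dtype so the example behaves the same 
across pandas versions:

```python
import pandas as pd

# Hypothetical data: one object-dtype string column and one integer column.
n = 100_000
df = pd.DataFrame({
    "name": pd.Series([f"row-{i}" for i in range(n)], dtype=object),
    "value": range(n),
})

# Default estimate: the object column is counted as pointers only.
shallow = int(df.memory_usage().sum())

# Deep estimate: also inspects the Python string objects being pointed to.
deep = int(df.memory_usage(deep=True).sum())

print(f"shallow: {shallow / 1e6:.1f} MB, deep: {deep / 1e6:.1f} MB")

# Same deep accounting, reported through info().
df.info(memory_usage="deep")
```

   The deep figure is slower to compute on large frames because every object 
has to be visited, which is presumably why it is not the default.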
   
   That full memory usage can also be larger than the memory usage you see for 
the pyarrow.Table (using `table.nbytes`, as Weston mentioned above), since for 
some data types (such as strings) pandas is less memory-efficient than pyarrow.

