alamb commented on issue #11042: URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2232995071
> I also found https://github.com/apache/arrow-rs/issues/5828 which might be related and/or relevant.

I would expect that the memory usage highlighted in https://github.com/apache/arrow-rs/issues/5828 would be directly reduced by setting the `data_page_row_limit`.

> After disabling it I see the memory increasing only marginally for every invocation (in the 100-200MB range) while with DICTIONARY_ENABLED true each invocation increases the memory usage in multiple GBs (2-3GB) and it seems it never gets freed again.

I wonder if this could be related to DataFusion overriding the `data_page_row_limit` setting in https://github.com/apache/datafusion/issues/11367 (that @wiedld is working on).

I think you can set this option like

```sql
COPY (
  SELECT col1, timestamp, col10, col12
  FROM my_table
  ORDER BY col1 ASC, timestamp ASC
)
TO './output'
STORED AS PARQUET
PARTITIONED BY (col1)
OPTIONS (
  compression 'uncompressed',
  'format.parquet.data_pagesize_limit' 20000
);
```
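As a rough sketch, the same writer settings can also be tried at the session level before running the `COPY`. This assumes the session-level Parquet options are picked up by the writer that `COPY` uses, which may not hold while the override described in https://github.com/apache/datafusion/issues/11367 is unresolved. The value 20000 is only an illustrative number, not a recommendation.

```sql
-- Assumption: these session settings reach the Parquet writer used by COPY.
-- Cap the number of rows buffered per data page (illustrative value).
SET datafusion.execution.parquet.data_page_row_limit = 20000;

-- Turn off dictionary encoding to compare memory usage with and without it.
SET datafusion.execution.parquet.dictionary_enabled = false;
```

Comparing peak memory with these settings against the defaults should show whether the page limit or the dictionary encoding accounts for the growth.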
