davlee1972 opened a new issue, #38427: URL: https://github.com/apache/arrow/issues/38427
### Describe the bug, including details regarding any error messages, version, and platform.

We have ulimit(s) set on Linux, so I tried using the `max_open_files` parameter in `pyarrow.dataset.write_dataset()`. My Python program just hangs, and it seems like the `write_dataset()` function is not closing files, so it can't open new ones. I can see that the amount of data written at the point where everything hangs appears to be exactly double when `max_open_files` is doubled.

I'm also writing out partitioned data in the following directory structure:
`parted_parquet/end_date_year=2019/end_date_month=9/end_date_day=9/flows.0.parquet`

With `max_open_files = 256`, `du -h -s parted_parquet` reports `1.3G parted_parquet`.

With `max_open_files = 128`, `du -h -s parted_parquet` reports `686M parted_parquet`.

### Component(s)

Python
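For reference, a minimal sketch of the kind of call described above might look like the following. The source table, column names, and the `basename_template` value are assumptions inferred from the partition layout shown in the report, not taken from the reporter's actual code.

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Toy input data; the real dataset and schema are not shown in the issue.
table = pa.table({
    "end_date_year": [2019, 2019],
    "end_date_month": [9, 9],
    "end_date_day": [9, 10],
    "value": [1.0, 2.0],
})

# Hive-style partitioning matching the reported directory structure:
# parted_parquet/end_date_year=2019/end_date_month=9/end_date_day=9/flows.0.parquet
ds.write_dataset(
    table,
    base_dir="parted_parquet",
    format="parquet",
    partitioning=["end_date_year", "end_date_month", "end_date_day"],
    partitioning_flavor="hive",
    basename_template="flows.{i}.parquet",
    max_open_files=128,  # the reported hang appears once this limit is reached
)
```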
