davlee1972 opened a new issue, #38427:
URL: https://github.com/apache/arrow/issues/38427

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   We have ulimit(s) set on Linux, so I tried using the max_open_files parameter in pyarrow.dataset.write_dataset().
   
   My Python program just hangs, and it seems like write_dataset() is not closing files, so it can't open new ones.
   I can see that the amount written at the point where everything hangs appears to be roughly double when max_open_files is doubled.
   
   I'm also writing out partitioned data in the following directory structure:
   
parted_parquet/end_date_year=2019/end_date_month=9/end_date_day=9/flows.0.parquet
   
   with max_open_files = 256
   du -h -s parted_parquet
   1.3G    parted_parquet
   
   with max_open_files = 128
   du -h -s parted_parquet
   686M    parted_parquet
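   
   For reference, a minimal sketch of the kind of call being described, assuming a Parquet source and hive-style partitioning on the end_date_* columns (the source path, column types, and basename_template are assumptions, not from the report):
   
```python
import pyarrow as pa
import pyarrow.dataset as ds

# Source path and partition column types are assumptions for illustration.
source = ds.dataset("flows_source/", format="parquet")

ds.write_dataset(
    source,
    base_dir="parted_parquet",
    format="parquet",
    partitioning=ds.partitioning(
        pa.schema([
            ("end_date_year", pa.int16()),
            ("end_date_month", pa.int8()),
            ("end_date_day", pa.int8()),
        ]),
        flavor="hive",
    ),
    basename_template="flows.{i}.parquet",
    # Hang reproduces with this set; doubling it roughly doubles the
    # amount written before everything hangs.
    max_open_files=128,
)
```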
   
   
   ### Component(s)
   
   Python

