sayre1000 opened a new issue, #40798:
URL: https://github.com/apache/arrow/issues/40798

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I am trying to use the `red-parquet` and `red-arrow` gems for an archival tool that batches data by day. I am using Ruby on Rails.
   
   For some of our tables, the data for an entire day is excessively large in memory (~20 GB) and could grow further. Given that, I'd prefer to batch the archiving in chunks of 100,000 records.
   
   However, we still want one unified Parquet file for any given day. I had thought there would be a way to keep a writer open on a single Parquet file and stream batches to it, but I have been unable to figure out how to do so.
   
   Is this something that can be done? Or will I instead have to write separate files, merge them afterwards with an external Python script, and then delete the originals?
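   
   One approach that may fit (a sketch, not a confirmed solution): `red-parquet` exposes a `Parquet::ArrowFileWriter` binding, and assuming its `open`/`write_table` interface, each batch can be appended to a single open file. The schema, column names, file name, and the small in-memory batches below are all illustrative stand-ins for the real 100,000-record queries:
   
   ```ruby
   require "arrow"
   require "parquet"
   
   # Illustrative schema; in practice this would mirror the archived table.
   schema = Arrow::Schema.new(id: :int64, name: :string)
   
   # Hypothetical batches standing in for successive 100,000-record queries.
   batches = [
     {ids: [1, 2, 3], names: ["a", "b", "c"]},
     {ids: [4, 5, 6], names: ["d", "e", "f"]},
   ]
   
   # Keep one writer open for the whole day and stream each batch into it.
   Parquet::ArrowFileWriter.open(schema, "day.parquet") do |writer|
     batches.each do |batch|
       table = Arrow::Table.new(schema,
                                [
                                  Arrow::Int64Array.new(batch[:ids]),
                                  Arrow::StringArray.new(batch[:names]),
                                ])
       # The second argument is the row-group chunk size.
       writer.write_table(table, 1024)
     end
   end
   
   # Read the unified file back to confirm all batches landed in it.
   reloaded = Arrow::Table.load("day.parquet")
   puts reloaded.n_rows
   ```
   
   If this interface works as assumed, only one Parquet file per day is ever produced, so no external merge step would be needed.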
   
   ### Component(s)
   
   Parquet, Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
