devinjdangelo opened a new pull request, #9605:
URL: https://github.com/apache/arrow-datafusion/pull/9605

   ## Which issue does this PR close?
   
   related to #9493
   
   ## Rationale for this change
   
   Serialization is CPU-intensive and could starve the object store writer 
futures of executor time, resulting in failed or timed-out writes.
   
   I believe that all parts of parallel parquet writing apart from 
`concatenate_parallel_row_groups` (which does the object store put) could be 
moved to a sync/blocking thread. However, I think serialization is the only one 
CPU intensive enough between calls to `.await` to potentially cause issues.
   
   ## What changes are included in this PR?
   
   Moves Parquet column serialization to blocking threads using the 
`spawn_blocking` method.
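
For illustration only, here is a minimal, dependency-free sketch of the general idea: run CPU-bound serialization on separate threads and collect the results through join handles, so the calling thread is not tied up doing the encoding itself. It is not the code in this PR. It uses `std::thread::spawn` as a stand-in for `spawn_blocking`, and the names `serialize_column` and `serialize_columns_off_thread` are hypothetical. In an async runtime such as Tokio, `tokio::task::spawn_blocking` likewise returns a handle, which would be awaited rather than joined.

```rust
use std::thread;

/// Hypothetical stand-in for CPU-heavy column serialization:
/// encodes each value as 8 little-endian bytes.
fn serialize_column(values: Vec<i64>) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Serialize every column on its own thread and gather the results.
/// Assumption: one thread per column is acceptable for a sketch; a real
/// runtime would use a bounded blocking pool instead.
fn serialize_columns_off_thread(columns: Vec<Vec<i64>>) -> Vec<Vec<u8>> {
    // Start all workers first so the columns are encoded concurrently.
    let handles: Vec<thread::JoinHandle<Vec<u8>>> = columns
        .into_iter()
        .map(|col| thread::spawn(move || serialize_column(col)))
        .collect();

    // Join in spawn order, which preserves the original column order.
    // In async code this step would be an `.await` on each handle,
    // leaving the executor free to poll other futures in the meantime.
    handles
        .into_iter()
        .map(|h| h.join().expect("serialization thread panicked"))
        .collect()
}

fn main() {
    let columns = vec![vec![1, 2, 3], vec![10, 20]];
    let encoded = serialize_columns_off_thread(columns);
    println!(
        "{} columns, {} and {} bytes",
        encoded.len(),
        encoded[0].len(),
        encoded[1].len()
    );
}
```

The point of the sketch is the separation of concerns: the encoding work happens away from the thread that drives I/O, which only waits on handles.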
   
   ## Are these changes tested?
   
   Yes, by existing tests
   
   ## Are there any user-facing changes?
   
   No
   


