impothnis commented on issue #3171:
URL: https://github.com/apache/parquet-java/issues/3171#issuecomment-2720190452

   @wgtmac - During concurrent runs, such as loading 60 million rows across 10
writer jobs, we see significant performance drops and failures. We have tried
various flush sizes, but judging by the created file, the data is not being
flushed at the configured size.
   
   
   We have checked and found no memory pressure or garbage collection
problems. The data does not appear to be held in memory by the client
application, yet it is also not visible in the file while writing is in
progress. We would therefore like to understand where the data is stored
between flushes if it is not being flushed in chunks as expected.
   
   
   The data seems to be written to the file only at the end, which is likely 
causing a performance bottleneck.
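
   To make the expectation concrete, here is a toy Python model of the
size-based flushing we assumed. It is only a sketch: it is not parquet-java
code, and `ToyBufferedWriter`, `flush_threshold_bytes`, and `sink` are
hypothetical names invented for illustration. Rows accumulate in an in-memory
buffer and are pushed to the sink each time the buffered size reaches the
threshold, so the output should grow in chunks during the write rather than
only at close.

```python
class ToyBufferedWriter:
    """Toy model of size-based flushing; NOT parquet-java's implementation."""

    def __init__(self, sink, flush_threshold_bytes):
        self.sink = sink  # a list standing in for the output file
        self.flush_threshold_bytes = flush_threshold_bytes
        self.buffer = []  # rows held in memory between flushes
        self.buffered_bytes = 0
        self.flush_count = 0

    def write(self, row: bytes):
        # Buffer the row, then flush once the threshold is reached.
        self.buffer.append(row)
        self.buffered_bytes += len(row)
        if self.buffered_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        # Emit everything buffered so far as one chunk and reset the buffer.
        if not self.buffer:
            return
        self.sink.append(b"".join(self.buffer))
        self.buffer.clear()
        self.buffered_bytes = 0
        self.flush_count += 1

    def close(self):
        # Whatever is still buffered is written out at the end.
        self.flush()


sink = []
writer = ToyBufferedWriter(sink, flush_threshold_bytes=1024)
for _ in range(100):
    writer.write(b"x" * 100)  # 100 rows of 100 bytes each
chunks_before_close = len(sink)  # chunks visible while still writing
writer.close()
```

   In this model, chunks appear in the sink well before `close()` is called.
What we observe with the real writer looks more like a single flush at the
end.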


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

