impothnis commented on issue #3171: URL: https://github.com/apache/parquet-java/issues/3171#issuecomment-2720190452
@wgtmac During concurrent runs, for example loading 60 million rows across 10 writer jobs, we've seen significant performance drops and failures. We have tried several flush sizes, but judging by the output file, the data is not flushed according to the configured size. We've checked, and there are no memory or garbage-collection problems. The data doesn't appear to be held in memory in the client application, and it isn't visible in the file while writing is in progress. So we'd like to understand where the data is stored between flushes, if it isn't being flushed in chunks as expected. The data seems to be written to the file only at the end, which is likely causing the performance bottleneck.
