rustyconover commented on issue #6868: URL: https://github.com/apache/iceberg/issues/6868#issuecomment-1550601530
I'm also seeing this behavior. I have 10 manifest files, each 8 MB in size. There seems to be a lot of contention for Python's GIL across all of the threads. It may be better to decode the Avro files with a process pool rather than a thread pool. That way there would be no contention on the GIL, and the results can easily be serialized back to the calling function. If I have time, I will build a comparison between ThreadPool-based and ProcessPool-based loading.
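A rough sketch of what such a comparison could look like, using only the standard library. Note that `decode_manifest` is a hypothetical stand-in (a pure-Python, CPU-bound loop), not the actual Avro decoder, and the payload sizes and worker count are assumptions chosen to keep the run short:

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Type


def decode_manifest(payload: bytes) -> int:
    """Hypothetical stand-in for Avro decoding: a pure-Python, CPU-bound loop."""
    checksum = 0
    for byte in payload:
        checksum = (checksum * 31 + byte) % 1_000_003
    return checksum


def decode_all(
    payloads: List[bytes], executor_cls: Type[Executor], workers: int = 4
) -> List[int]:
    """Decode every payload with the given executor type, preserving input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(decode_manifest, payloads))


def compare(payloads: List[bytes]) -> None:
    """Time the same workload under a thread pool and a process pool."""
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        decode_all(payloads, executor_cls)
        elapsed = time.perf_counter() - start
        print(f"{executor_cls.__name__}: {elapsed:.2f}s")


if __name__ == "__main__":
    # Ten payloads to mirror the ten manifest files; far smaller than 8 MB
    # so the demo finishes quickly. The guard is needed for process pools
    # on platforms that start workers by re-importing the main module.
    payloads = [bytes([i]) * 200_000 for i in range(10)]
    compare(payloads)
```

With a process pool, each payload and result is pickled across the process boundary, so any gain from sidestepping the GIL has to outweigh that serialization cost. Timings will vary by machine and Python version, so treat the output as indicative only.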
