ubyyj commented on issue #3636:
URL: https://github.com/apache/iceberg/issues/3636#issuecomment-1097823775

   I believe it happens because too many files are opened and they are closed too late.
   To work around the issue, you can control the worker thread pool size when starting spark-sql:
   `--conf "spark.executor.extraJavaOptions=-Diceberg.worker.num-threads=1" --conf "spark.driver.extraJavaOptions=-Diceberg.worker.num-threads=1"`
   
   It happens this way, in class ManifestFilterManager:
   1. filterManifests() invokes filterManifest() in parallel via ThreadPools.getWorkerPool().
   2. filterManifest() creates a ManifestReader in a try-with-resources block, then invokes filterManifestWithDeletedFiles().
   3. filterManifestWithDeletedFiles() creates another CloseableIterable via reader.entries(), which opens the manifest .avro file again; those connections are not closed until the try block in step 2 finishes (see the sketch below this list).
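   
   A self-contained toy model of that call pattern. The names mirror ManifestFilterManager, but the bodies are simplified stand-ins, not Iceberg's real implementation:
   
   ```java
   import java.io.Closeable;
   import java.io.IOException;
   import java.util.Iterator;
   import java.util.List;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   
   public class ManifestFilterSketch {
     // Step 1: the worker pool runs filterManifest() in parallel, so up to
     // num-threads manifest files can be held open at the same time.
     // (8 is an arbitrary default for this sketch.)
     static final ExecutorService WORKER_POOL =
         Executors.newFixedThreadPool(Integer.getInteger("iceberg.worker.num-threads", 8));
   
     static void filterManifests(List<String> manifestPaths) {
       for (String path : manifestPaths) {
         WORKER_POOL.submit(() -> filterManifest(path));
       }
     }
   
     // Step 2: the reader (an open handle to the manifest .avro file) lives
     // for the whole try block, including all downstream filtering work.
     static void filterManifest(String path) {
       try (ManifestReader reader = new ManifestReader(path)) {
         filterManifestWithDeletedFiles(reader);
       } catch (IOException e) {
         throw new RuntimeException(e);
       }
     }
   
     // Step 3: reader.entries() opens a second iterable over the same file
     // and never closes it explicitly, so that handle is released only when
     // the try block in step 2 unwinds.
     static void filterManifestWithDeletedFiles(ManifestReader reader) {
       CloseableIterable<String> entries = reader.entries();
       for (String entry : entries) {
         // evaluate delete expressions against each entry ...
       }
     }
   
     interface CloseableIterable<T> extends Iterable<T>, Closeable {}
   
     // Stand-in for Iceberg's ManifestReader: just enough to show lifetimes.
     static class ManifestReader implements Closeable {
       ManifestReader(String path) { /* opens the .avro file */ }
   
       CloseableIterable<String> entries() {
         // in Iceberg this opens another stream over the .avro file
         return new CloseableIterable<String>() {
           @Override public Iterator<String> iterator() { return List.<String>of().iterator(); }
           @Override public void close() { /* releases the second handle */ }
         };
       }
   
       @Override public void close() throws IOException { /* releases the handle */ }
     }
   }
   ```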
   
   We could also mitigate the issue by calling close() on that iterable inside filterManifestWithDeletedFiles().
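   
   Continuing the toy model above (still a sketch, not the actual fix), the mitigation would look roughly like:
   
   ```java
   // Mitigation sketch: release the entries handle as soon as iteration is
   // done, rather than waiting for the try block in step 2 to unwind.
   static void filterManifestWithDeletedFiles(ManifestReader reader) throws IOException {
     try (CloseableIterable<String> entries = reader.entries()) {
       for (String entry : entries) {
         // evaluate delete expressions against each entry ...
       }
     } // the second handle to the .avro file is closed here
   }
   ```
   
   In the real code the entries may still be needed to write out the filtered manifest, so the close can only happen once they are fully consumed.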
   

