pravin1406 opened a new issue, #9620: URL: https://github.com/apache/hudi/issues/9620
**Describe the problem you faced**

We were facing S3 connection leaks when reading a Hudi table (MOR) using Spark. This only happens when using `limit` in Spark. We were able to read the complete table and write it to another location successfully, but a simple `select * from table_rt limit 2` fails with a timeout error. Image below.

<img width="1189" alt="Screenshot 2023-09-06 at 3 42 05 AM" src="https://github.com/apache/hudi/assets/25177655/c662a06e-a071-4667-be11-2ba426acfe30">

The TCP connections to S3 were stuck in the CLOSE_WAIT state, which meant that after reaching the maximum S3 connection limit, further connections could not be established, leading to the timeout.

Tracing the code in HoodieMergeOnReadRDD.scala, we found the condition below to be the root cause of the leak: it returns false because the iterator created is, in every case, not an instance of `Closeable`. I was able to reproduce this in 0.12.2 and 0.14.0-rc1.

```scala
if (iter.isInstanceOf[Closeable]) {
  // register a callback to close logScanner which will be executed on task completion.
  // when tasks finished, this method will be called, and release resources.
  Option(TaskContext.get()).foreach(_.addTaskCompletionListener[Unit](_ => iter.asInstanceOf[Closeable].close()))
}
```

<img width="1440" alt="Screenshot 2023-09-06 at 3 37 15 AM" src="https://github.com/apache/hudi/assets/25177655/91d15411-40f5-48e5-acf6-a71ae51ea6bc">

I'm aware of PR #9477, which tried to fix some leaks, but I'm not sure it fixed this one. I was able to handle the baseFileIterator case, though not in the right way, I believe. Can you please look into it?

**To Reproduce**

Steps to reproduce the behavior:

1. Create any Hudi table on S3.
2. Read it using spark-shell with `fs.s3a.connection.maximum` set to a low value (so we hit the bug fast).
3. Run a `limit` query.
4. Run `netstat -natp | grep s3 {port}` on any of the executors to check for TCP connections stuck in CLOSE_WAIT.

Let me know if any more artifacts are needed as proof.

**Environment Description**

* Hudi version : 0.12.2 and 0.14.0-rc1
* Spark version : 3.2.0
* Hive version : 3.1.2
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
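For illustration, one way the `isInstanceOf[Closeable]` check could be satisfied is to wrap the returned iterator in a class that implements both `Iterator` and `Closeable`, so the task-completion listener actually gets registered and the underlying resource (e.g. an S3 input stream) is released when the task finishes. This is only a minimal sketch of the pattern, not Hudi's actual fix; `CloseableIteratorWrapper` is a hypothetical name.

```scala
import java.io.Closeable

// Hypothetical wrapper: makes any iterator an instance of Closeable so that
// the isInstanceOf[Closeable] check passes and close() can be registered as
// a task-completion callback, closing the wrapped resource exactly once.
class CloseableIteratorWrapper[T](underlying: Iterator[T], resource: Closeable)
    extends Iterator[T] with Closeable {

  private var closed = false

  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = underlying.next()

  override def close(): Unit = {
    if (!closed) {
      closed = true
      resource.close()
    }
  }
}
```

With such a wrapper, the existing registration code in HoodieMergeOnReadRDD.scala would take the `Closeable` branch instead of silently skipping it, and a `limit` query that stops consuming the iterator early would still release its S3 connection at task completion.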
