pravin1406 opened a new issue, #9620: URL: https://github.com/apache/hudi/issues/9620
**Describe the problem you faced**

We were facing S3 connection leaks when reading a Hudi table (MOR) using Spark. This only happens when using `limit` in Spark. We were able to read the complete table and write it to another location successfully, but a simple `select * from table_rt limit 2` fails with a timeout error. Image below.

<img width="1189" alt="Screenshot 2023-09-06 at 3 42 05 AM" src="https://github.com/apache/hudi/assets/25177655/c662a06e-a071-4667-be11-2ba426acfe30">

The TCP connections to S3 were stuck in the CLOSE_WAIT state, which meant that after reaching the maximum S3 connection limit, further connections could not be established, leading to the timeout.

Tracing the code in HoodieMergeOnReadRDD.scala, we found the condition below to be the root cause of the leak: it returns false because the iterator created is, in every case, not an instance of `Closeable`. I was able to reproduce this in 0.12.2 and 0.14.0-rc1.

```scala
if (iter.isInstanceOf[Closeable]) {
  // register a callback to close logScanner which will be executed on task completion.
  // when tasks finished, this method will be called, and release resources.
  Option(TaskContext.get()).foreach(_.addTaskCompletionListener[Unit](_ => iter.asInstanceOf[Closeable].close()))
}
```

<img width="1440" alt="Screenshot 2023-09-06 at 3 37 15 AM" src="https://github.com/apache/hudi/assets/25177655/91d15411-40f5-48e5-acf6-a71ae51ea6bc">

I'm aware of PR #9477, which tried to fix some leaks, but I'm not sure it fixed this one. I was able to handle the baseFileIterator case, though not in the right way, I believe. Can you please look into it?

**To Reproduce**

Steps to reproduce the behavior:

1. Create any Hudi table on S3.
2. Read it using spark-shell with `fs.s3a.connection.maximum` set to a low value (so we hit the bug fast).
3. Run a `limit` query.
4. Run `netstat -natp | grep s3 {port}` on any of the executors to check for TCP connections stuck in CLOSE_WAIT.

Let me know if any more artifacts are needed as proof.

**Environment Description**

* Hudi version : 0.12.2 and 0.14.0-rc1
* Spark version : 3.2.0
* Hive version : 3.1.2
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
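For illustration, one way the `isInstanceOf[Closeable]` check could be satisfied is to wrap the returned iterator in a class that implements both `Iterator` and `Closeable`, so the task-completion listener actually gets registered and the underlying resource (e.g. an S3 input stream) is released when the task finishes. This is only a minimal sketch of the pattern, not Hudi's actual fix; `CloseableIteratorWrapper` is a hypothetical name.

```scala
import java.io.Closeable

// Hypothetical wrapper: makes any iterator an instance of Closeable so that
// the isInstanceOf[Closeable] check passes and close() can be registered as
// a task-completion callback, closing the wrapped resource exactly once.
class CloseableIteratorWrapper[T](underlying: Iterator[T], resource: Closeable)
    extends Iterator[T] with Closeable {

  private var closed = false

  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = underlying.next()

  override def close(): Unit = {
    if (!closed) {
      closed = true
      resource.close()
    }
  }
}
```

With such a wrapper, the existing registration code in HoodieMergeOnReadRDD.scala would take the `Closeable` branch instead of silently skipping it, and a `limit` query that stops consuming the iterator early would still release its S3 connection at task completion.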
