[ 
https://issues.apache.org/jira/browse/HUDI-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7392:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix connection leak causing lingering CLOSE_WAIT TCP connections
> ----------------------------------------------------------------
>
>                 Key: HUDI-7392
>                 URL: https://issues.apache.org/jira/browse/HUDI-7392
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: voon
>            Assignee: voon
>            Priority: Major
>              Labels: pull-request-available
>
> When the consistent hashing bucket index is enabled in a long-running Spark
> job (Deltastreamer), we observed a gradual increase in CLOSE_WAIT
> connections from the AM to the HDFS DN.
>  
> Command to check for CLOSE_WAIT connections:
> {code:java}
> netstat -anlpt | grep CLOSE_WAIT | grep 50010{code}
> Result:
> {code:java}
> tcp6       1      0 10.1.2.3:45994      10.5.4.3:50010      CLOSE_WAIT  2446/java
> tcp6       1      0 10.1.2.3:48478      10.6.5.4:50010      CLOSE_WAIT  2446/java
> tcp6       1      0 10.1.2.3:49542      10.7.6.5:50010      CLOSE_WAIT  2446/java
> tcp6       1      0 10.1.2.3:47220      10.8.7.6:50010      CLOSE_WAIT  2446/java
> tcp6       1      0 10.1.2.3:49786      10.9.8.7:50010      CLOSE_WAIT  2446/java {code}
>  
> To reproduce this:
>  
> {code:java}
> CREATE TABLE dev_hudi.close_wait_issue_investigation (
>     id INT,
>     name STRING,
>     date_col STRING,
>     grass_region STRING
> ) USING hudi
> PARTITIONED BY (grass_region)
> tblproperties (
>     primaryKey = 'id',
>     type = 'mor',
>     precombineField = 'id',
>     hoodie.index.type = 'BUCKET',
>     hoodie.index.bucket.engine = 'CONSISTENT_HASHING',     
>     hoodie.compact.inline = 'true'
> )
> LOCATION 'hdfs://DEV/close_wait_issue_investigation';
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (1, 'alex1', 
> '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (2, 'alex2', 
> '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (3, 'alex3', 
> '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (4, 'alex4', 
> '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (5, 'alex5', 
> '2023-12-22', 'SG');{code}
>  
> Observation:
> After every INSERT, one new CLOSE_WAIT connection appears.
>  
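
For anyone reproducing the observation above, a minimal sketch of a helper that counts CLOSE_WAIT connections to a given remote port, so the count can be compared before and after each INSERT. It is not part of the original report. The function name count_close_wait is hypothetical, and the sketch assumes the column layout of the netstat output quoted in the issue (foreign address in column 5, state in column 6).

```shell
# Count CLOSE_WAIT connections to a remote port in netstat output read from stdin.
# Assumes the layout shown in the issue: $5 = foreign address, $6 = state.
# Usage (port defaults to 50010, the DataNode port seen in the report):
#   netstat -anlpt | count_close_wait 50010
count_close_wait() {
    port="${1:-50010}"
    awk -v port="$port" '
        # Match only lines in CLOSE_WAIT whose foreign address ends in ":<port>".
        $6 == "CLOSE_WAIT" && $5 ~ (":" port "$") { n++ }
        # n + 0 forces a numeric 0 when no line matched.
        END { print n + 0 }
    '
}
```

Running the pipeline from the usage comment once before and once after each INSERT should show the count rising by one per statement if the leak is present.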



--
This message was sent by Atlassian Jira
(v8.20.10#820010)