[
https://issues.apache.org/jira/browse/HUDI-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-7392:
---------------------------------
Labels: pull-request-available (was: )
> Fix connection leak causing lingering CLOSE_WAIT TCP connections
> ----------------------------------------------------------------
>
> Key: HUDI-7392
> URL: https://issues.apache.org/jira/browse/HUDI-7392
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: voon
> Assignee: voon
> Priority: Major
> Labels: pull-request-available
>
> When consistent_hashing is enabled and a long-running Spark job
> (Deltastreamer) is started, we see a gradual increase in CLOSE_WAIT
> connections from the AM to the HDFS DN.
>
> Command to check for CLOSE_WAIT connections:
> {code:java}
> netstat -anlpt | grep CLOSE_WAIT | grep 50010{code}
> Result:
> {code:java}
> tcp6 1 0 10.1.2.3:45994 10.5.4.3:50010 CLOSE_WAIT 2446/java
> tcp6 1 0 10.1.2.3:48478 10.6.5.4:50010 CLOSE_WAIT 2446/java
> tcp6 1 0 10.1.2.3:49542 10.7.6.5:50010 CLOSE_WAIT 2446/java
> tcp6 1 0 10.1.2.3:47220 10.8.7.6:50010 CLOSE_WAIT 2446/java
> tcp6 1 0 10.1.2.3:49786 10.9.8.7:50010 CLOSE_WAIT 2446/java {code}
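
To follow the count over time rather than reading raw netstat output, the lines above can be tallied per remote host. The following is only a minimal sketch: `count_close_wait` and `SAMPLE` are hypothetical names, the ESTABLISHED line in the sample is made up for contrast, and the parser assumes the column layout shown in the result above (proto, recv-q, send-q, local address, foreign address, state, pid/program).

```python
from collections import Counter


def count_close_wait(netstat_output, port=50010):
    """Count CLOSE_WAIT connections per remote host for one remote port."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, wrapped fragments, and connections in other states.
        if len(fields) < 6 or fields[5] != "CLOSE_WAIT":
            continue
        # The foreign address is the fifth column, formatted as host:port.
        host, _, remote_port = fields[4].rpartition(":")
        if remote_port == str(port):
            counts[host] += 1
    return counts


# Two lines taken from the result above plus one invented ESTABLISHED line.
SAMPLE = """\
tcp6 1 0 10.1.2.3:45994 10.5.4.3:50010 CLOSE_WAIT 2446/java
tcp6 1 0 10.1.2.3:48478 10.6.5.4:50010 CLOSE_WAIT 2446/java
tcp6 0 0 10.1.2.3:50112 10.6.5.4:50010 ESTABLISHED 2446/java
"""

if __name__ == "__main__":
    totals = count_close_wait(SAMPLE)
    print(sum(totals.values()), dict(totals))
```

Feeding it the output of `netstat -anlpt` before and after each write would show whether the total keeps growing.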
>
> To reproduce this:
>
> {code:java}
> CREATE TABLE dev_hudi.close_wait_issue_investigation (
>   id INT,
>   name STRING,
>   date_col STRING,
>   grass_region STRING
> ) USING hudi
> PARTITIONED BY (grass_region)
> tblproperties (
>   primaryKey = 'id',
>   type = 'mor',
>   precombineField = 'id',
>   hoodie.index.type = 'BUCKET',
>   hoodie.index.bucket.engine = 'CONSISTENT_HASHING',
>   hoodie.compact.inline = 'true'
> )
> LOCATION 'hdfs://DEV/close_wait_issue_investigation';
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (1, 'alex1', '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (2, 'alex2', '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (3, 'alex3', '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (4, 'alex4', '2023-12-22', 'SG');
> INSERT INTO dev_hudi.close_wait_issue_investigation VALUES (5, 'alex5', '2023-12-22', 'SG');{code}
>
> Observation:
> After every INSERT, one new CLOSE_WAIT connection appears.
>
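
For context on why these connections linger: a TCP socket sits in CLOSE_WAIT when the remote peer has closed its end and the local process has not yet closed its own socket. The sketch below reproduces that state on loopback in plain Python. It is an illustration of the TCP behaviour only, not of the Hudi code path involved, which this report does not identify; `half_closed_client` is a hypothetical helper.

```python
import socket


def half_closed_client():
    """Open a loopback TCP connection, let the server side close it, and
    return the still-open client socket together with what it reads."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    client = socket.create_connection(server.getsockname())
    client.settimeout(5)  # avoid blocking forever if something goes wrong
    accepted, _ = server.accept()
    accepted.close()  # the peer sends FIN, like a remote end hanging up
    server.close()
    data = client.recv(1)  # returns b"" once the peer's FIN has arrived
    # Until client.close() runs, the OS reports this socket as CLOSE_WAIT.
    return client, data


if __name__ == "__main__":
    client, data = half_closed_client()
    try:
        print("peer closed:", data == b"",
              "| local socket still open:", client.fileno() != -1)
    finally:
        client.close()  # closing the local socket is what ends CLOSE_WAIT
```

One new CLOSE_WAIT entry per INSERT is consistent with one socket (or a stream wrapping it) being opened per write and never closed, although that is an inference from the TCP state rather than something confirmed here.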