[ https://issues.apache.org/jira/browse/HBASE-28666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-28666:
------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.6.1
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Pushed to branch-2.6+.

Thanks [~charlesconnell] for contributing!

> Dropping unclosed WALTailingReaders leads to leaked sockets
> -----------------------------------------------------------
>
>                 Key: HBASE-28666
>                 URL: https://issues.apache.org/jira/browse/HBASE-28666
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication, wal
>    Affects Versions: 2.6.0
>            Reporter: Charles Connell
>            Assignee: Charles Connell
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
>
> {{WALEntryStream#prepareReader()}} will, in some cases, reach
> [the line|https://github.com/apache/hbase/blob/ba15d67a350adb11ae1d4c44d214216406ae0b5a/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L258]
> {code}
> reader = WALFactory.createTailingReader(fs, nextPath, conf,
>     currentPositionOfEntry > 0 ? currentPositionOfEntry : -1);
> {code}
> when {{reader}} is non-null. In this case, the object previously referenced by
> {{reader}} becomes unreachable and is eventually garbage-collected. However, that
> object was never closed, so the socket it holds is never released.
> At HubSpot we see the effects of this when running tests that use inter-cluster
> replication. Machines in the source cluster accumulate open sockets.
> Eventually this causes the machine to run out of TCP kernel memory and start
> dropping packets. The only workaround currently is to restart the
> RegionServer process.
> I have found that simply putting
> {code}
> closeReader();
> {code}
> immediately before the line quoted above appears to resolve the issue and 
> causes no obvious problems. However, I'm still developing a proper test for 
> this fix.
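The following is a minimal, self-contained sketch of the leak pattern described in the report and of the proposed fix (closing the old reader before reassigning the field). All names are hypothetical and exist only for illustration: FakeReader, OPEN_SOCKETS, prepareReaderLeaky, prepareReaderFixed and leakedAfter are not HBase classes or methods. The real WALEntryStream and WALFactory code is more involved. Each simulated reader "holds a socket," modeled as a counter that goes up on open and down on close().

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: FakeReader stands in for a WAL tailing reader
// that holds one socket from construction until close() is called.
public class ReaderReopenSketch {

    // Number of simulated sockets opened but not yet closed.
    static final AtomicInteger OPEN_SOCKETS = new AtomicInteger();

    static final class FakeReader implements Closeable {
        private boolean closed = false;

        FakeReader() {
            OPEN_SOCKETS.incrementAndGet(); // "acquire" a socket on open
        }

        @Override
        public void close() {
            if (!closed) { // idempotent close, like most Closeable implementations
                closed = true;
                OPEN_SOCKETS.decrementAndGet();
            }
        }
    }

    private FakeReader reader;

    // Leaky pattern: overwrite the field while it may still be non-null.
    // The old reader becomes unreachable, but GC never calls close().
    void prepareReaderLeaky() {
        reader = new FakeReader();
    }

    // Fixed pattern: close the previous reader first, as the report proposes.
    void prepareReaderFixed() {
        closeReader();
        reader = new FakeReader();
    }

    void closeReader() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }

    // Reopens the reader `reopens` times, shuts down, and returns how many
    // simulated sockets are still open afterwards.
    static int leakedAfter(int reopens, boolean closeFirst) {
        OPEN_SOCKETS.set(0);
        ReaderReopenSketch stream = new ReaderReopenSketch();
        for (int i = 0; i < reopens; i++) {
            if (closeFirst) {
                stream.prepareReaderFixed();
            } else {
                stream.prepareReaderLeaky();
            }
        }
        stream.closeReader(); // shutdown can only close the reader still referenced
        return OPEN_SOCKETS.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky: open sockets = " + leakedAfter(5, false));
        System.out.println("fixed: open sockets = " + leakedAfter(5, true));
    }
}
```

With five reopens, the leaky variant leaves four simulated sockets open after shutdown. Only the last reader is still referenced, so only that one can be closed. The fixed variant leaves none. Counting still-open resources after repeated reopens is one possible shape for a regression test. A real HBase test would need to observe actual readers or sockets rather than this toy counter.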



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
