[ 
https://issues.apache.org/jira/browse/HDDS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-4815:
----------------------------------
    Description: 
HDDS-4320 implemented the unbuffer API. However, I found that it also changed 
BlockInputStream.close(). The close() call is supposed to close the enclosed 
ChunkInputStreams, but that logic was moved to unbuffer(), so close() never 
closes the ChunkInputStreams, causing a socket leak.
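
To illustrate the intended contract, here is a minimal Java sketch, not the actual Ozone code. SketchBlockStream and SketchChunkStream are hypothetical stand-ins for BlockInputStream and ChunkInputStream, and the "connected" flag is an assumed placeholder for a real socket. The point it shows is that close() and unbuffer() should both go through the same release path, so that moving logic into unbuffer() cannot leave close() without it.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-chunk stream that holds one connection.
class SketchChunkStream implements Closeable {
    private boolean connected = true; // placeholder for a real socket

    // Drop the connection; safe to call more than once.
    void release() {
        connected = false;
    }

    boolean isConnected() {
        return connected;
    }

    @Override
    public void close() {
        release();
    }
}

// Hypothetical stand-in for a block-level stream wrapping several chunk streams.
class SketchBlockStream implements Closeable {
    private final List<SketchChunkStream> chunks = new ArrayList<>();
    private boolean closed = false;

    SketchBlockStream(int chunkCount) {
        for (int i = 0; i < chunkCount; i++) {
            chunks.add(new SketchChunkStream());
        }
    }

    // unbuffer(): release held resources; the stream itself is not marked closed.
    void unbuffer() {
        releaseChunks();
    }

    // close(): release the same resources, then mark the stream closed.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        releaseChunks();
    }

    boolean isClosed() {
        return closed;
    }

    // Shared release path used by both unbuffer() and close().
    private void releaseChunks() {
        for (SketchChunkStream chunk : chunks) {
            chunk.release();
        }
    }

    // Number of chunk streams still holding a connection.
    long openConnections() {
        return chunks.stream().filter(SketchChunkStream::isConnected).count();
    }
}
```

In this sketch, calling close() without ever calling unbuffer() still leaves openConnections() at zero, which is the behavior the leak described here violates. Reconnecting after unbuffer() is deliberately left out to keep the example short.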

When I ran a small 100GB SparkSQL TPC-DS workload on a 3-node cluster, the test 
set couldn't complete because the processes failed with a 
"java.net.BindException: Cannot assign requested address" message. Further 
investigation with lsof -p found that the process had created tens of thousands 
of sockets. I was finally able to pinpoint the source of the leak using btrace.

  was:
HDDS-4320 implemented unbuffer API. However, I found it changed 
BlockInputStream.close(). The close() call is supposed to close the enclosed 
ChunkInputStreams, but that logic was moved to unbuffer() and close() never 
close ChunkInputStreams, causing socket leak.

Running 100GB, SparkSQL TPC-DS workloads on a small 3-node cluster, the test 
set couldn't complete because the processes failed with 
"java.net.BindException: Cannot assign requested address" message. Further 
investigation found the process created tens of thousands of sockets (lsof -p). 
I was finally able to pinpoint the source of leak using btrace.


> unbuffer caused connection leak
> -------------------------------
>
>                 Key: HDDS-4815
>                 URL: https://issues.apache.org/jira/browse/HDDS-4815
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client
>    Affects Versions: 1.1.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Blocker
>              Labels: pull-request-available
>
> HDDS-4320 implemented the unbuffer API. However, I found that it also changed 
> BlockInputStream.close(). The close() call is supposed to close the enclosed 
> ChunkInputStreams, but that logic was moved to unbuffer(), so close() never 
> closes the ChunkInputStreams, causing a socket leak.
> When I ran a small 100GB SparkSQL TPC-DS workload on a 3-node cluster, the test 
> set couldn't complete because the processes failed with a 
> "java.net.BindException: Cannot assign requested address" message. Further 
> investigation with lsof -p found that the process had created tens of thousands 
> of sockets. I was finally able to pinpoint the source of the leak using btrace.
