[
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDFS-16198:
-----------------------------------
Fix Version/s: 3.3.2
> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---------------------------------------------------------------------------
>
> Key: HDFS-16198
> URL: https://issues.apache.org/jira/browse/HDFS-16198
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eungsop Yoo
> Assignee: Eungsop Yoo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16198.patch, screenshot-2.png
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With
> this configuration, a SecretManager.InvalidToken exception may be thrown if
> the access token expires during a short circuit read. The read itself is not
> affected, because failed reads are retried, but the exception causes
> ShortCircuitShm.Slot objects to leak.
>
> We found this problem in our secure HBase clusters. The number of open file
> descriptors held by the RegionServers kept increasing while short circuit
> reads were in use.
> !screenshot-2.png!
>
> It was caused by the leakage of shared memory segments used by short circuit
> reading.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep \
>     | awk '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep \
>     | awk '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590
> {code}
>
> We eventually traced the root cause to leaked ShortCircuitShm.Slot objects:
> a slot that is never freed keeps its shared memory segment open.
>
> The fix is simple: free the slot when an InvalidToken exception is thrown.
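
The leak and the fix described above can be illustrated with a small, self-contained sketch. It is not the actual HDFS patch: `SlotPool`, `InvalidTokenException`, `requestFileDescriptors`, `leakyRead`, and `fixedRead` are hypothetical stand-ins for the shared memory segment, SecretManager.InvalidToken, and the short circuit read path. The sketch only shows the pattern of releasing an allocated slot when the token check fails before the exception propagates to the retry logic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SlotLeakSketch {
    /** Hypothetical stand-in for SecretManager.InvalidToken. */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a shared memory segment with a fixed number of slots. */
    static class SlotPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private final int capacity;

        SlotPool(int capacity) {
            this.capacity = capacity;
            for (int i = 0; i < capacity; i++) {
                free.push(i);
            }
        }

        int allocate() {
            if (free.isEmpty()) {
                throw new IllegalStateException("no free slots");
            }
            return free.pop();
        }

        void release(int slot) { free.push(slot); }

        int inUse() { return capacity - free.size(); }
    }

    /** Simulated request to the DataNode; fails when the access token has expired. */
    static String requestFileDescriptors(int slot, boolean tokenExpired)
            throws InvalidTokenException {
        if (tokenExpired) {
            throw new InvalidTokenException("access token expired");
        }
        return "replica-for-slot-" + slot;
    }

    /** Leaky variant: the slot stays allocated when the exception propagates. */
    static String leakyRead(SlotPool pool, boolean tokenExpired)
            throws InvalidTokenException {
        int slot = pool.allocate();
        return requestFileDescriptors(slot, tokenExpired);
    }

    /** Fixed variant: free the slot before rethrowing so the caller can retry. */
    static String fixedRead(SlotPool pool, boolean tokenExpired)
            throws InvalidTokenException {
        int slot = pool.allocate();
        try {
            // On success the slot stays in use; in this sketch it belongs to the replica.
            return requestFileDescriptors(slot, tokenExpired);
        } catch (InvalidTokenException e) {
            pool.release(slot);
            throw e;
        }
    }

    public static void main(String[] args) {
        SlotPool leaky = new SlotPool(8);
        SlotPool fixed = new SlotPool(8);
        for (int i = 0; i < 3; i++) {
            try { leakyRead(leaky, true); } catch (InvalidTokenException ignored) { }
            try { fixedRead(fixed, true); } catch (InvalidTokenException ignored) { }
        }
        System.out.println("leaky slots in use: " + leaky.inUse());
        System.out.println("fixed slots in use: " + fixed.inUse());
    }
}
```

In this sketch, three failed reads leave three slots allocated in the leaky pool and none in the fixed one, which mirrors how repeated token expiry could accumulate unreleased slots and the shared memory segments behind them.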