[ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eungsop Yoo updated HDFS-16198:
-------------------------------
    Description: 
In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
this configuration, a SecretManager.InvalidToken exception may be thrown if the 
access token expires during a short circuit read. The exception itself is 
harmless because the failed read is retried, but it leaks ShortCircuitShm.Slot 
objects. 

 

We found this problem in our secure HBase clusters. The number of open file 
descriptors held by the RegionServers kept increasing while short circuit reads 
were in use. 

!screenshot-2.png!

 

It was caused by the leakage of shared memory segments used by short circuit 
reading.
{code:java}
[root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
3925
[root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590
{code}
 

We eventually traced the root cause to the leakage of ShortCircuitShm.Slot 
objects.

 

The fix is trivial: free the slot when an InvalidToken exception is thrown.
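
The leak and the fix can be illustrated with a small self-contained model. This 
is only a sketch under stated assumptions, not the code in the attached patch: 
SlotPool, InvalidTokenException, requestFileDescriptors and readOnce are 
hypothetical stand-ins, not the real HDFS classes or methods. It shows that a 
slot allocated before the token check stays in use after every failed attempt 
unless the error path releases it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the slot leak. None of these types are the real HDFS classes. */
class SlotLeakSketch {
    /** Hypothetical stand-in for SecretManager.InvalidToken. */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a shared memory segment holding a fixed number of slots. */
    static class SlotPool {
        private final Deque<Integer> free = new ArrayDeque<>();
        private final int capacity;

        SlotPool(int capacity) {
            this.capacity = capacity;
            for (int i = 0; i < capacity; i++) free.push(i);
        }

        int allocate() {
            if (free.isEmpty()) throw new IllegalStateException("no free slots");
            return free.pop();
        }

        void release(int slot) { free.push(slot); }

        int inUse() { return capacity - free.size(); }
    }

    /** Simulates a request that is rejected because the access token has expired. */
    static void requestFileDescriptors(int slot, boolean tokenExpired)
            throws InvalidTokenException {
        if (tokenExpired) {
            throw new InvalidTokenException("access token expired for slot " + slot);
        }
    }

    /** One read attempt. Returns true on success, false if the caller should retry. */
    static boolean readOnce(SlotPool pool, boolean tokenExpired, boolean freeOnInvalidToken) {
        int slot = pool.allocate();
        try {
            requestFileDescriptors(slot, tokenExpired);
            pool.release(slot); // simplification: this toy releases the slot once the read is done
            return true;
        } catch (InvalidTokenException e) {
            // The fix described above: give the slot back before the read is retried.
            if (freeOnInvalidToken) pool.release(slot);
            return false;
        }
    }

    /** Runs failedAttempts expired-token reads plus one successful retry; returns slots still held. */
    static int slotsLeaked(int failedAttempts, boolean freeOnInvalidToken) {
        SlotPool pool = new SlotPool(failedAttempts + 1);
        for (int i = 0; i < failedAttempts; i++) {
            readOnce(pool, true, freeOnInvalidToken);
        }
        readOnce(pool, false, freeOnInvalidToken);
        return pool.inUse();
    }

    public static void main(String[] args) {
        System.out.println("leaked without fix: " + slotsLeaked(3, false));
        System.out.println("leaked with fix:    " + slotsLeaked(3, true));
    }
}
```

In this model the retry always succeeds, which matches the observation that the 
exception looks harmless to callers while the slot count keeps growing.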

  was:
In secure mode, 'dfs.block.access.token.enable' should be set 'true'. With this 
configuration SecretManager.InvalidToken exception may be thrown if the access 
token expires when we do short circuit reads. It doesn't matter because the 
failed reads will be retried. But it causes the leakage of ShortCircuitShm.Slot 
objects. We found this problem in our secure HBase clusters.
 !screenshot-2.png! 

The fix is trivial. Just free the slot when InvalidToken exception is thrown.


> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-16198
>                 URL: https://issues.apache.org/jira/browse/HDFS-16198
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eungsop Yoo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-16198.patch, screenshot-2.png
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
