[ https://issues.apache.org/jira/browse/HDFS-12204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145244#comment-16145244 ]
HanRyong,Jung commented on HDFS-12204:
--------------------------------------
The refCount of an HDFS ShortCircuitReplica has an initial value of 2:
one reference is held by ShortCircuitCache and one by the HDFS BlockReaderLocal.
The problem I found here requires changes in both HDFS and HBase.
First, the ShortCircuitCacheCleaner in hdfs-client looks only at the expire
time when deciding whether to purge (delete) a cache entry.
However, a ShortCircuitReplica also has a Slot, and code is needed to purge
(delete) the replica via the Slot.
Second, HBase checks the status of the HDFS client's BlockReaderLocal lazily.
So even if the cache entry is purged by ShortCircuitCacheCleaner, the refCount
on the HDFS client side stays at 1 as long as there is no access to the HFile.
HBase needs to periodically check and close the BlockReaderLocal of the HDFS
client.
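
To make the reference counting described here concrete, the following is a minimal sketch and not the real HDFS code. ReplicaSketch is a hypothetical stand-in for ShortCircuitReplica; the only facts taken from this comment are the initial count of 2 and that one reference belongs to the cache and one to the reader. That the descriptors are released when the count reaches 0 is an assumption of the sketch.

```java
// Hypothetical stand-in for ShortCircuitReplica; not the real HDFS class.
final class ReplicaSketch {
    // One reference for the cache, one for the reader that opened the replica.
    private int refCount = 2;
    // Models whether the underlying file descriptors are still open.
    private boolean fdOpen = true;

    // Drop one reference; the descriptors are released only when none remain.
    synchronized void unref() {
        if (refCount == 0) {
            throw new IllegalStateException("refCount is already 0");
        }
        refCount--;
        if (refCount == 0) {
            fdOpen = false; // real code would close the block and meta fds here
        }
    }

    synchronized int refCount() {
        return refCount;
    }

    synchronized boolean isFdOpen() {
        return fdOpen;
    }
}

final class RefCountSketch {
    public static void main(String[] args) {
        ReplicaSketch replica = new ReplicaSketch();

        replica.unref(); // the cache cleaner purges the replica
        System.out.println(replica.refCount() + " " + replica.isFdOpen()); // prints "1 true"

        replica.unref(); // the reader is finally closed
        System.out.println(replica.refCount() + " " + replica.isFdOpen()); // prints "0 false"
    }
}
```

In this model the descriptor stays open after the purge until the reader releases its own reference, which matches the leaked descriptors seen when the HFile is never accessed again.
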
I have added the following code to ShortCircuitCacheCleaner to work around this
problem.
This solution only works for a specific application (HBase) and is a very
temporary fix: it works only because the HBase client retries when an error
occurs.
{code:java}
public class ShortCircuitCache implements Closeable {
  ...
  private class CacheCleaner implements Runnable, Closeable {
    ...
    public void run() {
      ...
      if (LOG.isDebugEnabled()) {
        LOG.debug(this + ": cache cleaner running at " + curMs);
      }
      purgeStaleReplica();
      int numDemoted = demoteOldEvictableMmaped(curMs);
      ...
    }

    private void purgeStaleReplica() {
      ArrayList<Waitable<ShortCircuitReplicaInfo>> lists =
          Lists.newArrayList(replicaInfoMap.values());
      for (Waitable<ShortCircuitReplicaInfo> i : lists) {
        ShortCircuitReplica replica = i.getVal().getReplica();
        if (replica.isStale()) {
          purge(replica);
          // In fact, BlockReaderLocal should be closed in the client, but in
          // HBase this works because the HBase client retries.
          while (replica.refCount > 0) {
            unref(replica);
          }
        }
      }
    }
    ...
  }
  ...
}
{code}
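
The forced unref above is a workaround on the HDFS side. The other half, periodically closing idle readers on the HBase side, could look roughly like the sketch below. IdleReaderSweeper and all of its methods are hypothetical names for illustration; neither HBase nor HDFS ships such a class, and the idle threshold is an assumed parameter.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks open readers and closes those that have been idle too long.
final class IdleReaderSweeper {
    private static final class Entry {
        final Closeable reader;
        volatile long lastAccessMs;

        Entry(Closeable reader, long nowMs) {
            this.reader = reader;
            this.lastAccessMs = nowMs;
        }
    }

    private final Map<String, Entry> readers = new ConcurrentHashMap<>();
    private final long maxIdleMs;

    IdleReaderSweeper(long maxIdleMs) {
        this.maxIdleMs = maxIdleMs;
    }

    // Start tracking a reader under a caller-chosen key (for example the file name).
    void register(String key, Closeable reader, long nowMs) {
        readers.put(key, new Entry(reader, nowMs));
    }

    // Record an access so that the reader is not treated as idle.
    void touch(String key, long nowMs) {
        Entry e = readers.get(key);
        if (e != null) {
            e.lastAccessMs = nowMs;
        }
    }

    // Close and forget every reader idle for at least maxIdleMs.
    // Returns the number of readers that were closed without an error.
    int sweep(long nowMs) {
        int closed = 0;
        Iterator<Entry> it = readers.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (nowMs - e.lastAccessMs >= maxIdleMs) {
                it.remove();
                try {
                    e.reader.close();
                    closed++;
                } catch (IOException ex) {
                    // The reader is dropped from tracking even if close() fails.
                }
            }
        }
        return closed;
    }

    int size() {
        return readers.size();
    }
}
```

The current time is passed in explicitly so the sweep can be driven from any scheduler, for example a ScheduledExecutorService calling sweep(System.currentTimeMillis()) at a fixed delay.
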
> Dfsclient Do not close file descriptor when using shortcircuit
> --------------------------------------------------------------
>
> Key: HDFS-12204
> URL: https://issues.apache.org/jira/browse/HDFS-12204
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.7.3
> Environment: HDFS 2.7.3, HBASE 1.2.6, centOS 6.8
> Reporter: HanRyong,Jung
>
> I am using HDFS 2.7.3 and HBase 1.2.6 on CentOS 6.8.
> The regionserver uses 11 hard disks (JBOD) and HBase short-circuit reads.
> When one disk failed in HDFS and I hot-swapped it, I found that HBase did not
> close the file descriptors on that disk.
> The fd path on the unmounted disk also changes to an incorrect path:
> in /proc/regionserver_pid/fd, if I was using /data1/volumn and unmounted
> data1, the path changes to /volumn.
> Many file descriptors used by short-circuit reads are also in the deleted state.
> Example:
> ls -al /proc/regionserver_pid/fd
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 946 ->
> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490
> (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 947 ->
> /data8/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir21/blk_1215239490_141511919.meta
> (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 948 ->
> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080
> (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 949 ->
> /data7/volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir111/subdir27/blk_1215241080_141513509.meta
> (deleted)
> lr-x------ 1 dragonboy dragonboy 64 2017-07-26 20:54 *192 ->
> /volumn/hdfs/datanode/current/BP-199986352-10.114.243.73-1490077615453/current/finalized/subdir244/subdir160/blk_1257545757
> (deleted)*
> .
> .
> .
> .
>
> When data4 fails, running fuser gives:
> /sbin/fuser -cu /data4
> Cannot stat file /proc/regionserver_pid/fd/*192*: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1282: input/output error
> Cannot stat file /proc/regionserver_pid/fd/1283: input/output error
> .
> .
> .
> .
> .
>