[ 
https://issues.apache.org/jira/browse/HDFS-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6574:
----------------------------

    Description: 
My colleague [~cuijianwei] found in an HBase testing scenario that once a bad 
disk occurred, local reads were skipped and lots of remote reads were issued 
for a lengthy time, say, tens of minutes, until we had to trigger a compaction 
to help recover the locality and read latency.
It turned out to be related to addToDeadNodes(): imagine one disk on the local 
node goes bad; the current implementation adds the entire local node to the 
dead node list, so all the other good disks on that node can no longer get any 
read requests.
So the better choices here, it seems to me, are:
1) determine whether the detailed IOException really is a connection-related 
exception, and only then call addToDeadNodes();  or
2) determine whether the IOException is related to a bad block/disk, and skip 
addToDeadNodes() in that case; otherwise call addToDeadNodes() (see the sketch 
below).
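
Rough sketch of option 2, just to illustrate the idea (this is not the real 
DFSInputStream code; DatanodeInfo, readFromNode() and addToDeadNodes() below 
are stand-ins, and the set of exceptions treated as connectivity problems is 
only an assumption):
{code:java}
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

/**
 * Sketch only: DatanodeInfo, readFromNode() and addToDeadNodes() are
 * stand-ins for the real DFSInputStream members, and the exception classes
 * treated as "connectivity" below are an assumption, not the exact set
 * HDFS would have to use.
 */
public class DeadNodePolicySketch {

  static class DatanodeInfo {
    final String name;
    DatanodeInfo(String name) { this.name = name; }
  }

  /** Guess whether an IOException means the whole node is unreachable. */
  static boolean isConnectivityIssue(IOException e) {
    return e instanceof ConnectException
        || e instanceof NoRouteToHostException
        || e instanceof SocketTimeoutException;
  }

  /** Placeholder for the real dead-node bookkeeping. */
  static void addToDeadNodes(DatanodeInfo dn) {
    System.out.println("blacklisting node " + dn.name);
  }

  /** Placeholder read that fails as if a local disk were broken. */
  static void readFromNode(DatanodeInfo dn) throws IOException {
    throw new IOException("Input/output error while reading a replica");
  }

  public static void main(String[] args) {
    DatanodeInfo chosenNode = new DatanodeInfo("dn1:50010");
    try {
      readFromNode(chosenNode);
    } catch (IOException e) {
      if (isConnectivityIssue(e)) {
        // Node-level failure: blacklisting the whole node is reasonable.
        addToDeadNodes(chosenNode);
      } else {
        // Block/disk-level failure: skip only this replica and retry
        // elsewhere; the node's other disks can still serve local reads.
        System.out.println("skipping replica on " + chosenNode.name
            + " without blacklisting the node: " + e.getMessage());
      }
    }
  }
}
{code}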

Another thing to consider: once we have got a disk exception from a node, 
should we refresh the locatedBlocks info from the NN to clear all the stale 
cached locations pointing at that node's bad disk? That could be somewhat 
heavy if the file is huge...
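
One way to keep that refresh cheap for a huge file could be to re-fetch only 
the cached entries that still point at the bad node. A rough sketch under that 
assumption (CachedBlock and fetchLocationsFromNN() are hypothetical stand-ins 
for LocatedBlock and a getBlockLocations() call to the NN):
{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: CachedBlock and fetchLocationsFromNN() are hypothetical
 * stand-ins for LocatedBlock and a getBlockLocations() call to the NN.
 * The idea is to refresh just the cached entries that still point at the
 * bad node instead of re-fetching the whole locatedBlocks list.
 */
public class SelectiveRefreshSketch {

  static class CachedBlock {
    final long offset;
    final long length;
    final List<String> nodes;   // datanodes holding this block
    CachedBlock(long offset, long length, List<String> nodes) {
      this.offset = offset;
      this.length = length;
      this.nodes = nodes;
    }
  }

  /** Hypothetical: ask the NN for fresh locations of one byte range. */
  static CachedBlock fetchLocationsFromNN(long offset, long length) {
    return new CachedBlock(offset, length, List.of("dn2:50010", "dn3:50010"));
  }

  /** Re-fetch only the blocks whose cached locations include the bad node. */
  static List<CachedBlock> refreshBlocksOnNode(List<CachedBlock> cached,
                                               String badNode) {
    List<CachedBlock> refreshed = new ArrayList<>();
    for (CachedBlock b : cached) {
      refreshed.add(b.nodes.contains(badNode)
          ? fetchLocationsFromNN(b.offset, b.length)  // stale entry: refresh
          : b);                                       // still good: keep cache
    }
    return refreshed;
  }

  public static void main(String[] args) {
    List<CachedBlock> cached = List.of(
        new CachedBlock(0, 128 << 20, List.of("dn1:50010", "dn2:50010")),
        new CachedBlock(128 << 20, 128 << 20, List.of("dn3:50010")));
    System.out.println(refreshBlocksOnNode(cached, "dn1:50010").size()
        + " blocks after selective refresh");
  }
}
{code}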

We have a plan to make a patch soon for our internal Hadoop branch, since a 
sick disk degrades HBase read performance severely; we'd also like to 
contribute it to the community if you think this is not too crazy...   
[~stack]

  was:
My colleague [~cuijianwei] found in an HBase testing scenario that once a bad 
disk occurred, local reads were skipped and lots of remote reads were issued 
for a lengthy time, say, tens of minutes, until we had to trigger a compaction 
to help recover the locality and read latency.
It turned out to be related to addToDeadNodes(): imagine one disk on the local 
node goes bad; the current implementation adds the local node to the dead node 
list, so all the other good disks on that node can no longer serve any read 
request.
So the better choices here, it seems to me, are:
1) determine whether the detailed IOException really is a connection-related 
exception, and only then call addToDeadNodes().
2) determine whether the IOException is related to a bad block/disk, and skip 
addToDeadNodes() in that case; otherwise call addToDeadNodes().

Another thing to consider: once we have got a disk exception from a node, 
should we refresh the locatedBlocks info from the NN to clear all the stale 
cached locations pointing at that node's bad disk? That could be somewhat 
heavy if the file is huge...

I have a plan to make a patch for our internal Hadoop branch, since this 
degrades HBase read performance severely; I'd also like to contribute it to 
the community if you think this proposal is not too crazy...   [~stack]


> make sure addToDeadNodes() is only called when a connection issue occurs
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-6574
>                 URL: https://issues.apache.org/jira/browse/HDFS-6574
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0, 2.5.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>
> My colleague [~cuijianwei] found in an HBase testing scenario that once a bad 
> disk occurred, local reads were skipped and lots of remote reads were issued 
> for a lengthy time, say, tens of minutes, until we had to trigger a compaction 
> to help recover the locality and read latency.
> It turned out to be related to addToDeadNodes(): imagine one disk on the local 
> node goes bad; the current implementation adds the entire local node to the 
> dead node list, so all the other good disks on that node can no longer get any 
> read requests.
> So the better choices here, it seems to me, are:
> 1) determine whether the detailed IOException really is a connection-related 
> exception, and only then call addToDeadNodes().  or
> 2) determine whether the IOException is related to a bad block/disk, and skip 
> addToDeadNodes() in that case; otherwise call addToDeadNodes().
> Another thing to consider: once we have got a disk exception from a node, 
> should we refresh the locatedBlocks info from the NN to clear all the stale 
> cached locations pointing at that node's bad disk? That could be somewhat 
> heavy if the file is huge...
> We have a plan to make a patch soon for our internal Hadoop branch, since a 
> sick disk degrades HBase read performance severely; we'd also like to 
> contribute it to the community if you think this is not too crazy...   
> [~stack]



--
This message was sent by Atlassian JIRA
(v6.2#6252)