[ 
https://issues.apache.org/jira/browse/HDFS-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bijaya updated HDFS-8955:
-------------------------
    Release Note: 
If a write to a block is slow, start up another parallel, 'hedged' write 
against a different set of replicas.  This requires obtaining a different set 
of replicas (a new data pipeline) from the NameNode.  We then take the result 
of whichever write returns first (the outstanding write is cancelled).  This 
'hedged' write feature will help rein in the outliers, the odd write that 
takes a long time because it hit a bad patch on the disk, etc.

This feature is off by default.  To enable this feature, set 
<code>dfs.client.hedged.write.threadpool.size</code> to a positive number.  The 
threadpool size is how many threads to dedicate to the running of these 
'hedged', concurrent writes in your client.

Then set <code>dfs.client.hedged.write.threshold.millis</code> to the number of 
milliseconds to wait before starting up a 'hedged' write.  For example, if you 
set this property to 10, then if a write has not returned within 10 
milliseconds, we will start up a new write against a different set of replicas.

This feature emits new metrics:

+ hedgedWriteOps
+ hedgeWriteOpsWin -- how many times the hedged write 'beat' the original write
+ hedgedWriteOpsInCurThread -- how many times we went to do a hedged write but 
had to run it in the current thread because the thread pool sized by 
dfs.client.hedged.write.threadpool.size was already at its maximum.
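
The behaviour described above can be sketched in plain Java.  This is only an 
illustration under stated assumptions, not the DFSClient code: the class 
HedgedWriteSketch, its write() method and the two Callable arguments are 
hypothetical names, the constructor arguments merely stand in for the two 
properties, and obtaining a second pipeline from the NameNode is reduced to 
"the caller passes in a second task".  The three counters reuse the metric 
names listed above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a hedged operation; not the HDFS implementation. */
public class HedgedWriteSketch {
    public final AtomicLong hedgedWriteOps = new AtomicLong();
    public final AtomicLong hedgeWriteOpsWin = new AtomicLong();
    public final AtomicLong hedgedWriteOpsInCurThread = new AtomicLong();

    private final ThreadPoolExecutor pool;
    private final long thresholdMillis;

    /** Arguments stand in for the threadpool.size and threshold.millis properties. */
    public HedgedWriteSketch(int threadPoolSize, long thresholdMillis) {
        if (threadPoolSize <= 0) {
            throw new IllegalArgumentException("pool size must be positive");
        }
        this.thresholdMillis = thresholdMillis;
        // No queueing: a task either gets a thread or is rejected. Rejected
        // tasks run in the caller's thread and are counted.
        this.pool = new ThreadPoolExecutor(1, threadPoolSize, 60, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),
            r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            },
            new ThreadPoolExecutor.CallerRunsPolicy() {
                @Override
                public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
                    hedgedWriteOpsInCurThread.incrementAndGet();
                    super.rejectedExecution(r, e);
                }
            });
    }

    /**
     * Runs primary; if it has not finished within the threshold, also runs
     * hedged and returns whichever finishes first. The other is cancelled.
     * For brevity, a failure of the first finisher is reported as a failure.
     */
    public <T> T write(Callable<T> primary, Callable<T> hedged) {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        Future<T> first = cs.submit(primary);
        Future<T> second = null;
        try {
            Future<T> winner = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
            if (winner == null) {
                // Primary is slow: start the hedged attempt.
                hedgedWriteOps.incrementAndGet();
                second = cs.submit(hedged);
                winner = cs.take();
                if (winner == second) {
                    hedgeWriteOpsWin.incrementAndGet();
                }
            }
            return winner.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("write failed", e.getCause());
        } finally {
            // Cancelling an already completed future is a no-op.
            first.cancel(true);
            if (second != null) {
                second.cancel(true);
            }
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

With a pool size of 2 and a threshold of 10, a primary task that sleeps for 
two seconds loses to a hedged task that returns immediately, so write() 
returns the hedged result and both hedgedWriteOps and hedgeWriteOpsWin go up 
by one.  A real hedged write would also have to deal with partially written 
data on the losing pipeline, which this sketch does not attempt.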

  was:
If a read from a block is slow, start up another parallel, 'hedged' read 
against a different block replica.  We then take the result of whichever read 
returns first (the outstanding read is cancelled).  This 'hedged' read feature 
will help rein in the outliers, the odd read that takes a long time because it 
hit a bad patch on the disk, etc.

This feature is off by default.  To enable this feature, set 
<code>dfs.client.hedged.read.threadpool.size</code> to a positive number.  The 
threadpool size is how many threads to dedicate to the running of these 
'hedged', concurrent reads in your client.

Then set <code>dfs.client.hedged.read.threshold.millis</code> to the number of 
milliseconds to wait before starting up a 'hedged' read.  For example, if you 
set this property to 10, then if a read has not returned within 10 
milliseconds, we will start up a new read against a different block replica.

This feature emits new metrics:

+ hedgedReadOps
+ hedgeReadOpsWin -- how many times the hedged read 'beat' the original read
+ hedgedReadOpsInCurThread -- how many times we went to do a hedged read but we 
had to run it in the current thread because 
dfs.client.hedged.read.threadpool.size was at a maximum.


> Support 'hedged' write in DFSClient
> -----------------------------------
>
>                 Key: HDFS-8955
>                 URL: https://issues.apache.org/jira/browse/HDFS-8955
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: bijaya
>            Assignee: bijaya
>             Fix For: 2.4.0
>
>
> This is a placeholder for the HDFS-related changes backported from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read 
> outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" and 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, 
> we could export the metric values of interest into the client system (e.g. 
> HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the 
> original fetchBlockByteRange or the newly introduced 
> fetchBlockByteRangeSpeculative per the above config items.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
