[jira] [Commented] (HDFS-14211) [Consistent Observer Reads] Allow for configurable "always msync" mode

Erik Krogen (JIRA) Wed, 13 Mar 2019 09:44:28 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791865#comment-16791865
 ]


Erik Krogen commented on HDFS-14211:
------------------------------------

I agree that there is potential issue with that [~csun], but I think allowing 
this to be configurable lets different clients make the decision of where on 
the performance vs. consistency spectrum they want to sit.

Thanks for the reviews, [~shv], I have uploaded a v001 patch making it 
configurable over a time period.

> [Consistent Observer Reads] Allow for configurable "always msync" mode
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14211
>                 URL: https://issues.apache.org/jira/browse/HDFS-14211
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-14211.000.patch, HDFS-14211.001.patch
>
>
> To allow for reads to be serviced from an ObserverNode (see HDFS-12943) in a 
> consistent way, an {{msync}} API was introduced (HDFS-13688) to allow for a 
> client to fetch the latest transaction ID from the Active NN, thereby 
> ensuring that subsequent reads from the ObserverNode will be up-to-date with 
> the current state of the Active.
> Using this properly, however, requires application-side changes: for 
> examples, a NodeManager should call {{msync}} before localizing the resources 
> for a client, since it received notification of the existence of those 
> resources via communicate which is out-of-band to HDFS and thus could 
> potentially attempt to localize them prior to the availability of those 
> resources on the ObserverNode.
> Until such application-side changes can be made, which will be a longer-term 
> effort, we need to provide a mechanism for unchanged clients to utilize the 
> ObserverNode without exposing such a client to inconsistencies. This is 
> essentially phase 3 of the roadmap outlined in the [design 
> document|https://issues.apache.org/jira/secure/attachment/12915990/ConsistentReadsFromStandbyNode.pdf]
>  for HDFS-12943.
> The design document proposes some heuristics based on understanding of how 
> common applications (e.g. MR) use HDFS for resources. As an initial pass, we 
> can simply have a flag which tells a client to call {{msync}} before _every 
> single_ read operation. This may seem counterintuitive, as it turns every 
> read operation into two RPCs: {{msync}} to the Active following by an actual 
> read operation to the Observer. However, the {{msync}} operation is extremely 
> lightweight, as it does not acquire the {{FSNamesystemLock}}, and in 
> experiments we have found that this approach can easily scale to well over 
> 100,000 {{msync}} operations per second on the Active (while still servicing 
> approx. 10,000 write op/s). Combined with the fast-path edit log tailing for 
> standby/observer nodes (HDFS-13150), this "always msync" approach should 
> introduce only a few ms of extra latency to each read call.
> Below are some experimental results collected from experiments which convert 
> a normal RPC workload into one in which all read operations are turned into 
> an {{msync}}. The baseline is a workload of 1.5k write op/s and 25k read op/s.
> ||Rate Multiplier|2|4|6|8||
> ||RPC Queue Avg Time (ms)|14|53|110|125||
> ||RPC Queue NumOps Avg (k)|51|102|147|177||
> ||RPC Queue NumOps Max (k)|148|269|306|312||
> _(numbers are approximate and should be viewed primarily for their trends)_
> Results are promising up to between 4x and 6x of the baseline workload, which 
> is approx. 100-150k read op/s.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HDFS-14211) [Consistent Observer Reads] Allow for configurable "always msync" mode

Reply via email to