[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby

Andrew Wang (JIRA) Thu, 17 Nov 2016 11:54:07 -0800

    [ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674644#comment-15674644 ]


Andrew Wang commented on HDFS-10702:
------------------------------------

bq. IIUC, if a client issues a read with a "stale bound" which is fresher than 
the SbNN's current latest TxId, the SbNN will tail edit logs from ANN, right?

The SbNN tails from the QJM, not the ANN directly. I think the intent is also 
that the client would "failover" to the ANN if the standby is too stale, rather 
than making the SbNN tail synchronously.
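The routing decision described above can be sketched roughly as follows. This is a minimal, hypothetical model, not code from the HDFS-10702 patches: `NameNodeStub`, its `last_applied_txid` field, and `choose_namenode` are names invented for illustration. A read carries a minimum txid (the "stale bound"). If the standby has already applied edits up to that txid, it serves the read. Otherwise the client fails over to the active instead of asking the standby to tail synchronously.

```python
from dataclasses import dataclass


@dataclass
class NameNodeStub:
    """Hypothetical stand-in for a NameNode; not a real HDFS class."""
    name: str
    is_active: bool
    last_applied_txid: int  # highest edit-log txid this node has applied


def choose_namenode(min_txid: int, standby: NameNodeStub,
                    active: NameNodeStub) -> NameNodeStub:
    """Pick the NameNode that should serve a read bounded by min_txid.

    The standby is used only if it has already caught up to the client's
    bound; otherwise the read falls back to the active NameNode rather
    than making the standby tail the edit log synchronously.
    """
    if standby.last_applied_txid >= min_txid:
        return standby
    return active


active = NameNodeStub("nn1", is_active=True, last_applied_txid=1050)
standby = NameNodeStub("nn2", is_active=False, last_applied_txid=1000)

print(choose_namenode(990, standby, active).name)   # nn2: standby is fresh enough
print(choose_namenode(1040, standby, active).name)  # nn1: standby is too stale
```

Falling back to the active keeps the read latency bounded: the client never waits on the standby's tailing schedule.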

We enabled quite up-to-date standby tailing in HDFS-10519, so this should be a 
fairly rare situation as long as a few seconds pass between getSyncInfo and 
the client RPCs. Write contention on the SbNN should also be lower than on the 
ANN, since applying edits is more efficient than handling a write RPC, and the 
edits are applied in batches.

bq. In this SbNN polling approach, I am interested in knowing more how the 
applications plan to use it, specifically when they will decide to call 
getSyncInfo. 

To expand on Sean's answer a bit, we've discussed some use cases related to 
data warehousing with our Hive and Impala teams:

* Refreshing the metadata for a table or partition is a very RPC-heavy 
operation. This is typically done after new data has been written to HDFS. 
So, an ingest application would write the data, call getSyncInfo, then refresh 
the metadata using the txid returned by getSyncInfo.
* Apps that do not cache input streams can call getSyncInfo at job submission 
time, then pass the txid to the job's tasks. Since a couple of seconds 
typically pass between submission and execution, we should be able to offload 
a lot of reads from the ANN to the SbNN.
* The txid from the last refresh could also be stored in the HMS (Hive 
Metastore), to further offload RPCs for the related data.
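The write, getSyncInfo, then read pattern from these use cases can be modelled with a toy HA pair. This is a hypothetical sketch that assumes getSyncInfo returns the active NameNode's latest txid and that each read carries that txid as its minimum bound. `MiniCluster`, `get_sync_info`, `tail_edits`, and `read` are illustrative names, not HDFS APIs. The txid is captured once at submission and reused for every task's reads.

```python
from dataclasses import dataclass, field


@dataclass
class MiniCluster:
    """Toy model of an HA NameNode pair; every name here is hypothetical."""
    active_txid: int = 0   # latest txid written by the active NameNode
    standby_txid: int = 0  # latest txid applied by the standby
    reads_served: dict = field(
        default_factory=lambda: {"active": 0, "standby": 0})

    def write(self, n_edits: int) -> None:
        # Metadata writes append edits on the active NameNode.
        self.active_txid += n_edits

    def get_sync_info(self) -> int:
        # Hypothetical analogue of getSyncInfo: the active's current txid.
        return self.active_txid

    def tail_edits(self) -> None:
        # The standby catches up, e.g. a few seconds later.
        self.standby_txid = self.active_txid

    def read(self, min_txid: int) -> str:
        # Serve from the standby only if it has applied min_txid.
        target = "standby" if self.standby_txid >= min_txid else "active"
        self.reads_served[target] += 1
        return target


cluster = MiniCluster()

# Ingest: write the data, then record the txid that covers those writes.
cluster.write(n_edits=25)
sync_txid = cluster.get_sync_info()

# A task that runs before the standby has tailed falls back to the active.
first = cluster.read(sync_txid)

# After the standby tails, the same bound lets every task read from it.
cluster.tail_edits()
later = [cluster.read(sync_txid) for _ in range(10)]

print(first, later.count("standby"), cluster.reads_served)
# active 10 {'active': 1, 'standby': 10}
```

The gap between getSyncInfo and the task reads is what lets most reads land on the standby. Only reads issued before the standby catches up reach the active.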

> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10702
>                 URL: https://issues.apache.org/jira/browse/HDFS-10702
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jiayi Zhou
>            Assignee: Jiayi Zhou
>            Priority: Minor
>         Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, 
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch, 
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing 
> any metadata operation, which means the active NameNode could become a 
> scalability bottleneck. One way to solve this problem is to send read-only 
> operations to the Standby NameNode. The disadvantage is that such reads may 
> be stale. Here, I'm thinking of adding a Client API to enable/disable stale 
> reads from the Standby, which gives the client the power to set the 
> staleness restriction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
