[
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674644#comment-15674644
]
Andrew Wang commented on HDFS-10702:
------------------------------------
bq. IIUC, if a client issues a read with a "stale bound" which is fresher than
the SbNN's current latest TxId, the SbNN will tail edit logs from ANN, right?
The SbNN tails from the QJM, not the ANN directly. I think the intent is also
that the client would "failover" to the ANN if the standby is too stale, rather
than making the SbNN tail synchronously.
We enabled quite up-to-date standby tailing in HDFS-10519, so this should be a
fairly rare situation as long as there are a few seconds between getSyncInfo
and the client RPCs. Write contention on the SbNN should be lower than on the
ANN, since applying edits is more efficient than handling a write RPC and is
also batched.
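As a rough illustration of the routing described above, here is a minimal
sketch in which a read goes to the SbNN only if the SbNN has already applied
the client's stale-bound txid, and otherwise fails over to the ANN. The
NameNode class, its last_applied_txid field, and choose_namenode are
hypothetical names for this toy model. They are not the API in the attached
patches.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    """Toy stand-in for a NameNode (hypothetical); tracks only the last applied txid."""
    name: str
    last_applied_txid: int


def choose_namenode(stale_bound_txid: int, standby: NameNode, active: NameNode) -> NameNode:
    """Return the standby if it has applied at least stale_bound_txid, else the active.

    Assumption: the client fails over to the active instead of waiting for
    the standby to tail more edits.
    """
    if standby.last_applied_txid >= stale_bound_txid:
        return standby
    return active


ann = NameNode("ANN", last_applied_txid=1000)
sbnn = NameNode("SbNN", last_applied_txid=990)

print(choose_namenode(985, sbnn, ann).name)  # bound is already applied on the standby
print(choose_namenode(995, sbnn, ann).name)  # bound is ahead of the standby, so fail over
```

The comparison is inclusive, so a standby that has applied exactly the
requested txid is treated as fresh enough.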
bq. In this SbNN polling approach, I am interested in knowing more how the
applications plan to use it, specifically when they will decide to call
getSyncInfo.
To expand on Sean's answer a bit, we've discussed some use cases with our Hive
and Impala teams related to data warehousing:
* Refreshing the metadata for a table or partition is a very RPC-heavy
operation. This is typically done when some new data has been written to HDFS.
So, an ingest application would write the data, call getSyncInfo, then refresh
metadata using the txid from getSyncInfo.
* Apps that do not cache input streams can call getSyncInfo at job
submission time, then pass the txid to the job's tasks. Since a couple of
seconds typically pass between submission and execution, we should be able to
serve a lot of these reads from the SbNN.
* The txid from the last refresh could also be stored in the HMS, to further
offload RPCs for the related data.
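The ingest use case above (write, call getSyncInfo, then refresh with the
returned txid) can be sketched with a toy model. ToyCluster and all of its
methods are hypothetical and exist only for illustration. In particular,
get_sync_info returning the active's latest txid is an assumption about what
getSyncInfo would provide, not the behavior of the attached patches.

```python
class ToyCluster:
    """Hypothetical in-memory model of an active/standby NameNode pair."""

    def __init__(self):
        self.active_txid = 0    # latest txid written on the active
        self.standby_txid = 0   # latest txid applied on the standby
        self.reads_served = {"ANN": 0, "SbNN": 0}

    def write(self, n_edits=1):
        """Simulate writes: each edit advances the active's txid by one."""
        self.active_txid += n_edits

    def get_sync_info(self):
        """Assumed semantics: return the active's latest txid."""
        return self.active_txid

    def tail_edits(self):
        """Simulate the standby catching up on all edits written so far."""
        self.standby_txid = self.active_txid

    def read(self, stale_bound_txid):
        """Serve a read from the standby if it is fresh enough, else from the active."""
        target = "SbNN" if self.standby_txid >= stale_bound_txid else "ANN"
        self.reads_served[target] += 1
        return target


cluster = ToyCluster()
cluster.write(3)                   # ingest application writes new data
txid = cluster.get_sync_info()     # remember how fresh the reads must be
first = cluster.read(txid)         # standby has not tailed yet, so the active serves this
cluster.tail_edits()               # a few seconds later the standby has caught up
targets = [cluster.read(txid) for _ in range(5)]  # metadata refresh RPCs

print(first, targets, cluster.reads_served)
```

In this model, the txid could equally be kept in an external store and reused
for later refreshes of the same data, which is the idea behind the HMS bullet.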
> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.