[
https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667607#comment-15667607
]
Sean Mackrory commented on HDFS-10702:
--------------------------------------
{quote}My concern is that if a significant portion of read requests follow this
scenario (needs a fresher TxId), that will cause a high writeLock contention on
SbNN.{quote}
Yes, this certainly isn't for every scenario. I view this as being useful for
offloading some workloads from the active NameNode. I was hoping to have
precise measurements by now of how this performs relative to other HA proxy
methods for various workloads, but I found a bug where
RequestHedgingProxyProvider was broadcasting more traffic than it needed to
with more than 2 NameNodes, so I'll need to revisit that.
{quote}In the case of multiple standbys, one is the checkpointer, thus you can
consider allowing client to connect to standbys not doing checkpoint.{quote}
That's a good idea - I'd certainly like to make the logic for deciding which
NameNodes are in standby more robust. Perhaps this should be included in the
'SyncInfo' structure?
{quote}After NN failover, does StaleReadProxyProvider#standbyProxies get
refreshed? If not, a long running client could keep using the old
standby.{quote}
It does not. It only reevaluates which proxies to use in the event of a failure
(specifically, a failure of the active NN when writing, or a failure of all
standby NNs when reading). I had thought about that possibility and decided to
ignore it for now. The worst that will happen is that a client won't be using
the optimal NameNode and loses the benefit of the optimization. I was fine with
that since the very nature of this feature is accepting sub-optimal results
within reasonable bounds. But we could add the ability to reevaluate after a
certain time period or number of requests.
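To make the time-or-request-count idea concrete, here is a minimal sketch of
what such a trigger could look like. ProxyRefreshPolicy and everything in it
are hypothetical names for illustration only, not part of the attached patches,
and how StaleReadProxyProvider would actually re-resolve its proxies is left
out entirely.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not in the HDFS-10702 patches): decides when a
 * long-running client should re-resolve which NameNodes are standbys.
 */
final class ProxyRefreshPolicy {
  private final long maxAgeMillis;   // refresh after this much time...
  private final long maxRequests;    // ...or after this many requests
  private final LongSupplier clock;  // injected so the policy is testable
  private long lastRefreshMillis;
  private long requestsSinceRefresh;

  ProxyRefreshPolicy(long maxAgeMillis, long maxRequests, LongSupplier clock) {
    this.maxAgeMillis = maxAgeMillis;
    this.maxRequests = maxRequests;
    this.clock = clock;
    this.lastRefreshMillis = clock.getAsLong();
  }

  /** Call once per request; true means the proxy list is due for a refresh. */
  synchronized boolean recordRequestAndCheck() {
    requestsSinceRefresh++;
    long age = clock.getAsLong() - lastRefreshMillis;
    return age >= maxAgeMillis || requestsSinceRefresh >= maxRequests;
  }

  /** Call after the proxies have been re-resolved. */
  synchronized void markRefreshed() {
    lastRefreshMillis = clock.getAsLong();
    requestsSinceRefresh = 0;
  }
}
```

A provider could call recordRequestAndCheck() on each invocation, passing
System::currentTimeMillis as the clock, and re-run its standby detection
whenever it returns true. The thresholds would presumably be client-side
configuration.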
{quote}I am interested in knowing more how the applications plan to use it,
specifically when they will decide to call getSyncInfo. In multi tenant
environment, an application might care about specific files/directories, not
necessarily the namespace has changed at a global level.{quote}
That's an interesting idea to explore and I think it fits with the use case I
had in mind. I'm picturing cases where someone is going to be doing some
(almost entirely) read-only analytics of a dataset that is known to be complete
(or close enough). We can make the assumption that the metadata won't be
changing, and either speed up our analysis or minimize the impact of our
analysis on other workloads. In that case, I would think restricting the stale
reads to a specific subtree is perfectly reasonable (if it helps - tailing the
edit log was already implemented). I suppose this might also be used by someone
who wants to search the whole filesystem for something and is okay with
approximate results. But I would think this is less common, and one could
always set '/' as the subtree they're concerned with.
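As a rough illustration of the subtree restriction, the client-side check could
be as simple as the sketch below. StaleReadScope is a hypothetical name, not
something in the patch, and it assumes paths are already absolute and
normalized (no "." or ".." components).

```java
/**
 * Hypothetical helper (not in the HDFS-10702 patches): limits stale reads
 * to paths at or below one subtree root. Assumes normalized absolute paths.
 */
final class StaleReadScope {
  private final String prefix; // subtree root, always ending in '/'

  StaleReadScope(String subtreeRoot) {
    if (subtreeRoot == null || !subtreeRoot.startsWith("/")) {
      throw new IllegalArgumentException("subtree root must be an absolute path");
    }
    // Strip trailing slashes, but keep a bare "/" as the root.
    String trimmed = subtreeRoot;
    while (trimmed.length() > 1 && trimmed.endsWith("/")) {
      trimmed = trimmed.substring(0, trimmed.length() - 1);
    }
    this.prefix = trimmed.equals("/") ? "/" : trimmed + "/";
  }

  /** True if the path is the subtree root itself or lies beneath it. */
  boolean allowsStaleRead(String path) {
    if (path == null || !path.startsWith("/")) {
      return false;
    }
    // Appending '/' lets "/data" match root "/data" without matching "/data2".
    return (path + "/").startsWith(prefix);
  }
}
```

With new StaleReadScope("/"), every absolute path qualifies, which matches the
whole-filesystem case above. Reads outside the scope would go to the active
NameNode as usual.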
> Add a Client API and Proxy Provider to enable stale read from Standby
> ---------------------------------------------------------------------
>
> Key: HDFS-10702
> URL: https://issues.apache.org/jira/browse/HDFS-10702
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Jiayi Zhou
> Assignee: Jiayi Zhou
> Priority: Minor
> Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch,
> HDFS-10702.003.patch, HDFS-10702.004.patch, HDFS-10702.005.patch,
> HDFS-10702.006.patch, StaleReadfromStandbyNN.pdf
>
>
> Currently, clients must always talk to the active NameNode when performing
> any metadata operation, which means active NameNode could be a bottleneck for
> scalability. One way to solve this problem is to send read-only operations to
> Standby NameNode. The disadvantage is that it might be a stale read.
> Here, I'm thinking of adding a Client API to enable/disable stale read from
> Standby which gives Client the power to set the staleness restriction.