[
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517672#comment-16517672
]
Erik Krogen commented on HDFS-13609:
------------------------------------
Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will
be equivalent to {{optimizeLatency}}. However, there are a few cases where this
is not currently true:
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm
that no one else is writing new transactions. It seems fine to allow this to
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a
range of transaction IDs is available. Seems fine to allow this to use the RPC
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions
for the INotify feature. Seems fine (actually, seems desirable) to let this use
the RPC mechanism. However, on a slightly unrelated note, one portion of this
will need to be changed to work properly in a read-from-standby environment.
Filed HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called on
{{NameNode#initializeSharedEdits()}}, i.e. a separate startup flag for the
NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy
things and making assumptions about the streams returned that will not be true
when using the RPC mechanism. We need to prevent this from using the RPC
mechanism, but given that this is only for the BackupNode, I recommend we avoid
adding a new API / parameter just for this situation and instead disable the
RPC mechanism on the BackupNode entirely, by adding a way for the BackupNode to
disable RPC reads on the {{QuorumJournalManager}}. This could take the form of
an undocumented config parameter or, my preference, a static method
{{QuorumJournalManager.disableRPCJournalStreams()}} which the BackupNode can
call.
If you agree that we can handle {{BackupImage}} as I described, I think I can
remove this new parameter and limit the scope of the change.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I can
imagine wanting to increase it when RPC response time from the JournalNodes is
high and the number of transactions per second is very high (say, a very heavy
write workload). But I can't think of a reason to lower it; this is more about
just setting a sanity-check upper bound. This makes me think we should (a)
raise the default limit to 5000 (even with an RPC round-trip time of 100ms,
which is quite high, this would allow 50k transactions per second), and (b)
make it undocumented as you described. I will incorporate this into the next
patch.
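As a rough check of the arithmetic in the last point, here is a minimal sketch.
It assumes the NameNode issues one RPC at a time and that each response carries
the full per-RPC limit; the class and method names are hypothetical and are not
part of the patch.

```java
public class EditTailBound {
    /**
     * Upper bound on transactions tailed per second, assuming sequential RPCs
     * that each return at most maxTxnsPerRpc transactions (an assumption made
     * for illustration, not the patch's actual behavior).
     */
    static long maxTxnsPerSecond(int maxTxnsPerRpc, long rpcRoundTripMillis) {
        // At most (1000 / RTT) calls complete per second, each carrying up to
        // maxTxnsPerRpc transactions. Multiply first to avoid truncation.
        return maxTxnsPerRpc * 1000L / rpcRoundTripMillis;
    }

    public static void main(String[] args) {
        // Proposed default of 5000 with a pessimistic 100 ms round trip.
        System.out.println(maxTxnsPerSecond(5000, 100)); // prints 50000
        // The same limit with a 10 ms round trip.
        System.out.println(maxTxnsPerSecond(5000, 10));  // prints 500000
    }
}
```

Under these assumptions the limit only starts to bind when the round-trip time
and the write rate are both very high, which is consistent with treating it as
a sanity-check upper bound.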
> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via
> RPC
> ---------------------------------------------------------------------------------
>
> Key: HDFS-13609
> URL: https://issues.apache.org/jira/browse/HDFS-13609
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, namenode
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-13609-HDFS-12943.000.patch,
> HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing
> in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are
> in the QuorumJournalManager.