[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517672#comment-16517672
 ] 

Erik Krogen commented on HDFS-13609:
------------------------------------

Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will 
be equivalent to {{optimizeLatency}}. However, there are a few callers for 
which this is not currently true: 
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm 
that no one else is writing new transactions. It seems fine to allow this to 
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a 
range of transaction IDs is available. Seems fine to allow this to use the RPC 
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions 
for INotify feature. Seems fine (actually, seems desirable) to let this use the 
RPC mechanism. However, on a slightly unrelated note, one portion of this will 
need to be changed to work properly in a read-from-standby environment... Filed 
HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called from 
{{NameNode#initializeSharedEdits()}}, i.e. under a separate startup flag for 
the NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy 
things and making assumptions about the streams returned that will not be true 
when using the RPC mechanism. We need to prevent this from using the RPC 
mechanism, but given that this is only for the BackupNode, I recommend we avoid 
adding a new API / parameter just for this situation. I instead propose that we 
add a way for the BackupNode to disable RPC reads on the 
{{QuorumJournalManager}} entirely. This could take the form of an undocumented 
config parameter or, my preference, a static method 
{{QuorumJournalManager.disableRPCJournalStreams()}} which the BackupNode can 
call.
If you agree that we can handle {{BackupImage}} as I described, I think I can 
remove this new parameter and limit the scope of the change.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I can 
imagine wanting to increase it when RPC response time from the JournalNodes is 
high and the number of transactions per second is very high (say, under a very 
heavy write workload). But I can't think of a reason to lower it; this is more 
about just setting a sanity-check upper bound. This makes me think we should 
(a) raise the default limit to 5000: even with an RPC round-trip time of 100ms, 
which is quite high, this would allow 50k transactions per second, and (b) make 
it undocumented as you described. I will incorporate this into the next patch.
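
The static switch proposed in point 1 could be sketched roughly as follows. This is a self-contained toy under stated assumptions, not code from any patch: the class {{JournalManagerSketch}}, its field, the method {{selectReadPath}}, and the returned strings are all hypothetical stand-ins for {{QuorumJournalManager}} internals. Only the method name {{disableRPCJournalStreams()}} is taken from the proposal.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Toy stand-in for QuorumJournalManager. Every name here except
 * disableRPCJournalStreams() is hypothetical and for illustration only.
 */
public class JournalManagerSketch {
    // Process-wide switch: RPC-backed streams are allowed until disabled.
    private static final AtomicBoolean RPC_STREAMS_ENABLED = new AtomicBoolean(true);

    /** Called by a component (e.g. the BackupNode) that cannot use RPC-backed streams. */
    public static void disableRPCJournalStreams() {
        RPC_STREAMS_ENABLED.set(false);
    }

    /**
     * Hypothetical decision point: take the RPC path only when the caller
     * accepts in-progress edits and the switch has not been turned off.
     */
    public static String selectReadPath(boolean inProgressOk) {
        return (inProgressOk && RPC_STREAMS_ENABLED.get()) ? "rpc" : "segments";
    }
}
```

With this shape, the BackupNode would call {{JournalManagerSketch.disableRPCJournalStreams()}} once during startup, and every later {{selectReadPath(true)}} call in that process would fall back to the segment-based path without any new parameter on {{selectInputStreams}}.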

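The arithmetic behind the 50k figure in point 3 can be checked with a small sketch. It assumes, as a simplification not stated in the comment, that the tailer issues one RPC at a time and that each response carries at most the configured number of transactions. The class and method names are hypothetical.

```java
/** Hypothetical helper for the throughput bound discussed in point 3. */
public class TailingThroughputSketch {
    /**
     * Upper bound on transactions per second when RPCs are issued back to
     * back and each returns at most maxTxnsPerRpc transactions.
     */
    public static long maxTxnsPerSecond(int maxTxnsPerRpc, long rpcRoundTripMillis) {
        if (maxTxnsPerRpc <= 0 || rpcRoundTripMillis <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // (txns per RPC) * (RPCs per second), using integer arithmetic.
        return maxTxnsPerRpc * 1000L / rpcRoundTripMillis;
    }
}
```

For a limit of 5000 and a 100ms round trip, {{maxTxnsPerSecond(5000, 100)}} gives 50000, matching the estimate above. A shorter round trip raises the bound proportionally.
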
> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via 
> RPC
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-13609
>                 URL: https://issues.apache.org/jira/browse/HDFS-13609
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>         Attachments: HDFS-13609-HDFS-12943.000.patch, 
> HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing 
> in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are 
> in the QuorumJournalManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

