xkrogen commented on code in PR #4871:
URL: https://github.com/apache/hadoop/pull/4871#discussion_r968666305
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNode.java:
##########
@@ -124,6 +130,42 @@ public static void shutDownCluster() throws IOException {
}
}
+ @Test
+ public void testObserverRequeue() throws Exception {
+ ScheduledExecutorService interruptor =
+ Executors.newScheduledThreadPool(1);
+
+ EditLogTailer observerEditlogTailer = dfsCluster.getNameNode(2)
+ .getNamesystem().getEditLogTailer();
+ RpcMetrics obRpcMetrics = ((NameNodeRpcServer)dfsCluster
+ .getNameNodeRpc(2)).getClientRpcServer().getRpcMetrics();
+
+ // Stop EditlogTailer of Observer NameNode.
+ observerEditlogTailer.stop();
+
+ long oldRequeueNum = obRpcMetrics.getRpcRequeueCalls();
+
+ ScheduledFuture<FileStatus> scheduledFuture = interruptor.schedule(
+ () -> {
+ Path tmpTestPath = new Path("/TestObserverRequeue");
+ dfs.create(tmpTestPath, (short)1).close();
+ assertSentTo(0);
+ // This operation will be blocked in ObserverNameNode
+ // until EditlogTailer tailed edits from journalNode.
+ FileStatus fileStatus = dfs.getFileStatus(tmpTestPath);
+ assertSentTo(2);
+ return fileStatus;
+ }, 0, TimeUnit.SECONDS);
+ Thread.sleep(1000);
+ observerEditlogTailer.doTailEdits();
+ FileStatus fileStatus = scheduledFuture.get(1000, TimeUnit.MILLISECONDS);
Review Comment:
I would recommend a longer value like 10s -- 1s can easily get exceeded by
some GC or other transient slowness event
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########
@@ -6997,6 +6997,16 @@ public synchronized void
verifyToken(DelegationTokenIdentifier identifier,
public EditLogTailer getEditLogTailer() {
return editLogTailer;
}
+
+ @VisibleForTesting
+ public void startNewEditLogTailer(Configuration conf) throws IOException {
+ if (this.editLogTailer != null) {
+ this.editLogTailer.stop();
+ }
+
+ this.editLogTailer = new EditLogTailer(this, conf);
+ this.editLogTailer.start();
+ }
Review Comment:
Can we just use existing methods `setEditLogTailerForTests()` and
`getEditLogTailer()` instead of adding this new method?
##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNode.java:
##########
@@ -124,6 +130,42 @@ public static void shutDownCluster() throws IOException {
}
}
+ @Test
+ public void testObserverRequeue() throws Exception {
+ ScheduledExecutorService interruptor =
+ Executors.newScheduledThreadPool(1);
+
+ EditLogTailer observerEditlogTailer = dfsCluster.getNameNode(2)
+ .getNamesystem().getEditLogTailer();
+ RpcMetrics obRpcMetrics = ((NameNodeRpcServer)dfsCluster
+ .getNameNodeRpc(2)).getClientRpcServer().getRpcMetrics();
+
+ // Stop EditlogTailer of Observer NameNode.
+ observerEditlogTailer.stop();
+
+ long oldRequeueNum = obRpcMetrics.getRpcRequeueCalls();
+
+ ScheduledFuture<FileStatus> scheduledFuture = interruptor.schedule(
+ () -> {
+ Path tmpTestPath = new Path("/TestObserverRequeue");
+ dfs.create(tmpTestPath, (short)1).close();
+ assertSentTo(0);
+ // This operation will be blocked in ObserverNameNode
+ // until EditlogTailer tailed edits from journalNode.
+ FileStatus fileStatus = dfs.getFileStatus(tmpTestPath);
+ assertSentTo(2);
+ return fileStatus;
+ }, 0, TimeUnit.SECONDS);
+ Thread.sleep(1000);
Review Comment:
I guess you are trying to give the call time to be requeued?
Instead of having a static `sleep()`, which can slow the test, I think using
`GenericTestUtils.waitFor()` would be better. Something like:
```
GenericTestUtils.waitFor(() -> obRpcMetrics.getRpcRequeueCalls() >
oldRequeueNum, 50, 10000);
FileStatus fileStatus = ...
```
With a static sleep value, you have a tradeoff between a shorter sleep which
can cause test flakiness (e.g. failure due to some temporary slowness) vs. long
sleep which causes longer test runtime. A `waitFor` allows you to trade off by
terminating as soon as the condition is met, but allowing for a longer maximum
wait time to reduce flakiness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]