[
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689993#comment-16689993
]
Chen Liang edited comment on HDFS-14058 at 12/5/18 12:02 AM:
-------------------------------------------------------------
The tests were done on a setup of 100+ DataNodes, 1 Active NameNode and 1
Observer NameNode, with no other standby nodes. The cluster has a light HDFS
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose
here was not to evaluate performance gain, but mainly to prove functionality
and correctness.
In all the tests below, it is *verified from both NameNodes' audit logs* that
the reads actually went to the Observer node and the writes went to the Active,
and it is *verified from job/client logs* that when the client could not talk
to the Observer (e.g. for write requests, or when the Observer node was
actually in Standby state rather than Observer), it fell back to talking to the
Active. A sketch of how such a check might look is shown below.
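For example, one way to do this kind of verification is to grep each
NameNode's audit log; the log path and the specific cmd values below are
illustrative and depend on the deployment:
{code}
# On the Observer NameNode host: read ops (e.g. getfileinfo, listStatus)
# should appear in its audit log.
grep 'cmd=getfileinfo' /var/log/hadoop/hdfs-audit.log | tail -n 5

# On the Active NameNode host: write ops (e.g. mkdirs, create, delete)
# should appear in its audit log.
grep -E 'cmd=(mkdirs|create|delete)' /var/log/hadoop/hdfs-audit.log | tail -n 5
{code}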
The specific tests done include:
1. basic hdfs IO
- From hdfs command (example invocations are sketched after this list):
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program: I wrote some code which creates a DFSClient
instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token
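The hdfs command tests were of this shape (paths are illustrative):
{code}
# Write ops: should be served by the Active NameNode.
hdfs dfs -mkdir -p /tmp/observer-smoke
hdfs dfs -put localfile.txt /tmp/observer-smoke/

# Read: the metadata lookups for -get should hit the Observer.
hdfs dfs -get /tmp/observer-smoke/localfile.txt /tmp/copy.txt

# Cleanup (write): back to the Active.
hdfs dfs -rm -r /tmp/observer-smoke
{code}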
One observation on this is that, from the command line, depending on the
relative order of the ANN and ONN in the config, the failover may happen every
single time, with an exception printed. This is because every command-line
invocation creates a new DFSClient instance, which may start by sending a write
to the Observer, causing a failover. For a reused DFSClient (e.g. a Java
program that creates and reuses the same DFSClient), this issue does not occur.
One way to inspect the configured order and each node's current state is
sketched below.
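Assuming a nameservice named mycluster (the name is illustrative):
{code}
# Print the configured NameNode order for the nameservice; a fresh client
# tries them in this order until one accepts the call.
hdfs getconf -confKey dfs.ha.namenodes.mycluster

# Show which HA state (active/standby/observer) each NameNode is in.
hdfs haadmin -getAllServiceState
{code}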
2. simple MR job: a wordcount job from the mapreduce-examples jar, on a very
small input (an example invocation is below).
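Such a run looks like this (jar and HDFS paths are illustrative):
{code}
hadoop jar hadoop-mapreduce-examples-*.jar wordcount /tmp/wc-in /tmp/wc-out
{code}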
3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar with
default parameters. I ran Slive 3 times each with Observer enabled and
disabled, and saw a similar number of ops/sec (an example invocation is below).
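An invocation of roughly this shape; the Slive driver ships in the jobclient
tests jar, and the exact jar file name varies by Hadoop version:
{code}
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar SliveTest
{code}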
4. DFSIO: ran the DFSIO read test several times from the
hadoop-mapreduce-client-jobclient jar; the tests were done with 100 files of
100 MB each (an example invocation is below).
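Roughly like the following; a write run has to populate the data before the
read test can run, and the size flag spelling differs slightly across Hadoop
versions:
{code}
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 100 -size 100MB
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 100 -size 100MB
{code}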
5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate several times from the
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and
500 reducers. All three jobs finished successfully. The sequence looks like the
sketch below.
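For reference (1 TB is 10 billion 100-byte rows; paths are illustrative):
{code}
hadoop jar hadoop-mapreduce-examples-*.jar teragen 10000000000 /tera/in
hadoop jar hadoop-mapreduce-examples-*.jar terasort /tera/in /tera/out
hadoop jar hadoop-mapreduce-examples-*.jar teravalidate /tera/out /tera/report
{code}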
was (Author: vagarychen):
The tests I've run include the following. Please note that these tests were
done without several recent changes such as HDFS-14035 and HDFS-14017, but with
some hacky code changes and workarounds. Although the required changes have
been formalized into recent Jiras, the tests have not all been re-run with
those changes. Posting here for the record.
The tests were done on a setup of 100+ DataNodes, 1 Active NameNode and 1
Observer NameNode, with no other standby nodes. The cluster has a light HDFS
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose
here was not to evaluate performance gain, but only to prove functionality. In
all the tests below, it is verified from the Observer node's audit log that the
reads actually went to the Observer node.
1. basic hdfs IO
- From hdfs command:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program: I wrote some code which creates a DFSClient
instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token
One observation on this is that, from the command line, depending on the
relative order of the ANN and ONN in the config, the failover may happen every
single time, with an exception printed. I believe this is because every
command-line invocation creates a new DFSClient instance, which may start by
sending a write to the Observer, causing a failover. For a reused DFSClient
(e.g. a Java program that creates and reuses the same DFSClient), this issue
does not occur.
2. simple MR job: a wordcount job from the mapreduce-examples jar, on a very
small input.
3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without
parameters (so it uses the defaults). I ran Slive 3 times each with Observer
enabled and disabled, and saw roughly the same ops/sec.
4. DFSIO: ran the DFSIO read test several times from the
hadoop-mapreduce-client-jobclient jar, but only with a very small input size
(10 files of 1 KB each).
5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from the
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and
500 reducers. All three jobs finished successfully.
> Test reads from standby on a secure cluster with IP failover
> ------------------------------------------------------------
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Konstantin Shvachko
> Assignee: Chen Liang
> Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA
> cluster with {{IPFailoverProxyProvider}}.