[
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689993#comment-16689993
]
Chen Liang edited comment on HDFS-14058 at 12/5/18 12:02 AM:
-------------------------------------------------------------
The tests were done on a setup of 100+ DataNodes, 1 Active NameNode and 1
Observer NameNode, with no other standby nodes. The cluster has a light HDFS
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose
here was not to evaluate performance gain, but mainly to prove functionality
and correctness.
In all the tests below, it is *verified from both NameNodes' audit logs* that
the reads actually went to the Observer node and the writes went to the Active,
and it is *verified from job/client logs* that when the client could not talk
to the Observer (e.g. for write requests, or when the Observer node was
actually in Standby state rather than Observer), it fell back to talking to the
Active. A sketch of how such a check might look is shown below.
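For example, one way to do this kind of verification is to grep each
NameNode's audit log; the log path and the specific cmd values below are
illustrative and depend on the deployment:
{code}
# On the Observer NameNode host: read ops (e.g. getfileinfo, listStatus)
# should appear in its audit log.
grep 'cmd=getfileinfo' /var/log/hadoop/hdfs-audit.log | tail -n 5

# On the Active NameNode host: write ops (e.g. mkdirs, create, delete)
# should appear in its audit log.
grep -E 'cmd=(mkdirs|create|delete)' /var/log/hadoop/hdfs-audit.log | tail -n 5
{code}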
The specific tests done include:
1. basic hdfs IO
- From hdfs command (example invocations are sketched after this list):
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program: I wrote some code which creates a DFSClient
instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token
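The hdfs command tests were of this shape (paths are illustrative):
{code}
# Write ops: should be served by the Active NameNode.
hdfs dfs -mkdir -p /tmp/observer-smoke
hdfs dfs -put localfile.txt /tmp/observer-smoke/

# Read: the metadata lookups for -get should hit the Observer.
hdfs dfs -get /tmp/observer-smoke/localfile.txt /tmp/copy.txt

# Cleanup (write): back to the Active.
hdfs dfs -rm -r /tmp/observer-smoke
{code}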
One observation on this is that, from the command line, depending on the
relative order of the ANN and ONN in the config, the failover may happen every
single time, with an exception printed. This is because every command-line
invocation creates a new DFSClient instance, which may start by sending a write
to the Observer, causing a failover. For a reused DFSClient (e.g. a Java
program that creates and reuses the same DFSClient), this issue does not occur.
One way to inspect the configured order and each node's current state is
sketched below.
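Assuming a nameservice named mycluster (the name is illustrative):
{code}
# Print the configured NameNode order for the nameservice; a fresh client
# tries them in this order until one accepts the call.
hdfs getconf -confKey dfs.ha.namenodes.mycluster

# Show which HA state (active/standby/observer) each NameNode is in.
hdfs haadmin -getAllServiceState
{code}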
2. simple MR job: a wordcount job from the mapreduce-examples jar, on a very
small input (an example invocation is below).
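Such a run looks like this (jar and HDFS paths are illustrative):
{code}
hadoop jar hadoop-mapreduce-examples-*.jar wordcount /tmp/wc-in /tmp/wc-out
{code}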
3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar with
default parameters. I ran Slive 3 times each with Observer enabled and
disabled, and saw a similar number of ops/sec (an example invocation is below).
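An invocation of roughly this shape; the Slive driver ships in the jobclient
tests jar, and the exact jar file name varies by Hadoop version:
{code}
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar SliveTest
{code}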
4. DFSIO: ran the DFSIO read test several times from the
hadoop-mapreduce-client-jobclient jar; the tests were done with 100 files of
100 MB each (an example invocation is below).
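Roughly like the following; a write run has to populate the data before the
read test can run, and the size flag spelling differs slightly across Hadoop
versions:
{code}
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 100 -size 100MB
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 100 -size 100MB
{code}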
5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate several times from the
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and
500 reducers. All three jobs finished successfully. The sequence looks like the
sketch below.
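For reference (1 TB is 10 billion 100-byte rows; paths are illustrative):
{code}
hadoop jar hadoop-mapreduce-examples-*.jar teragen 10000000000 /tera/in
hadoop jar hadoop-mapreduce-examples-*.jar terasort /tera/in /tera/out
hadoop jar hadoop-mapreduce-examples-*.jar teravalidate /tera/out /tera/report
{code}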
was (Author: vagarychen):
The tests I've run include the following. Please note that these tests were
done without several recent changes such as HDFS-14035 and HDFS-14017, but with
some hacky code changes and workarounds. Although the required changes have
been formalized into recent Jiras, the tests have not all been re-run with
those changes. Posting here for the record.
The tests were done on a setup of 100+ DataNodes, 1 Active NameNode and 1
Observer NameNode, with no other standby nodes. The cluster has a light HDFS
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose
here was not to evaluate performance gain, but only to prove functionality. In
all the tests below, it is verified from the Observer node's audit log that the
reads actually went to the Observer node.
1. basic hdfs IO
- From hdfs command:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program: I wrote some code which creates a DFSClient
instance and performs some basic operations against it:
-- create/delete directory
-- get/renew delegation token
One observation on this is that, from the command line, depending on the
relative order of the ANN and ONN in the config, the failover may happen every
single time, with an exception printed. I believe this is because every
command-line invocation creates a new DFSClient instance, which may start by
sending a write to the Observer, causing a failover. For a reused DFSClient
(e.g. a Java program that creates and reuses the same DFSClient), this issue
does not occur.
2. simple MR job: a wordcount job from the mapreduce-examples jar, on a very
small input.
3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without
parameters (so it uses the defaults). I ran Slive 3 times each with Observer
enabled and disabled, and saw roughly the same ops/sec.
4. DFSIO: ran the DFSIO read test several times from the
hadoop-mapreduce-client-jobclient jar, but only with a very small input size
(10 files of 1 KB each).
5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from the
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and
500 reducers. All three jobs finished successfully.
> Test reads from standby on a secure cluster with IP failover
> ------------------------------------------------------------
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Konstantin Shvachko
> Assignee: Chen Liang
> Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA
> cluster with {{IPFailoverProxyProvider}}.