[
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916216#comment-16916216
]
Eric Yang commented on HDFS-2470:
---------------------------------
[~swagle] Thank you for patch 09. Unfortunately, this patch breaks HBase for
some reason. HBase does not show the exact error, but the Region Server fails
to start. It appears that an exception is thrown, and the error manifested in
HBase as a ZooKeeper ACL exception:
{code}
2019-08-26 14:45:42,597 WARN
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
client.ZooKeeperSaslClient: Could not login: the client is being asked for a
password, but the Zookeeper client code does not currently support obtaining a
password from the user. Make sure that the client is configured to use a ticket
cache (using the JAAS configuration setting 'useTicketCache=true)' and restart
the client. If you still get this message after that, the TGT in the ticket
cache has expired and must be manually refreshed. To do so, first determine if
you are using a password or a keytab. If the former, run kinit in a Unix shell
in the environment of the user who is running this Zookeeper client using the
command 'kinit <princ>' (where <princ> is the name of the client's Kerberos
principal). If the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is
the name of the Kerberos principal, and <keytab> is the location of the keytab
file). After manually refreshing your cache, restart this client. If you
continue to see this message after manually refreshing your cache, ensure that
your KDC host's clock is in sync with this host's clock.
2019-08-26 14:45:42,598 WARN
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
zookeeper.ClientCnxn: SASL configuration failed:
javax.security.auth.login.LoginException: No password provided Will continue
connection to Zookeeper server without SASL authentication, if Zookeeper server
allows it.
2019-08-26 14:45:42,598 INFO
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
eyang-4.vpc.cloudera.com/10.65.53.170:2181
2019-08-26 14:45:42,598 INFO
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
zookeeper.ClientCnxn: Socket connection established to
eyang-4.vpc.cloudera.com/10.65.53.170:2181, initiating session
2019-08-26 14:45:42,601 INFO
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
zookeeper.ClientCnxn: Session establishment complete on server
eyang-4.vpc.cloudera.com/10.65.53.170:2181, sessionid = 0x200010a127c0070,
negotiated timeout = 60000
2019-08-26 14:45:45,659 INFO
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] ipc.RpcServer:
Stopping server on 16020
2019-08-26 14:45:45,659 INFO
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020]
token.AuthenticationTokenSecretManager: Stopping leader election, because:
SecretManager stopping
2019-08-26 14:45:45,660 INFO [RpcServer.listener,port=16020] ipc.RpcServer:
RpcServer.listener,port=16020: stopping
2019-08-26 14:45:45,660 INFO [RpcServer.responder] ipc.RpcServer:
RpcServer.responder: stopped
2019-08-26 14:45:45,660 INFO [RpcServer.responder] ipc.RpcServer:
RpcServer.responder: stopping
2019-08-26 14:45:45,660 FATAL
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020]
regionserver.HRegionServer: ABORTING region server
eyang-3.vpc.cloudera.com,16020,1566855941147: Initialization of RS failed.
Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
at
org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:819)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:772)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:744)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:889)
at java.lang.Thread.run(Thread.java:748)
{code}
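For reference, the first warning above usually means the ZooKeeper client found a JAAS "Client" section but had no usable Kerberos credential behind it. A typical working Region Server login section looks roughly like the following sketch; the keytab path and principal are placeholders, not values from this cluster:
{code}
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  // Placeholder keytab path and principal -- substitute the real ones.
  keyTab="/etc/security/keytabs/hbase.service.keytab"
  principal="hbase/[email protected]";
};
{code}
Alternatively, as the log message itself suggests, useTicketCache=true can be set instead of a keytab when a ticket cache is available. This is only context for reading the warning; it does not explain why the patch triggers it.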
When the patch is removed, HBase is able to start successfully. I dug pretty
deep into the HBase source code, but StorageDirectory is not used in that code
base. I validated that the DataNode directory default permission is not changed
by patch 09. More study is required to understand the root cause of the
incompatibility.
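To double-check the claim that storage directory permissions are unchanged by the patch, a quick comparison with stat can be run on each node before and after applying it. The paths below are illustrative only; substitute the actual configured dfs.namenode.name.dir and dfs.datanode.data.dir:
{code}
# Print the octal mode, owner, and group of each storage directory.
# Paths are examples, not this cluster's actual configuration.
stat -c '%a %U %G %n' /hadoop/hdfs/namenode /hadoop/hdfs/data
{code}
Running this with and without patch 09 applied should make any permission drift on the storage directories immediately visible.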
> NN should automatically set permissions on dfs.namenode.*.dir
> -------------------------------------------------------------
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.0-alpha
> Reporter: Aaron T. Myers
> Assignee: Siddharth Wagle
> Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch,
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch,
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, HDFS-2470.09.patch
>
>
> Much as the DN currently sets the correct permissions for the
> dfs.datanode.data.dir, the NN should do the same for the
> dfs.namenode.(name|edit).dir.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]