[ https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916216#comment-16916216 ]

Eric Yang edited comment on HDFS-2470 at 8/26/19 11:15 PM:
-----------------------------------------------------------

[~swagle] Thank you for patch 09. Unfortunately, this patch breaks HBase for 
some reason.  HBase does not show the exact error, but the HBase Region Server 
fails to start.  It appears that an exception is thrown, and the error manifested 
in HBase as a ZooKeeper ACL exception:

{code}
2019-08-26 14:45:42,597 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 client.ZooKeeperSaslClient: Could not login: the client is being asked for a 
password, but the Zookeeper client code does not currently support obtaining a 
password from the user. Make sure that the client is configured to use a ticket 
cache (using the JAAS configuration setting 'useTicketCache=true)' and restart 
the client. If you still get this message after that, the TGT in the ticket 
cache has expired and must be manually refreshed. To do so, first determine if 
you are using a password or a keytab. If the former, run kinit in a Unix shell 
in the environment of the user who is running this Zookeeper client using the 
command 'kinit <princ>' (where <princ> is the name of the client's Kerberos 
principal). If the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is 
the name of the Kerberos principal, and <keytab> is the location of the keytab 
file). After manually refreshing your cache, restart this client. If you 
continue to see this message after manually refreshing your cache, ensure that 
your KDC host's clock is in sync with this host's clock.
2019-08-26 14:45:42,598 WARN  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: SASL configuration failed: 
javax.security.auth.login.LoginException: No password provided Will continue 
connection to Zookeeper server without SASL authentication, if Zookeeper server 
allows it.
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Opening socket connection to server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181
2019-08-26 14:45:42,598 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Socket connection established to 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, initiating session
2019-08-26 14:45:42,601 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020-SendThread(eyang-4.vpc.cloudera.com:2181)]
 zookeeper.ClientCnxn: Session establishment complete on server 
eyang-4.vpc.cloudera.com/10.65.53.170:2181, sessionid = 0x200010a127c0070, 
negotiated timeout = 60000
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] ipc.RpcServer: 
Stopping server on 16020
2019-08-26 14:45:45,659 INFO  
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
token.AuthenticationTokenSecretManager: Stopping leader election, because: 
SecretManager stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.listener,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: stopping
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopped
2019-08-26 14:45:45,660 INFO  [RpcServer.responder] ipc.RpcServer: 
RpcServer.responder: stopping
2019-08-26 14:45:45,660 FATAL 
[regionserver/eyang-3.vpc.cloudera.com/10.65.52.68:16020] 
regionserver.HRegionServer: ABORTING region server 
eyang-3.vpc.cloudera.com,16020,1566855941147: Initialization of RS failed.  
Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:819)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:772)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:744)
        at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:889)
        at java.lang.Thread.run(Thread.java:748)
{code}
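
For context, the login warning above is the standard ZooKeeper client behavior when the JAAS {{Client}} section supplies neither a keytab nor {{useTicketCache=true}}. A minimal keytab-based sketch of that section follows; the principal and keytab path are placeholders for illustration, not values from this cluster:

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/hbase.service.keytab"
  principal="hbase/[email protected]"
  useTicketCache=false;
};
```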

When the patch is removed, HBase was not able to start successfully.  I dug 
pretty deep into the HBase source code, but StorageDirectory is not used in the 
code base.  I have validated that the default DataNode directory permissions are 
not changed by patch 09.  More study is required to understand the root cause of 
the incompatibility.
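
For anyone repeating the permission check above, a minimal sketch of validating a directory's on-disk mode against the expected DataNode value ({{dfs.datanode.data.dir.perm}}, which defaults to 700). The path here is a throwaway placeholder, not the real data directory:

```shell
# Placeholder directory for illustration; substitute the real dfs.datanode.data.dir.
dir=$(mktemp -d)
chmod 700 "$dir"

# stat -c '%a' prints the octal permission bits (GNU coreutils syntax).
mode=$(stat -c '%a' "$dir")
echo "mode=$mode"

# Compare against the expected dfs.datanode.data.dir.perm value.
if [ "$mode" = "700" ]; then
  echo "permissions match expected 700"
fi

rmdir "$dir"
```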



> NN should automatically set permissions on dfs.namenode.*.dir
> -------------------------------------------------------------
>
>                 Key: HDFS-2470
>                 URL: https://issues.apache.org/jira/browse/HDFS-2470
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Siddharth Wagle
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1
>
>         Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, HDFS-2470.09.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
