[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521703#comment-14521703
 ] 

Chris Nauroth commented on ZOOKEEPER-2175:
------------------------------------------

Hi [~brahmareddy].  I agree with Rakesh that this is excellent debugging work!  
Thank you for posting the details.

[~rakeshr], what would you think of a feature request to support a checksum 
validation when wire encryption is not used?  I'd be happy to help.  Our use 
case is HDFS NameNode HA, which relies on 2 NameNode processes coordinating on 
a distributed lock in ZooKeeper.  The data is not private, so there might not 
be a strong motivation for an HDFS deployment to enable ZooKeeper wire 
encryption.

[~brahmareddy], the HDFS logic in this area is driven by ZooKeeper status code 
checks like the following in {{ActiveStandbyElector}}:

{code}
  private static boolean isSuccess(Code code) {
    return (code == Code.OK);
  }
{code}

I'm wondering if there is something that we can do in the HDFS code to check 
for a specific ZooKeeper status code, and then reconnect our session and retry 
taking the lock instead of transitioning to standby.  Do you know if there was 
a particular ZooKeeper status code that you saw when this happened?  Do you 
have capability to repro consistently?

I'll comment on HDFS-8161 too.

> Checksum validation for malformed packets needs to handle.
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2175
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2175
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>
>  *Session Id from ZK :* 
> 2015-04-15 21:24:54,257 | INFO  | CommitProcessor:22 | Established session 
> 0x164cb2b3e4b36ae4 with negotiated timeout 45000 for client 
> /160.149.0.117:44586 | 
> org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:623)
> 2015-04-15 21:24:54,261 | INFO  | 
> NIOServerCxn.Factory:160-149-0-114/160.149.0.114:24002 | Successfully 
> authenticated client: authenticationID=hdfs/[email protected];  
> authorizationID=hdfs/[email protected]. | 
> org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:118)
> 2015-04-15 21:24:54,261 | INFO  | 
> NIOServerCxn.Factory:160-149-0-114/160.149.0.114:24002 | Setting 
> authorizedID: hdfs/[email protected] | 
> org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:134)
> 2015-04-15 21:24:54,261 | INFO  | 
> NIOServerCxn.Factory:160-149-0-114/160.149.0.114:24002 | adding SASL 
> authorization for authorizationID: hdfs/[email protected] | 
> org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1009)
> 2015-04-15 21:24:54,262 | INFO  | ProcessThread(sid:22 cport:-1): | Got 
> user-level KeeperException when processing  
> *{color:red}sessionid:0x164cb2b3e4b36ae4{color}*  type:create cxid:0x3 
> zxid:0x20009fafc txntype:-1 reqpath:n/a Error 
> Path:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error:KeeperErrorCode = 
> NodeExists for /hadoop-ha/hacluster/ActiveStandbyElectorLock | 
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:648)
>  *ZKFC Received :*  ZK client
> 2015-04-15 21:24:54,237 | INFO  | main-SendThread(160-149-0-114:24002) | 
> Socket connection established to 160-149-0-114/160.149.0.114:24002, 
> initiating session | 
> org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:854)
> 2015-04-15 21:24:54,257 | INFO  | main-SendThread(160-149-0-114:24002) | 
> Session establishment complete on server 160-149-0-114/160.149.0.114:24002,  
> *{color:blue}sessionid = 0x144cb2b3e4b36ae4 {color}* , negotiated timeout = 
> 45000 | 
> org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1259)
> 2015-04-15 21:24:54,260 | INFO  | main-EventThread | EventThread shut down | 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:512)
> 2015-04-15 21:24:54,262 | INFO  | main-EventThread | Session connected. | 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:547)
> 2015-04-15 21:24:54,264 | INFO  | main-EventThread | Successfully 
> authenticated to ZooKeeper using SASL. | 
> org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:573)
> one bit corrupted..please check the following for same..
> 144cb2b3e4b36ae4=1010001001100101100101011001111100100101100110110101011100100
> 164cb2b3e4b36ae4=1011001001100101100101011001111100100101100110110101011100100



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to