[ 
https://issues.apache.org/jira/browse/HBASE-13584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518287#comment-14518287
 ] 

Devaraj Das commented on HBASE-13584:
-------------------------------------

I have seen two instances of issues so far.
In the first instance, META recovery was involved. AFAICT, for some reason the
recovery process did not recover meta correctly. At some point the master
restarted and, while reading the current region assignments, saw old data in
meta (assignments that were recorded prior to the last recovery), which led to
assignment issues.

In the second instance, the log replay operation seemed to never complete due
to a ZK connectivity issue:
{noformat}
2015-04-28 06:00:03,060 INFO  [RS_LOG_REPLAY_OPS-os-h2-amb-r6-1430116828-sec-phoenix-5:16020-0-Writer-0-SendThread(os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal:2181)] zookeeper.ClientCnxn: Socket connection established to os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal/192.168.77.213:2181, initiating session
2015-04-28 06:00:03,061 WARN  [RS_LOG_REPLAY_OPS-os-h2-amb-r6-1430116828-sec-phoenix-5:16020-1-Writer-1-SendThread(os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal:2181)] zookeeper.ClientCnxn: Session 0x0 for server os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal/192.168.77.213:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
{noformat}
The above was being spewed constantly, but the HBase/ZK cluster as such was
functional. I am not sure what caused this in DLR (distributed log replay).

I have the logs for both of the above. Want to take a look, [~stack]? I'll be
happy to be proven wrong on the above.

> Disable distributed log replay by default for 1.1
> -------------------------------------------------
>
>                 Key: HBASE-13584
>                 URL: https://issues.apache.org/jira/browse/HBASE-13584
>             Project: HBase
>          Issue Type: Task
>          Components: master, MTTR
>    Affects Versions: 1.1.0
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>            Priority: Critical
>             Fix For: 1.1.0
>
>         Attachments: HBASE-13584.00.branch-1.1.patch
>
>
> HBASE-12743 hasn't a clear owner and our [~devaraj] has been seeing issues 
> around this in internal runs as well. It also seems to be incompatible with 
> rolling upgrade from 0.98, so folks making the upgrade to 1.1 will need to 
> disable it out the gate anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
