[
https://issues.apache.org/jira/browse/HBASE-13584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518287#comment-14518287
]
Devaraj Das commented on HBASE-13584:
-------------------------------------
I have seen two issues so far.
In the first instance, META recovery was involved. AFAICT, for some reason the
recovery process did not recover meta correctly. At some point the master
restarted, and while reading the current region assignments it saw stale data
in meta (assignments recorded prior to the last recovery), which led to
assignment problems.
In the second instance, the log replay operation seemed to never complete due
to a ZK connectivity issue:
{noformat}
2015-04-28 06:00:03,060 INFO [RS_LOG_REPLAY_OPS-os-h2-amb-r6-1430116828-sec-phoenix-5:16020-0-Writer-0-SendThread(os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal:2181)] zookeeper.ClientCnxn: Socket connection established to os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal/192.168.77.213:2181, initiating session
2015-04-28 06:00:03,061 WARN [RS_LOG_REPLAY_OPS-os-h2-amb-r6-1430116828-sec-phoenix-5:16020-1-Writer-1-SendThread(os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal:2181)] zookeeper.ClientCnxn: Session 0x0 for server os-h2-amb-r6-1430116828-sec-phoenix-2.novalocal/192.168.77.213:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
{noformat}
The above was being logged constantly, but the HBase/ZK cluster as such was
functional. Not sure what caused this in distributed log replay (DLR).
I have the logs for both of the above. Want to take a look, [~stack]? I'll be
happy to be proven wrong on the above.
> Disable distributed log replay by default for 1.1
> -------------------------------------------------
>
> Key: HBASE-13584
> URL: https://issues.apache.org/jira/browse/HBASE-13584
> Project: HBase
> Issue Type: Task
> Components: master, MTTR
> Affects Versions: 1.1.0
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Priority: Critical
> Fix For: 1.1.0
>
> Attachments: HBASE-13584.00.branch-1.1.patch
>
>
> HBASE-12743 hasn't a clear owner and our [~devaraj] has been seeing issues
> around this in internal runs as well. It also seems to be incompatible with
> rolling upgrade from 0.98, so folks making the upgrade to 1.1 will need to
> disable it out the gate anyway.
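For anyone who needs to disable it by hand before upgrading, a minimal sketch
of the hbase-site.xml override follows. It assumes the switch is the
hbase.master.distributed.log.replay property; the property name is not stated
in this thread, so please check it against hbase-default.xml for your release.
{noformat}
<!-- hbase-site.xml: assumed property name, verify for your version -->
<property>
  <name>hbase.master.distributed.log.replay</name>
  <!-- false falls back to distributed log splitting -->
  <value>false</value>
</property>
{noformat}
The override most likely has to be in place across the cluster before the
rolling upgrade starts, since the master and region servers would need to
agree on the recovery mode.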