[ https://issues.apache.org/jira/browse/HELIX-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhen Zhang closed HELIX-551.
----------------------------
Resolution: Fixed
> External view & partition states go out of sync
> -----------------------------------------------
>
> Key: HELIX-551
> URL: https://issues.apache.org/jira/browse/HELIX-551
> Project: Apache Helix
> Issue Type: Bug
> Affects Versions: 0.6.4
> Reporter: Varun Sharma
>
> Hi,
> I am seeing the following issue for many partitions in Helix using a simple
> OnlineOffline state model factory. The external view says that the
> partition has been assigned to 3 hosts. However, when I look at the hosts,
> only 1 of them executed the OFFLINE --> ONLINE transition.
> On the hosts that did not execute the transition, I see the following:
> 2014-11-13 09:29:54,394 [pool-3-thread-11]
> (HelixStateTransitionHandler.java:206) WARN Force CurrentState on Zk to be
> stateModel's CurrentState. partitionKey: 490, currentState: ONLINE, message:
> 12690ce8-8098-46b1-a93d-279604f0e3db, {CREATE_TIMESTAMP=1415870993349,
> ClusterEventName=idealStateChange, EXECUTE_START_TIMESTAMP=1415870994382,
> EXE_SESSION_ID=149a14ada0d0013, FROM_STATE=OFFLINE,
> MSG_ID=12690ce8-8098-46b1-a93d-279604f0e3db, MSG_STATE=read,
> MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=490, READ_TIMESTAMP=1415870993787,
> RESOURCE_NAME=$terrapin$data$meta_pin_join$1415866960201,
> SRC_NAME=hdfsterrapin-a-namenode001_9090, SRC_SESSION_ID=147a7beb2dd8ed7,
> STATE_MODEL_DEF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT,
> TGT_NAME=hdfsterrapin-a-datanode-ba3ad256, TGT_SESSION_ID=149a14ada0d0013,
> TO_STATE=ONLINE}{}{}
> When I grep for the message ID on the controller, I see the following:
> 2014-11-14 09:34:56,265 [StatusDumpTimerTask] (ZKPathDataDumpTask.java:155)
> INFO {
> "id" : "149a14ada0d0013__$terrapin$data$meta_pin_join$1415866960201",
> "mapFields" : {
> "HELIX_ERROR 20141113-092954.000419 STATE_TRANSITION
> c1193025-b416-49d7-adc2-10afe2389141" : {
> "AdditionalInfo" : "Message execution failed. msgId:
> 12690ce8-8098-46b1-a93d-279604f0e3db, errorMsg:
> org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException:
> Current state of stateModel does not match the fromState in Message, Current
> State:ONLINE, message expected:OFFLINE, partition: 490, from:
> hdfsterrapin-a-namenode001_9090, to: hdfsterrapin-a-datanode-ba3ad256",
> "Class" : "class
> org.apache.helix.messaging.handling.HelixStateTransitionHandler",
> "MSG_ID" : "12690ce8-8098-46b1-a93d-279604f0e3db",
> "Message state" : "READ"
> },
> What could be causing this state mismatch? When I restart the node, the
> error disappears (meaning that the node is able to perform the state
> transition).
> Thanks