[ 
https://issues.apache.org/jira/browse/STORM-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002390#comment-14002390
 ] 

Robert Joseph Evans commented on STORM-307:
-------------------------------------------

For some strange reason all of the local state files are empty, but not the 
isupervisor ones.  This either means that Utils.serialize returned an empty 
array, or that FileUtils.writeByteArrayToFile did not fully write out what it 
was supposed to.  Either way this seems very odd. 
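
The EOFException in the quoted trace is consistent with an empty file. As a minimal sketch (plain Java serialization standing in for Utils.serialize and Utils.deserialize; the class and method names below are hypothetical and not taken from Storm's source), constructing an ObjectInputStream over zero bytes fails while reading the stream header, before any object is read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class EmptyStateDemo {
    // Hypothetical stand-in for Utils.serialize: plain Java serialization,
    // with checked exceptions wrapped in a RuntimeException.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bos.toByteArray();
    }

    // Hypothetical stand-in for Utils.deserialize, without the wrapping.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        // The constructor reads the stream header, so it is the call that
        // throws EOFException when the input is empty.
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    // Returns true only if deserialization fails with EOFException.
    static boolean failsWithEof(byte[] bytes) {
        try {
            deserialize(bytes);
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = serialize(new HashMap<String, Object>());
        System.out.println("serialized payload is non-empty: " + (good.length > 0));
        System.out.println("empty payload fails with EOFException: "
                + failsWithEof(new byte[0]));
    }
}
```

Because even an empty map serializes to a non-empty byte array (the stream header alone is four bytes), a zero-length state file cannot be a valid serialized object, whichever of the two causes produced it.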

> After host crash, supervisor is unable to restart itself
> --------------------------------------------------------
>
>                 Key: STORM-307
>                 URL: https://issues.apache.org/jira/browse/STORM-307
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.1-incubating
>         Environment: Debian Linux Wheezy
> Zookeeper 3.3.3
> Java 1.7.0_25
>            Reporter: Damien Raude-Morvan
>         Attachments: supeof.tar.bz2
>
>
> Hi,
> I've observed [multiple times|#links] that supervisor state de-serialisation 
> can fail after a host crash or reboot. The supervisor is then unable to come 
> up without manual intervention. AFAICT, the serialized supervisor state is 
> invalid and cannot be read at the next start.
> Observed error in the supervisor log:
> {noformat}
> 2014-04-29 19:38:35 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2014-04-29 19:38:35 o.a.z.ZooKeeper [INFO] Initiating client connection, 
> connectString=127.0.0.1:2181/storm sessionTimeout=20000 
> watcher=com.netflix.curator.ConnectionState@18d055e0
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Opening socket connection to 
> server /127.0.0.1:2181
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Socket connection established to 
> localhost/127.0.0.1:2181, initiating session
> 2014-04-29 19:38:35 o.a.z.ClientCnxn [INFO] Session establishment complete on 
> server localhost/127.0.0.1:2181, sessionid = 0x145a7cc1c7e48b1, negotiated 
> timeout = 20000
> 2014-04-29 19:38:35 b.s.d.supervisor [INFO] Starting supervisor with id 
> 71b01216-9d00-4fb6-8538-6673058ab5ef at host storm
> 2014-04-29 19:38:36 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.EOFException
>         at backtype.storm.utils.Utils.deserialize(Utils.java:86) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at backtype.storm.utils.LocalState.snapshot(LocalState.java:45) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at backtype.storm.utils.LocalState.get(LocalState.java:56) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at 
> backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:207) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         at clojure.lang.AFn.applyToHelper(AFn.java:161) 
> ~[clojure-1.4.0.jar:na]
>         at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
>         at clojure.core$apply.invoke(core.clj:603) ~[clojure-1.4.0.jar:na]
>         at clojure.core$partial$fn__4070.doInvoke(core.clj:2343) 
> ~[clojure-1.4.0.jar:na]
>         at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.4.0.jar:na]
>         at backtype.storm.event$event_manager$fn__2593.invoke(event.clj:39) 
> ~[na:na]
>         at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
>         at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
> Caused by: java.io.EOFException: null
>         at 
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2323)
>  ~[na:1.7.0_25]
>         at 
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2792)
>  ~[na:1.7.0_25]
>         at 
> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:799) 
> ~[na:1.7.0_25]
>         at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) 
> ~[na:1.7.0_25]
>         at backtype.storm.utils.Utils.deserialize(Utils.java:81) 
> ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>         ... 11 common frames omitted
> 2014-04-29 19:38:36 b.s.util [INFO] Halting process: ("Error when processing 
> an event")
> {noformat}
> Current workaround: fully stopping the supervisor daemon and deleting 
> Storm's entire data/supervisor directory helped; after a restart, the 
> supervisor is now running smoothly. 
> {anchor:links} Here are some references to very similar issues:
> * 
> http://mail-archives.apache.org/mod_mbox/storm-user/201402.mbox/%3c23100d14e7ac4cef947f7236ef896...@by2pr08mb144.namprd08.prod.outlook.com%3E
> * https://groups.google.com/forum/#!topic/storm-user/SL9FK9XeoI8
> * https://groups.google.com/forum/#!topic/storm-user/2gapTYTRrX8
> Regards,
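
If the empty files come from a write that was interrupted by the crash, one common guard is to write to a temporary file, force it to disk, and then rename it over the target. The following is a minimal sketch of that general pattern under stated assumptions, not a description of what Storm or FileUtils.writeByteArrayToFile does; the class, method, and file names are hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class DurableWrite {
    // Write data to a sibling temp file, flush it to disk, then rename it
    // over the target so a reader sees either the old or the new content.
    static void writeDurably(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file content and metadata to the device
        }
        // Whether an existing target is replaced by ATOMIC_MOVE is
        // implementation specific; POSIX file systems typically replace it.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Writes into a fresh temp directory and checks the bytes read back.
    static boolean roundTrip(byte[] data) {
        try {
            Path dir = Files.createTempDirectory("durable-write-demo");
            Path target = dir.resolve("state");
            writeDurably(target, data);
            return Arrays.equals(data, Files.readAllBytes(target));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip(new byte[] {1, 2, 3}));
    }
}
```

This only narrows the window for truncated files; on some file systems the rename itself may also need the parent directory to be synced before it survives a crash.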



--
This message was sent by Atlassian JIRA
(v6.2#6252)
