[ 
https://issues.apache.org/jira/browse/STORM-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated STORM-790:
-------------------------------
    Description: 
In STORM-770, some users observed that a worker suddenly died with an NPE in 
consumeBatchToCursor().

It looks like this can occur when "task" in "mk-transfer-fn" is null.

This was also an issue in 0.9.2-incubating and earlier, where it throws an NPE too.
In versions lower than 0.9.2, the NPE comes from KryoTupleSerializer.serialize.
In 0.9.2 and higher, the NPE comes from clojure.lang.RT.intCast.

Until the root cause of this issue is found, it would be better not to let the 
worker be killed by it, and instead just log it at WARN or ERROR level.
This makes sense because, with Guaranteeing Message Processing, a timed-out 
tuple will be replayed. (This does not apply when acking is not used.)
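
As a rough illustration of the proposed behavior, here is a minimal Java sketch. It is not Storm's actual code (the real logic is in "mk-transfer-fn"); the class NullTaskGuardSketch, the TaskMessage type, and the transfer method are hypothetical names made up for this example. It logs a warning and skips a message whose task id is null, instead of unboxing the null and throwing an NPE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for the worker's transfer step; not Storm's actual code.
public class NullTaskGuardSketch {
    private static final Logger LOG =
            Logger.getLogger(NullTaskGuardSketch.class.getName());

    // A (task id, payload) pair; the task id is boxed so it can be null.
    static final class TaskMessage {
        final Integer task;
        final String payload;

        TaskMessage(Integer task, String payload) {
            this.task = task;
            this.payload = payload;
        }
    }

    // Groups payloads by destination task. A message with a null task id is
    // logged and skipped rather than unboxed, which would throw an NPE.
    static Map<Integer, List<String>> transfer(List<TaskMessage> batch) {
        Map<Integer, List<String>> byTask = new HashMap<>();
        for (TaskMessage m : batch) {
            if (m.task == null) {
                LOG.warning("task id is null, dropping message: " + m.payload);
                continue;
            }
            int task = m.task; // safe to unbox after the null check
            byTask.computeIfAbsent(task, k -> new ArrayList<>()).add(m.payload);
        }
        return byTask;
    }

    public static void main(String[] args) {
        List<TaskMessage> batch = new ArrayList<>();
        batch.add(new TaskMessage(3, "a"));
        batch.add(new TaskMessage(null, "b")); // would crash an unguarded loop
        batch.add(new TaskMessage(3, "c"));
        System.out.println(transfer(batch));
    }
}
```

With acking enabled, a message dropped this way would time out and be replayed; without acking it would simply be lost, which is why the log line matters.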

  was:
In STORM-770, some users observed that a worker suddenly died with an NPE in 
consumeBatchToCursor().

It looks like this can occur when "task" in "mk-transfer-fn" is null.
It may not be an issue before 0.9.2-incubating, since the worker just ignores 
that tuple. 
Please see 
https://issues.apache.org/jira/browse/STORM-770?focusedCommentId=14496199&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14496199.

Until the root cause of this issue is found, it would be better not to let the 
worker be killed by it, and instead just log it at WARN or ERROR level.
This makes sense because before 0.9.2 Storm silently ignores the tuple, and with 
Guaranteeing Message Processing, a timed-out tuple will be replayed. (This does 
not apply when acking is not used.)


> Log "task id is null" instead of let worker died (NPE in consumeBatchToCursor)
> ------------------------------------------------------------------------------
>
>                 Key: STORM-790
>                 URL: https://issues.apache.org/jira/browse/STORM-790
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating, 0.9.3, 0.10.0, 0.9.4, 0.11.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>             Fix For: 0.10.0, 0.9.5
>
>
> In STORM-770, some users observed that a worker suddenly died with an NPE in 
> consumeBatchToCursor().
> It looks like this can occur when "task" in "mk-transfer-fn" is null.
> This was also an issue in 0.9.2-incubating and earlier, where it throws an NPE too.
> In versions lower than 0.9.2, the NPE comes from KryoTupleSerializer.serialize.
> In 0.9.2 and higher, the NPE comes from clojure.lang.RT.intCast.
> Until the root cause of this issue is found, it would be better not to let the 
> worker be killed by it, and instead just log it at WARN or ERROR level.
> This makes sense because, with Guaranteeing Message Processing, a timed-out 
> tuple will be replayed. (This does not apply when acking is not used.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
