[
https://issues.apache.org/jira/browse/FLINK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501026#comment-16501026
]
yan zhou commented on FLINK-9524:
---------------------------------
Here is the trace log I added to ProcTimeBoundedRangeOver.scala. It should
explain how the NPE happens:
_[ts:1528149296456] [label:state_ttl_update] register for cleanup at
1528150096456(CLEANUP_TIME_1), because of Row:(orderId:001,userId:U123)_
_[ts:1528149296456] [label:register_pt] register for process input at
1528149296457, because of Row:(orderId:001,userId:U123)_
_[ts:1528149296458] [label:state_apply] ontimer at 1528149296457, apply
Row:(orderId:001,userId:U123) to accumulator_
_[ts:1528149885813] [label:state_ttl_update] register at
1528150685813(CLEANUP_TIME_2), because of Row:(orderId:002,userId:U123)_
_[ts:1528149885813] [label:register_pt] register for process input at
1528149885814, because of Row:(orderId:002,userId:U123)_
_[ts:1528149885814] [label:state_apply] ontimer at 1528149885814, apply
Row:(orderId:002,userId:U123) to accumulator_
_[ts:1528150096460] [label:NO_ELEMENTS_IN_STATE] ontimer at
1528150096456(CLEANUP_TIME_1), bypass needToCleanupState check, however
rowMapState is {key:1528150096455, value:[]}_
_[ts:1528150685815] [label:state_timeout] ontimer at
1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState
[{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}]_
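The sequence in the trace can be reproduced with a small stand-alone simulation. This is only a sketch: _KeyState_, _processElement_, _onTimer_ and _run_ below are hypothetical stand-ins written for illustration, not the operator's real code or Flink's API. It assumes the 800000 ms gap between each event and its cleanup time seen in the trace, and it never evicts rows, so processing both events before firing the timers gives the same labels as the interleaved order.

```scala
import scala.collection.mutable

// Simplified stand-in for the operator; all names are hypothetical.
object StaleCleanupTimerDemo {
  val Retention = 800000L // gap between an event and its cleanup time in the trace

  final class KeyState {
    var cleanupTime: Option[Long] = None              // plays the role of cleanupTimeState
    val rows = mutable.Map.empty[Long, List[String]]  // plays the role of rowMapState
    val timers = mutable.Set.empty[Long]              // timers held by the timer service
  }

  def processElement(s: KeyState, now: Long, row: String): Unit = {
    s.cleanupTime = Some(now + Retention) // overwrites the previously recorded cleanup time
    s.timers += (now + Retention)         // the earlier cleanup timer stays registered
    s.rows(now) = row :: s.rows.getOrElse(now, Nil)
    s.timers += (now + 1)                 // timer that applies the row to the accumulator
  }

  def onTimer(s: KeyState, timestamp: Long): String =
    if (s.cleanupTime.contains(timestamp)) {
      // Only the most recently recorded cleanup time passes this check.
      s.rows.clear()
      s.cleanupTime = None
      "state_timeout"
    } else if (s.rows.contains(timestamp - 1)) {
      "state_apply"
    } else {
      // A stale cleanup timer lands here: no rows exist for timestamp - 1.
      "NO_ELEMENTS_IN_STATE"
    }

  // Fires every registered timer in timestamp order and collects the labels.
  def run(): List[String] = {
    val s = new KeyState
    processElement(s, 1528149296456L, "orderId:001")
    processElement(s, 1528149885813L, "orderId:002")
    s.timers.toList.sorted.map(onTimer(s, _))
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

The third timer fired is the first cleanup timer (1528150096456). It no longer matches the recorded cleanup time, so it falls through to the branch where the real operator would dereference the missing entry.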
> NPE from ProcTimeBoundedRangeOver.scala
> ---------------------------------------
>
> Key: FLINK-9524
> URL: https://issues.apache.org/jira/browse/FLINK-9524
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.5.0
> Reporter: yan zhou
> Priority: Major
> Attachments: npe_from_ProcTimeBoundedRangeOver.txt
>
>
> The class _ProcTimeBoundedRangeOver_ throws an NPE if _minRetentionTime_
> and _maxRetentionTime_ are set to values greater than 1.
> Please see [^npe_from_ProcTimeBoundedRangeOver.txt] for the details of the
> exception. Below is a short description of the cause:
> * When the first event for a key arrives, the cleanup time is registered
> with _timerService_ and recorded in _cleanupTimeState_. If a second event
> with the same key arrives before the cleanup time, the value in
> _cleanupTimeState_ is updated and a new timer is registered with
> _timerService_. So now we have two registered cleanup timers: one
> registered because of the first event, the other because of the second.
> * However, when the _onTimer_ method fires for the first cleanup timer, the
> _cleanupTimeState_ value has already been updated to the second cleanup time.
> So it bypasses the _needToCleanupState_ check and runs through the
> remaining code of _onTimer_ (which is intended to update the accumulator and
> emit output), which causes the NPE.
> _RowTimeBoundedRangeOver_ has very similar logic to
> _ProcTimeBoundedRangeOver_, but it does not throw an NPE in the same
> situation. To avoid the exception, it simply adds a null check before running
> the logic that updates the accumulator.
>
>
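The null check mentioned in the description could look roughly like the sketch below. It is not the code from _RowTimeBoundedRangeOver_: _applyRows_ is a hypothetical helper, and a plain java.util.HashMap stands in for the row state on the assumption that a lookup for a missing key returns null.

```scala
// Hypothetical guard; a java.util.Map stands in for the operator's row state.
object NullGuardSketch {
  // Returns how many rows were applied for the timer at `timestamp`.
  def applyRows(
      rowMapState: java.util.Map[java.lang.Long, java.util.List[String]],
      timestamp: Long): Int = {
    val currentElements = rowMapState.get(java.lang.Long.valueOf(timestamp - 1))
    if (currentElements == null) {
      0 // stale timer: nothing stored for this timestamp, so skip the update
    } else {
      var applied = 0
      val it = currentElements.iterator()
      while (it.hasNext) {
        it.next() // a real operator would accumulate the row here
        applied += 1
      }
      applied
    }
  }
}
```

With the guard in place, a timer that finds no rows for its timestamp returns early instead of iterating over a null reference.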