[ 
https://issues.apache.org/jira/browse/FLINK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501026#comment-16501026
 ] 

yan zhou commented on FLINK-9524:
---------------------------------

Here is the trace log I added to ProcTimeBoundedRangeOver.scala. It should 
explain how the NPE happens:

 
_[ts:1528149296456] [label:state_ttl_update] register for cleanup at 
1528150096456(CLEANUP_TIME_1), because of Row:(orderId:001,userId:U123)_
_[ts:1528149296456] [label:register_pt] register for process input at 
1528149296457, because of Row:(orderId:001,userId:U123)_
_[ts:1528149296458] [label:state_apply] ontimer at 1528149296457, apply 
Row:(orderId:001,userId:U123) to accumulator_
 
_[ts:1528149885813] [label:state_ttl_update] register at 
1528150685813(CLEANUP_TIME_2), because of Row:(orderId:002,userId:U123)_
_[ts:1528149885813] [label:register_pt] register for process input at 
1528149885814, because of Row:(orderId:002,userId:U123)_
_[ts:1528149885814] [label:state_apply] ontimer at 1528149885814, apply 
Row:(orderId:002,userId:U123) to accumulator_
 
_[ts:1528150096460] [label:NO_ELEMENTS_IN_STATE] ontimer at 
1528150096456(CLEANUP_TIME_1), bypass needToCleanupState check, however 
rowMapState is {key:1528150096455, value:[]}_
 
_[ts:1528150685815] [label:state_timeout] ontimer at 
1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState 
[{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}]_
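The timeline in this trace can be replayed with a small standalone model. This is only a sketch and not the Flink code: the class name, the `nullCheck` flag and the returned strings are made up for illustration, the 800000 ms offset is simply the difference between the event and cleanup timestamps in the log, and the cleanup time is overwritten unconditionally for simplicity. The lookup at `timestamp - 1` mirrors the key shown in the NO_ELEMENTS_IN_STATE line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the operator state described in the trace.
class CleanupTimerRace {
    // Offset between event time and cleanup time implied by the log (assumption).
    static final long RETENTION_MS = 800_000L;

    // Stand-ins for cleanupTimeState and rowMapState.
    private Long cleanupTime = null;
    private final Map<Long, List<String>> rowMapState = new HashMap<>();

    // Per input row: overwrite the cleanup time and buffer the row under "now".
    // The timer registered earlier for the old cleanup time is NOT removed.
    void processElement(long now, String row) {
        cleanupTime = now + RETENTION_MS;
        rowMapState.computeIfAbsent(now, k -> new ArrayList<>()).add(row);
    }

    // Fired for both processing timers (now + 1) and cleanup timers.
    String onTimer(long timestamp, boolean nullCheck) {
        // Only the most recently recorded cleanup time is recognised here.
        if (cleanupTime != null && timestamp == cleanupTime) {
            rowMapState.clear();
            return "cleanup";
        }
        // A stale cleanup timer falls through to the accumulator logic.
        List<String> rows = rowMapState.get(timestamp - 1);
        if (nullCheck && rows == null) {
            return "skipped"; // guard in the spirit of RowTimeBoundedRangeOver
        }
        return "applied " + rows.size() + " row(s)"; // NPE if rows is null
    }

    public static void main(String[] args) {
        CleanupTimerRace op = new CleanupTimerRace();
        op.processElement(1528149296456L, "Row:(orderId:001,userId:U123)");
        op.processElement(1528149885813L, "Row:(orderId:002,userId:U123)");
        // CLEANUP_TIME_1 fires after the state already points to CLEANUP_TIME_2.
        System.out.println(op.onTimer(1528150096456L, true));
        try {
            op.onTimer(1528150096456L, false);
        } catch (NullPointerException e) {
            System.out.println("NPE without the null check");
        }
        System.out.println(op.onTimer(1528150685813L, true));
    }
}
```

With the guard enabled, the stale timer is ignored and the later timer performs the cleanup. Without it, the stale timer dereferences a missing map entry.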

> NPE from ProcTimeBoundedRangeOver.scala
> ---------------------------------------
>
>                 Key: FLINK-9524
>                 URL: https://issues.apache.org/jira/browse/FLINK-9524
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: yan zhou
>            Priority: Major
>         Attachments: npe_from_ProcTimeBoundedRangeOver.txt
>
>
> The class _ProcTimeBoundedRangeOver_ throws an NPE if _minRetentionTime_ 
> and _maxRetentionTime_ are set to values greater than 1. 
> Please see [^npe_from_ProcTimeBoundedRangeOver.txt] for the details of the 
> exception. Below is a short description of the cause:
>  * When the first event for a key arrives, the cleanup time is registered 
> with _timerService_ and recorded in _cleanupTimeState_. If a second event 
> with the same key arrives before the cleanup time, the value in 
> _cleanupTimeState_ is updated and a new timer is registered with 
> _timerService_. So now we have two registered cleanup timers: one 
> registered because of the first event, the other because of the second.
>  * However, when _onTimer_ fires for the first cleanup timer, the 
> _cleanupTimeState_ value has already been updated to the second cleanup 
> time. So it bypasses the _needToCleanupState_ check and runs through the 
> remaining code of _onTimer_ (which is intended to update the accumulator 
> and emit output), which causes the NPE.
> _RowTimeBoundedRangeOver_ has very similar logic to 
> _ProcTimeBoundedRangeOver_, but it does not throw an NPE in this situation. 
> To avoid the exception, it simply adds a null check before running the 
> logic that updates the accumulator.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
