[ 
https://issues.apache.org/jira/browse/FLINK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501555#comment-16501555
 ] 

Fabian Hueske commented on FLINK-9524:
--------------------------------------

Hi [~yzandrew], thanks for digging into this issue!

It would be great if you would work on it.

One thing that I'd like to understand first is why the state was cleared and 
returns {{null}}. 
I had a look at the code and did not spot a case that would cause an early 
state cleanup. So before we add a {{null}}-check I'd like to understand why 
there's a {{null}} since this might indicate a bigger problem and catching the 
{{null}} might just hide it.

Let me have a look at the operator again.

> NPE from ProcTimeBoundedRangeOver.scala
> ---------------------------------------
>
>                 Key: FLINK-9524
>                 URL: https://issues.apache.org/jira/browse/FLINK-9524
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: yan zhou
>            Priority: Major
>         Attachments: npe_from_ProcTimeBoundedRangeOver.txt
>
>
> The class _ProcTimeBoundedRangeOver_ throws an NPE if _minRetentionTime_ 
> and _maxRetentionTime_ are set to values greater than 1. 
> Please see [^npe_from_ProcTimeBoundedRangeOver.txt] for the details of the 
> exception. Below is a short description of the cause:
>  * When the first event for a key arrives, the cleanup time is registered 
> with _timerService_ and recorded in _cleanupTimeState_. If a second event 
> with the same key arrives before the cleanup time, the value in 
> _cleanupTimeState_ is updated and a new timer is registered with 
> _timerService_. So now we have two registered timers for cleanup: one 
> registered because of the first event, the other because of the second event.
>  * However, when the _onTimer_ method fires for the first cleanup timer, the 
> _cleanupTimeState_ value has already been updated to the second cleanup 
> time. So it bypasses the _needToCleanupState_ check and still runs through 
> the remaining code of _onTimer_ (which is intended to update the accumulator 
> and emit output), which causes the NPE.
> _RowTimeBoundedRangeOver_ has very similar logic to 
> _ProcTimeBoundedRangeOver_, but it does not throw an NPE for this reason. To 
> avoid the exception, it simply adds a null check before running the logic 
> that updates the accumulator.
>  
>  
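
The timer sequence described in the report can be replayed with a toy model. 
The sketch below is plain Python, not Flink code: the class 
BoundedRangeOverSketch, its fields, the null_check flag and the timestamps are 
all hypothetical, and it assumes that the cleanup check is a simple equality 
between the firing timestamp and the stored cleanup time.

```python
class BoundedRangeOverSketch:
    """Toy model of the two-timer interaction from the report; not Flink code."""

    def __init__(self, retention_ms, null_check):
        self.retention_ms = retention_ms
        self.null_check = null_check
        self.rows_by_time = {}    # stands in for the per-key row state
        self.cleanup_time = None  # stands in for cleanupTimeState
        self.timers = set()       # stands in for timers held by timerService

    def process_element(self, now, row):
        self.rows_by_time.setdefault(now, []).append(row)
        self.timers.add(now + 1)  # timer that updates the accumulator
        # Overwrite the recorded cleanup time; the older cleanup timer
        # stays registered and will still fire later.
        self.cleanup_time = now + self.retention_ms
        self.timers.add(self.cleanup_time)

    def on_timer(self, timestamp):
        self.timers.discard(timestamp)
        # Assumed shape of the needToCleanupState check.
        if timestamp == self.cleanup_time:
            self.rows_by_time.clear()
            self.cleanup_time = None
            return "cleaned"
        rows = self.rows_by_time.get(timestamp - 1)
        if self.null_check and rows is None:
            return "skipped"  # stale cleanup timer, nothing to process
        return len(rows)      # raises TypeError when rows is None


def replay(null_check):
    """Two events for one key, then fire every registered timer in order."""
    op = BoundedRangeOverSketch(retention_ms=1000, null_check=null_check)
    op.process_element(0, "a")
    op.process_element(500, "b")
    return [op.on_timer(t) for t in sorted(op.timers)]
```

In this model, replay(null_check=True) yields [1, 1, "skipped", "cleaned"], 
while replay(null_check=False) fails on the stale timer at 1000 with a 
TypeError, which plays the role of the NPE. Note that the lookup returns 
nothing here because no rows were ever stored for that timestamp, not because 
state was cleared early. Whether the real operator behaves the same way would 
still need to be confirmed against the code.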



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
