[ 
https://issues.apache.org/jira/browse/FLINK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319406#comment-17319406
 ] 

Jamie Brandon commented on FLINK-22075:
---------------------------------------

> remove the fix version tag

I have no opinions on the fix version :)

> close the issue

I'm not particularly invested in the result, but the current behavior does seem 
strange to me. For a given row x, I'd expect that it either falls within the 
watermark and produces (x, x), or falls outside it and produces nothing. 
Producing (x, null) definitely violates expectations from SQL.

---

Separately, I'm also a bit confused about the numbers. I looked at the 
distribution of how far behind each row is relative to the latest seen so far 
in [this 
generator|https://github.com/jamii/streaming-consistency/blob/c43722cea1394b0812d9fdbb20f063b47c9bd645/transactions.py]

{code}
  84872 -9
  88237 -8
  91645 -7
  95450 -6
  97864 -5
 101262 -4
 104701 -3
 108722 -2
 111735 -1
 115512 0
{code}

It's roughly uniform, which is what I expect from the random delay. Summing the 
buckets from -9 to -6 gives 360204 records, whereas three runs on the same data 
produced 316074, 347470 and 321288 nulls. 
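
For reference, here is a rough sketch of how a lag distribution like this can be 
computed and summed. It is not the script I actually ran; the names 
`lag_histogram` and `count_late` are made up for illustration, and the cutoff of 
5 is only an assumption chosen to match the -9 to -6 range above.

```python
from collections import Counter


def lag_histogram(timestamps):
    """Count how far each timestamp trails the largest timestamp seen so far."""
    hist = Counter()
    latest = None
    for ts in timestamps:
        if latest is None or ts > latest:
            latest = ts
        # Lag is 0 for a row that sets a new maximum, negative otherwise.
        hist[ts - latest] += 1
    return hist


def count_late(hist, max_lag):
    """Count rows that are more than max_lag behind the running maximum."""
    return sum(n for lag, n in hist.items() if lag < -max_lag)


# Counts copied from the distribution listed above (lag -> number of rows).
observed = Counter({
    -9: 84872, -8: 88237, -7: 91645, -6: 95450, -5: 97864,
    -4: 101262, -3: 104701, -2: 108722, -1: 111735, 0: 115512,
})

# Assumed cutoff of 5: rows with lag -6 through -9 count as late.
print(count_late(observed, 5))  # 360204
```

If lateness were decided purely per row against a fixed cutoff, I'd expect the 
null count to match this sum on every run.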

Is there some source of non-determinism in how the join handles watermarks?

> Incorrect null outputs in left join
> -----------------------------------
>
>                 Key: FLINK-22075
>                 URL: https://issues.apache.org/jira/browse/FLINK-22075
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API
>    Affects Versions: 1.12.2
>         Environment: 
> https://github.com/jamii/streaming-consistency/blob/4e5d144dacf85e512bdc7afd77d031b5974d733e/pkgs.nix#L25-L46
> ```
> [nix-shell:~/streaming-consistency/flink]$ java -version
> openjdk version "1.8.0_265"
> OpenJDK Runtime Environment (build 1.8.0_265-ga)
> OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode)
> [nix-shell:~/streaming-consistency/flink]$ flink --version
> Version: 1.12.2, Commit ID: 4dedee0
> [nix-shell:~/streaming-consistency/flink]$ nix-info
> system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10, 
> channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs: 
> /nix/var/nix/profiles/per-user/root/channels/nixos
> ```
>            Reporter: Jamie Brandon
>            Assignee: lincoln lee
>            Priority: Critical
>
> I'm left joining a table with itself 
> [here](https://github.com/jamii/streaming-consistency/blob/4e5d144dacf85e512bdc7afd77d031b5974d733e/flink/src/main/java/Demo.java#L55-L66).
>  The output should have no nulls, or at least emit nulls and then retract 
> them. Instead I see:
> ```
> jamie@machine:~/streaming-consistency/flink$ wc -l tmp/outer_join_with_time
> 100000 tmp/outer_join_with_time
> jamie@machine:~/streaming-consistency/flink$ grep -c insert 
> tmp/outer_join_with_time
> 100000
> jamie@machine:~/streaming-consistency/flink$ grep -c 'null' 
> tmp/outer_join_with_time
> 16943
> ```
> ~17% of the outputs (16943 of 100000) are incorrect and never retracted.
> [Full 
> output](https://gist.githubusercontent.com/jamii/983fee41609b1425fe7fa59d3249b249/raw/069b9dcd4faf9f6113114381bc7028c6642ca787/gistfile1.txt)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
