[ https://issues.apache.org/jira/browse/FLINK-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319406#comment-17319406 ]
Jamie Brandon commented on FLINK-22075: --------------------------------------- > remove the fix version tag I have no opinions on the fix version :) > close the issue I'm not particularly invested in the result, but the current behavior does seem strange to me. For a given row x, I'd expect either it falls within the watermark and produces (x,x) or falls outside it and produces nothing. Producing (x, null) definitely violates expectations from sql. --- Separately, I'm also a bit confused about the numbers. I looked at the distribution of how far behind each row is relative to the latest seen so far in [this generator|https://github.com/jamii/streaming-consistency/blob/c43722cea1394b0812d9fdbb20f063b47c9bd645/transactions.py] {code} 84872 -9 88237 -8 91645 -7 95450 -6 97864 -5 101262 -4 104701 -3 108722 -2 111735 -1 115512 0 {code} It's pretty uniform, which is what I expect from the random delay. If we add up from -9 to -6 it's 360204 records. Whereas for three runs with the same data I got 316074, 347470 and 321288 nulls. Is there some source of non-determinism in how the join handles watermarks? > Incorrect null outputs in left join > ----------------------------------- > > Key: FLINK-22075 > URL: https://issues.apache.org/jira/browse/FLINK-22075 > Project: Flink > Issue Type: Bug > Components: Table SQL / API > Affects Versions: 1.12.2 > Environment: > https://github.com/jamii/streaming-consistency/blob/4e5d144dacf85e512bdc7afd77d031b5974d733e/pkgs.nix#L25-L46 > ``` > [nix-shell:~/streaming-consistency/flink]$ java -version > openjdk version "1.8.0_265" > OpenJDK Runtime Environment (build 1.8.0_265-ga) > OpenJDK 64-Bit Server VM (build 25.265-bga, mixed mode) > [nix-shell:~/streaming-consistency/flink]$ flink --version > Version: 1.12.2, Commit ID: 4dedee0 > [nix-shell:~/streaming-consistency/flink]$ nix-info > system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.10, > channels(jamie): "", channels(root): "nixos-20.09.3554.f8929dce13e", nixpkgs: > /nix/var/nix/profiles/per-user/root/channels/nixos > ``` > Reporter: Jamie Brandon > Assignee: lincoln lee > Priority: Critical > > I'm left joining a table with itself > [here](https://github.com/jamii/streaming-consistency/blob/4e5d144dacf85e512bdc7afd77d031b5974d733e/flink/src/main/java/Demo.java#L55-L66). > The output should have no nulls, or at least emit nulls and then retract > them. Instead I see: > ``` > jamie@machine:~/streaming-consistency/flink$ wc -l tmp/outer_join_with_time > 100000 tmp/outer_join_with_time > jamie@machine:~/streaming-consistency/flink$ grep -c insert > tmp/outer_join_with_time > 100000 > jamie@machine:~/streaming-consistency/flink$ grep -c 'null' > tmp/outer_join_with_time > 16943 > ``` > ~1.7% of the outputs are incorrect and never retracted. > [Full > output](https://gist.githubusercontent.com/jamii/983fee41609b1425fe7fa59d3249b249/raw/069b9dcd4faf9f6113114381bc7028c6642ca787/gistfile1.txt) -- This message was sent by Atlassian Jira (v8.3.4#803005)