[ 
https://issues.apache.org/jira/browse/FLINK-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li updated FLINK-14843:
--------------------------
    Priority: Critical  (was: Major)

Upgrading to Critical since it reproduces in release-1.10 nightly build, and I 
suggest to fix it as soon as possible if the given PR looks good [~banmoy] 
[~gjy]. Thanks.

> Streaming bucketing end-to-end test can fail with Output hash mismatch
> ----------------------------------------------------------------------
>
>                 Key: FLINK-14843
>                 URL: https://issues.apache.org/jira/browse/FLINK-14843
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Tests
>    Affects Versions: 1.10.0
>         Environment: rev: dcc1330375826b779e4902176bb2473704dabb11
>            Reporter: Gary Yao
>            Assignee: PengFei Li
>            Priority: Critical
>              Labels: pull-request-available, test-stability
>             Fix For: 1.10.0
>
>         Attachments: complete_result, 
> flink-gary-standalonesession-0-gyao-desktop.log, 
> flink-gary-taskexecutor-0-gyao-desktop.log, 
> flink-gary-taskexecutor-1-gyao-desktop.log, 
> flink-gary-taskexecutor-2-gyao-desktop.log, 
> flink-gary-taskexecutor-3-gyao-desktop.log, 
> flink-gary-taskexecutor-4-gyao-desktop.log, 
> flink-gary-taskexecutor-5-gyao-desktop.log, 
> flink-gary-taskexecutor-6-gyao-desktop.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> Streaming bucketing end-to-end test ({{test_streaming_bucketing.sh}}) can 
> fail with Output hash mismatch.
> {noformat}
> Number of running task managers has reached 4.
> Job (e0b7a86e4d4111f3947baa3d004e083a) is running.
> Waiting until all values have been produced
> Truncating buckets
> Number of produced values 26930/60000
> Truncating buckets
> Number of produced values 30890/60000
> Truncating buckets
> Number of produced values 37340/60000
> Truncating buckets
> Number of produced values 41290/60000
> Truncating buckets
> Number of produced values 46710/60000
> Truncating buckets
> Number of produced values 52120/60000
> Truncating buckets
> Number of produced values 57110/60000
> Truncating buckets
> Number of produced values 62530/60000
> Cancelling job e0b7a86e4d4111f3947baa3d004e083a.
> Cancelled job e0b7a86e4d4111f3947baa3d004e083a.
> Waiting for job (e0b7a86e4d4111f3947baa3d004e083a) to reach terminal state 
> CANCELED ...
> Job (e0b7a86e4d4111f3947baa3d004e083a) reached terminal state CANCELED
> Job e0b7a86e4d4111f3947baa3d004e083a was cancelled, time to verify
> FAIL Bucketing Sink: Output hash mismatch.  Got 
> 9e00429abfb30eea4f459eb812b470ad, expected 01aba5ff77a0ef5e5cf6a727c248bdc3.
> head hexdump of actual:
> 0000000   (   2   ,   1   0   ,   0   ,   S   o   m   e       p   a   y
> 0000010   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   1
> 0000020   ,   S   o   m   e       p   a   y   l   o   a   d   .   .   .
> 0000030   )  \n   (   2   ,   1   0   ,   2   ,   S   o   m   e       p
> 0000040   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0
> 0000050   ,   3   ,   S   o   m   e       p   a   y   l   o   a   d   .
> 0000060   .   .   )  \n   (   2   ,   1   0   ,   4   ,   S   o   m   e
> 0000070       p   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,
> 0000080   1   0   ,   5   ,   S   o   m   e       p   a   y   l   o   a
> 0000090   d   .   .   .   )  \n   (   2   ,   1   0   ,   6   ,   S   o
> 00000a0   m   e       p   a   y   l   o   a   d   .   .   .   )  \n   (
> 00000b0   2   ,   1   0   ,   7   ,   S   o   m   e       p   a   y   l
> 00000c0   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   8   ,
> 00000d0   S   o   m   e       p   a   y   l   o   a   d   .   .   .   )
> 00000e0  \n   (   2   ,   1   0   ,   9   ,   S   o   m   e       p   a
> 00000f0   y   l   o   a   d   .   .   .   )  \n                        
> 00000fa
> Stopping taskexecutor daemon (pid: 55164) on host gyao-desktop.
> Stopping standalonesession daemon (pid: 51073) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 51504) on host gyao-desktop.
> Skipping taskexecutor daemon (pid: 52034), because it is not running anymore 
> on gyao-desktop.
> Skipping taskexecutor daemon (pid: 52472), because it is not running anymore 
> on gyao-desktop.
> Skipping taskexecutor daemon (pid: 52916), because it is not running anymore 
> on gyao-desktop.
> Stopping taskexecutor daemon (pid: 54121) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 54726) on host gyao-desktop.
> [FAIL] Test script contains errors.
> Checking of logs skipped.
> [FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' 
> failed after 2 minutes and 3 seconds! Test exited with exit code 1
> {noformat}
> *How to reproduce*
> Comment out the delay of 10s after the 1st TM is restarted to provoke the 
> issue:
> {code:bash}
> echo "Restarting 1 TM"
> $FLINK_DIR/bin/taskmanager.sh start
> wait_for_number_of_running_tms 4
> #sleep 10
> echo "Killing 2 TMs"
> kill_random_taskmanager
> kill_random_taskmanager
> wait_for_number_of_running_tms 2
> {code}
> Command to run the test:
> {noformat}
> FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip 
> flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to