[ https://issues.apache.org/jira/browse/FLINK-32668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746923#comment-17746923 ]

Matthias Pohl commented on FLINK-32668:
---------------------------------------

Thanks for raising this issue, [~loserwang1024]. I'm not sure about the error
message, though: isn't it more of a warning than an error, given that the
function still achieves the desired outcome (i.e. the process is gone)? But we
can move that discussion into the corresponding PR. I'm going to assign the
issue to you and update the Jira metadata.
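To make the warning-versus-error distinction concrete, here is a minimal sketch of a {{kill_test_watchdog}} variant that only logs a warning when the watchdog has already exited. It is not the code from common.sh or from any PR: the {{[WARN]}} message text is made up, and the temporary {{TEST_DATA_DIR}} and the background {{sleep}} jobs are hypothetical stand-ins for the e2e test environment.

```shell
#!/bin/sh
# Sketch only: a kill_test_watchdog variant that treats an already-exited
# watchdog as a warning rather than an error. Not the actual common.sh code;
# TEST_DATA_DIR and the background sleep jobs below are stand-ins.

kill_test_watchdog() {
    local watchdog_pid
    watchdog_pid=$(cat "$TEST_DATA_DIR/job_watchdog.pid")
    # kill -0 sends no signal; it only checks whether the PID can be signalled.
    if kill -0 "$watchdog_pid" 2>/dev/null; then
        echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
        kill "$watchdog_pid"
    else
        # The watchdog already fired (the command timed out) and exited, so
        # the desired end state (no watchdog process) is already reached.
        echo "[WARN] Job timeout watchdog (pid=$watchdog_pid) already exited; the command likely timed out."
    fi
}

TEST_DATA_DIR=$(mktemp -d)
trap 'rm -rf "$TEST_DATA_DIR"' EXIT

# Case 1: the command finished in time, so the watchdog is still sleeping.
sleep 30 &
echo $! > "$TEST_DATA_DIR/job_watchdog.pid"
kill_test_watchdog    # prints "Stopping job timeout watchdog (with pid=...)"

# Case 2: the watchdog has already exited (and been reaped).
sleep 0 &
finished_pid=$!
wait "$finished_pid"
echo "$finished_pid" > "$TEST_DATA_DIR/job_watchdog.pid"
kill_test_watchdog    # prints the [WARN] line instead of "No such process"
```

Whether the second case should also fail the test (as the proposal in the issue does with {{exit 1}}) is exactly the open question; this sketch only shows the quieter option.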

> fix up watchdog timeout error msg  in common.sh(e2e test) 
> ----------------------------------------------------------
>
>                 Key: FLINK-32668
>                 URL: https://issues.apache.org/jira/browse/FLINK-32668
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System / CI
>    Affects Versions: 1.17.1
>            Reporter: Hongshun Wang
>            Priority: Not a Priority
>             Fix For: 1.17.2
>
>         Attachments: image-2023-07-25-15-27-37-441.png
>
>
> When running the e2e tests, an error like this occurs:
> !image-2023-07-25-15-27-37-441.png|width=733,height=115!
>  
> The corresponding code:
> {code:bash}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>     kill $watchdog_pid
> }
>
> internal_run_with_timeout() {
>     local timeout_in_seconds="$1"
>     local on_failure="$2"
>     local command_label="$3"
>     local command="${@:4}"
>     on_exit kill_test_watchdog
>     (
>         command_pid=$BASHPID
>         (sleep "${timeout_in_seconds}" # set a timeout for this command
>          echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds."
>          eval "${on_failure}"
>          kill "$command_pid") & watchdog_pid=$!
>         echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid
>         # invoke
>         $command
>     )
> }{code}
>  
> When {{$command}} completes before the timeout, the watchdog process is
> killed successfully. However, when {{$command}} times out, the watchdog
> process kills {{$command}} and then exits on its own. When the exit hook
> {{kill_test_watchdog}} (registered via {{on_exit}}) later runs
> {{kill $watchdog_pid}}, that process no longer exists, so {{kill}} fails with
> the error message "no such process", which is hard to understand.
>  
> So I propose changing it as follows, with a clearer error message:
>  
> {code:bash}
> kill_test_watchdog() {
>     local watchdog_pid=$(cat $TEST_DATA_DIR/job_watchdog.pid)
>     # kill -0 only checks whether the watchdog process still exists
>     if kill -0 $watchdog_pid > /dev/null 2>&1; then
>         echo "Stopping job timeout watchdog (with pid=$watchdog_pid)"
>         kill $watchdog_pid
>     else
>         # the watchdog already fired, i.e. the command timed out
>         echo "[ERROR] Test timed out"
>         exit 1
>     fi
> }{code}
>  
>  
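The failure described in the issue can be reproduced in a few lines of shell. This is a simplified sketch, not the common.sh code: instead of the nested watchdog subshell, a background {{sleep 0}} job that exits on its own stands in for a watchdog that has already fired, and {{pid_file}} is a hypothetical stand-in for {{$TEST_DATA_DIR/job_watchdog.pid}}. It shows the unconditional {{kill}} failing and the {{kill -0}} probe that the proposed change relies on.

```shell
#!/bin/sh
# Simplified sketch of the timeout path, not Flink's actual code: a short-lived
# background job stands in for a watchdog that has already fired and exited.
# pid_file is a hypothetical stand-in for $TEST_DATA_DIR/job_watchdog.pid.
pid_file=$(mktemp)

sleep 0 &                       # the "watchdog" finishes on its own
echo $! > "$pid_file"
wait "$(cat "$pid_file")"       # it has exited and been reaped

watchdog_pid=$(cat "$pid_file")

# Original kill_test_watchdog: an unconditional kill. It fails and prints a
# "No such process" error on stderr (exact wording depends on the shell).
kill_status=0
kill "$watchdog_pid" || kill_status=$?
echo "kill exited with status $kill_status"

# Proposed change: probe with kill -0 first. It sends no signal, so its
# stderr can be silenced without hiding anything else.
if kill -0 "$watchdog_pid" 2>/dev/null; then
    watchdog_state=running
else
    watchdog_state=gone
fi
echo "watchdog (pid=$watchdog_pid) is $watchdog_state"

rm -f "$pid_file"
```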



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
