[ https://issues.apache.org/jira/browse/FLINK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474582#comment-17474582 ]
David Morávek commented on FLINK-25307:
---------------------------------------
I think using only the loopback interface for these tests is a good idea in
general; it should make the e2e tests more robust and maybe slightly faster.
The fact that the host IP keeps changing does sound weird, and it might point
to infrastructure issues that we should look into as well.
(We're already planning to stop binding to all interfaces by default in
https://issues.apache.org/jira/browse/FLINK-24474.)
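
For illustration only, here is a minimal sketch of the difference between
binding to loopback and binding to all interfaces. It uses plain Python
sockets, not Flink code, and the helper name bind_listener is made up for this
example. A listener bound to 127.0.0.1 does not depend on whatever external IP
the host currently has, while one bound to 0.0.0.0 accepts connections on every
interface.

```python
import socket


def bind_listener(host):
    """Bind a TCP listener to an ephemeral port on `host`.

    Returns the (address, port) pair the socket is actually bound to.
    Hypothetical helper for illustration; not part of Flink.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, 0))  # port 0 lets the OS pick a free port
        srv.listen(1)
        return srv.getsockname()


# Loopback only: reachable from the local machine, independent of the host IP.
loopback_addr = bind_listener("127.0.0.1")

# Wildcard: listens on all interfaces, including externally visible ones.
wildcard_addr = bind_listener("0.0.0.0")

print("loopback:", loopback_addr)
print("wildcard:", wildcard_addr)
```

The ports printed will differ from run to run because the OS assigns them.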
> Resuming Savepoint (hashmap, async, no parallelism change) end-to-end test
> timeout on azure
> -------------------------------------------------------------------------------------------
>
> Key: FLINK-25307
> URL: https://issues.apache.org/jira/browse/FLINK-25307
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines, Runtime / Coordination
> Affects Versions: 1.13.3, 1.15.0
> Reporter: Yun Gao
> Assignee: Yun Gao
> Priority: Blocker
> Labels: pull-request-available, stale-critical, test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common.sh: line 860:
> kill: (93166) - No such process
> Dec 14 10:30:13 Stopping job timeout watchdog (with pid=93166)
> Dec 14 10:30:13 [FAIL] Test script contains errors.
> Dec 14 10:30:13 Checking for errors...
> Dec 14 10:30:14 No errors in log files.
> Dec 14 10:30:14 Checking for exceptions...
> Dec 14 10:30:14 No exceptions in log files.
> Dec 14 10:30:14 Checking for non-empty .out files...
> Dec 14 10:30:14 No non-empty .out files.
> Dec 14 10:30:14
> Dec 14 10:30:14 [FAIL] 'Resuming Savepoint (hashmap, async, no parallelism
> change) end-to-end test' failed after 15 minutes and 0 seconds! Test exited
> with exit code 1
> Dec 14 10:30:14
> 10:30:14 ##[group]Environment Information
> Dec 14 10:30:15 Searching for .dump, .dumpstream and related files in
> '/home/vsts/work/1/s'
> dmesg: read kernel buffer failed: Operation not permitted
> Dec 14 10:30:16 Stopping taskexecutor daemon (pid: 93751) on host fv-az43-70.
> Dec 14 10:30:17 Stopping standalonesession daemon (pid: 93500) on host
> fv-az43-70.
> The STDIO streams did not close within 10 seconds of the exit event from
> process '/usr/bin/bash'. This may indicate a child process inherited the
> STDIO streams and has not yet exited.
> ##[error]Bash exited with code '1'.
> Finishing: Run e2e tests
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=28088&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=79112