[
https://issues.apache.org/jira/browse/FLINK-10842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696054#comment-16696054
]
ASF GitHub Bot commented on FLINK-10842:
----------------------------------------
twalthr commented on a change in pull request #7073: [FLINK-10842][E2E tests] fix broken waiting loops in common.sh
URL: https://github.com/apache/flink/pull/7073#discussion_r235772194
##########
File path: flink-end-to-end-tests/test-scripts/common.sh
##########
@@ -242,30 +245,45 @@ function start_taskmanagers {
}
function start_and_wait_for_tm {
- local url="${REST_PROTOCOL}://${NODENAME}:8081/taskmanagers"
-
- tm_query_result=$(curl ${CURL_SSL_ARGS} -s "${url}")
-
+ tm_query_result=`query_running_tms`
# we assume that the cluster is running
if ! [[ ${tm_query_result} =~ \{\"taskmanagers\":\[.*\]\} ]]; then
echo "Your cluster seems to be unresponsive at the moment: ${tm_query_result}" 1>&2
exit 1
fi
- running_tms=`curl ${CURL_SSL_ARGS} -s "${url}" | grep -o "id" | wc -l`
-
+ running_tms=`query_number_of_running_tms`
${FLINK_DIR}/bin/taskmanager.sh start
+ wait_for_number_of_running_tms $((running_tms+1))
+}
- for i in {1..10}; do
- local new_running_tms=`curl ${CURL_SSL_ARGS} -s "${url}" | grep -o "id" | wc -l`
- if [ $((new_running_tms-running_tms)) -eq 0 ]; then
- echo "TaskManager is not yet up."
+function query_running_tms {
+ local url="${REST_PROTOCOL}://${NODENAME}:8081/taskmanagers"
+ curl ${CURL_SSL_ARGS} -s "${url}"
+}
+
+function query_number_of_running_tms {
+ query_running_tms | grep -o "id" | wc -l
Review comment:
Same concern here. `grep -o` can fail with a non-zero exit status, and that error should be caught.
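One way to act on the concern about `grep -o` is sketched below. It assumes the script runs with `set -o errexit -o pipefail`, which the hunk does not show. grep exits with status 1 when it finds no match and with 2 on a real error, so a cluster with zero TaskManagers would otherwise abort the pipeline. The helper `count_tm_ids` is hypothetical and not part of common.sh. It reads the REST response from stdin instead of calling `query_running_tms`, so it can be exercised without a running cluster.

```shell
#!/usr/bin/env bash
# Sketch: count "id" occurrences in a /taskmanagers REST response without
# letting grep's "no match" status abort a script that runs under
# errexit/pipefail. count_tm_ids is a hypothetical helper, not part of common.sh.
set -o errexit -o pipefail

# Reads the JSON response on stdin and prints the number of "id" matches.
function count_tm_ids {
    local matches
    local status=0
    # grep -o prints each match on its own line. Exit status 1 means
    # "no match" (zero TaskManagers); 2 means a real grep error.
    matches=$(grep -o "id") || status=$?
    if [ "${status}" -gt 1 ]; then
        echo "grep failed with exit status ${status}" 1>&2
        return "${status}"
    fi
    if [ -z "${matches}" ]; then
        echo 0
    else
        # tr strips the padding some wc implementations add.
        printf '%s\n' "${matches}" | wc -l | tr -d ' '
    fi
}
```

Treating status 1 as a count of zero while propagating status 2 keeps an empty TaskManager list distinct from a broken grep invocation. A failing curl call would still need its own check.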
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
> Waiting loops are broken in e2e/common.sh
> -----------------------------------------
>
> Key: FLINK-10842
> URL: https://issues.apache.org/jira/browse/FLINK-10842
> Project: Flink
> Issue Type: Bug
> Components: E2E Tests
> Affects Versions: 1.7.0
> Reporter: Andrey Zagrebin
> Assignee: Andrey Zagrebin
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
>
> There are three loops in flink-end-to-end-tests/test-scripts/common.sh where the
> script waits for some event to happen (for i in {1..10}; do):
> - wait_dispatcher_running
> - start_and_wait_for_tm
> - wait_job_running
> Each loop runs for 10 iterations and breaks early if the awaited event
> happens. If a timeout occurs, the script does not fail: the function simply
> continues after the 10th iteration, ignoring that the awaited event never
> happened.
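The issue asks for a waiting loop that fails on timeout. A minimal sketch of one follows. The helper name `wait_for` and its arguments are hypothetical and not taken from common.sh. The sketch uses an arithmetic for loop because brace expansion such as {1..10} happens before variable expansion, so the attempt limit could not be a variable in that form.

```shell
#!/usr/bin/env bash
# Sketch: a waiting loop that reports failure on timeout instead of silently
# continuing. The name wait_for and its argument order are hypothetical.

# Usage: wait_for <max_attempts> <sleep_seconds> <command> [args...]
# Runs <command> until it exits with status 0. Returns 0 on success and
# 1 if the command never succeeds within <max_attempts> tries.
function wait_for {
    local max_attempts=$1
    local sleep_seconds=$2
    shift 2
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if "$@"; then
            return 0
        fi
        echo "Condition not met yet (attempt ${attempt}/${max_attempts})." 1>&2
        # Do not sleep after the final attempt.
        if ((attempt < max_attempts)); then
            sleep "${sleep_seconds}"
        fi
    done
    echo "Timed out after ${max_attempts} attempts waiting for: $*" 1>&2
    return 1
}
```

A caller could then write `wait_for 10 1 some_check || exit 1`, where `some_check` (hypothetical) compares the TaskManager count against the expected value. With this pattern, a timeout stops the test instead of being ignored.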
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)