GWphua commented on code in PR #17592:

URL: https://github.com/apache/druid/pull/17592#discussion_r1914337448
########## integration-tests/docker/Dockerfile: ##########

```diff
@@ -27,15 +27,19 @@ ARG ZK_VERSION
 ARG APACHE_ARCHIVE_MIRROR_HOST=https://archive.apache.org
 ARG SETUP_RETRIES=3
 # Retry mechanism for running the setup script up to limit of 3 times.
-RUN set -e; \
-    for i in $(seq 1 $SETUP_RETRIES); do \
-        echo "Attempt $i to run the setup script..."; \
+RUN i=0; \
+    while [ $i -lt $SETUP_RETRIES ]; do \
         APACHE_ARCHIVE_MIRROR_HOST=${APACHE_ARCHIVE_MIRROR_HOST} /root/base-setup.sh && break || { \
+            i=$(($i + 1)); \
             echo "Set up script attempt $i/$SETUP_RETRIES failed."; \
             sleep 2; \
         }; \
     done; \
-    rm -f /root/base-setup.sh
+    rm -f /root/base-setup.sh; \
+    if [ "$i" -eq "$SETUP_RETRIES" ]; then \
+        exit 1; \
```

Review Comment:
Hi @kgyrtkirk @cryptoe @Akshat-Jain, I did some research into the `wget` commands. PTAL at my findings:

### Error Logging

It turns out that our current `wget` command is called with the `-q` option, which silences all `wget` logs. Temporarily removing `-q` (I added it back after debugging; lmk if it's better to simply not silence the logs) surfaced these logs:

1. wget success

```
#8 15.32 --2025-01-13 07:50:55-- https://downloads.apache.org/zookeeper/zookeeper-3.8.4/apache-zookeeper-3.8.4-bin.tar.gz
#8 15.33 Resolving downloads.apache.org (downloads.apache.org)... 88.99.208.237, 135.181.214.104, 2a01:4f9:3a:2c57::2, ...
#8 15.34 Connecting to downloads.apache.org (downloads.apache.org)|88.99.208.237|:443... connected.
#8 15.76 HTTP request sent, awaiting response... 200 OK
```

2. wget failure

```
#8 14.55 --2025-01-13 07:42:57-- https://downloads.apache.org/zookeeper/zookeeper-3.8.4/apache-zookeeper-3.8.4-bin.tar.gz
#8 14.56 Resolving downloads.apache.org (downloads.apache.org)... 135.181.214.104, 88.99.208.237, 2a01:4f8:10a:39da::2, ...
#8 14.56 Connecting to downloads.apache.org (downloads.apache.org)|135.181.214.104|:443... failed: Connection timed out.
#8 149.2 Connecting to downloads.apache.org (downloads.apache.org)|88.99.208.237|:443... failed: Connection timed out.
#8 284.3 Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f8:10a:39da::2|:443... failed: Network is unreachable.
#8 284.3 Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f9:3a:2c57::2|:443... failed: Network is unreachable.
#8 ERROR: process "/bin/sh -c APACHE_ARCHIVE_MIRROR_HOST=${APACHE_ARCHIVE_MIRROR_HOST} /root/base-setup.sh && rm -f /root/base-setup.sh" did not complete successfully: exit code: 4
------
 > [druidbase 3/3] RUN APACHE_ARCHIVE_MIRROR_HOST=https://downloads.apache.org /root/base-setup.sh && rm -f /root/base-setup.sh:
14.04 Setting up python (2.7.16-1) ...
14.05 Setting up python-pkg-resources (40.8.0-1) ...
14.16 Setting up python-meld3 (1.0.2-2) ...
14.23 Setting up supervisor (3.3.5-1) ...
14.42 invoke-rc.d: could not determine current runlevel
14.43 invoke-rc.d: policy-rc.d denied execution of start.
14.52 Processing triggers for libc-bin (2.28-10+deb10u1) ...
failed: Connection timed out.
284.3 Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f8:10a:39da::2|:443... failed: Network is unreachable.
284.3 Connecting to downloads.apache.org (downloads.apache.org)|2a01:4f9:3a:2c57::2|:443... failed: Network is unreachable.
------
Dockerfile:28
--------------------
  26 |     ARG ZK_VERSION
  27 |     ARG APACHE_ARCHIVE_MIRROR_HOST=https://downloads.apache.org
  28 | >>> RUN APACHE_ARCHIVE_MIRROR_HOST=${APACHE_ARCHIVE_MIRROR_HOST} /root/base-setup.sh && rm -f /root/base-setup.sh
  29 |
  30 |
--------------------
ERROR: failed to solve: process "/bin/sh -c APACHE_ARCHIVE_MIRROR_HOST=${APACHE_ARCHIVE_MIRROR_HOST} /root/base-setup.sh && rm -f /root/base-setup.sh" did not complete successfully: exit code: 4
```

### Findings

1. According to the wget manual, wget retries up to 20 times by default (configurable with `--tries=n`).
2. Between retries, wget uses a linear backoff of 1, 2, ... up to 10 seconds (the default `--waitretry` value).
3. The retry mechanism does not apply to fatal errors such as "connection refused" or "not found" (404).
4. In the failure case, connections to the IPv6 addresses fail with `Network is unreachable`.
5. These errors may prevent wget from triggering its default retry mechanism.

### Actions

1. Restrict `wget` to IPv4 addresses. This both focuses it on the addresses we can actually reach and allows for retries.
2. Add `--continue` so that `wget` resumes a partially downloaded file instead of starting from scratch.
3. Revert the 3 retries around the entire `base-setup.sh` script. If `wget` fails 20 times in a row, chances are the host is overloaded with download requests, and retrying the whole script (up to 3 × 20 = 60 `wget` attempts) may exacerbate the situation.
4. Add a message indicating whether `wget` succeeded or failed.
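As a rough sanity check on the retry budget in Findings 1 to 3 and Action 3, the sketch below computes the worst-case time `wget` spends sleeping between tries. It assumes that the default of 20 counts every attempt (so there are 19 waits) and that the wait before retry n is min(n, 10) seconds, following the manual's description of `--waitretry`. It deliberately ignores the time each failed attempt itself takes, such as the roughly 135 s connection timeouts in the failure log. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: models wget's linear backoff as the manual describes it for
# --waitretry (wait 1 s after the first failure, 2 s after the second, ...,
# capped at the --waitretry value). Assumes "tries" counts every attempt, so
# there is one wait fewer than there are tries. Function names are hypothetical.

# Print the wait in seconds before each retry, one per line.
backoff_schedule() {
  tries=$1  # total attempts, e.g. wget's default of 20
  cap=$2    # --waitretry value, 10 by default
  n=1
  while [ "$n" -lt "$tries" ]; do
    if [ "$n" -lt "$cap" ]; then echo "$n"; else echo "$cap"; fi
    n=$((n + 1))
  done
}

# Worst-case total sleep between tries. This excludes the time each failed
# attempt itself takes (e.g. the ~135 s connection timeouts in the log).
total_backoff() {
  total=0
  for w in $(backoff_schedule "$1" "$2"); do
    total=$((total + w))
  done
  echo "$total"
}

# Upper bound on wget attempts when the whole setup script is retried too.
total_attempts() {
  echo "$(($1 * $2))"  # script runs x wget tries per run
}

total_backoff 20 10   # prints 145
total_attempts 3 20   # prints 60
```

Under these assumptions, a single download can already sleep for about two and a half minutes on top of its connection timeouts, which is why multiplying it by script-level retries adds a lot of load for little benefit.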
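Actions 1, 2 and 4 could translate into a download step along the lines of the sketch below. The actual `base-setup.sh` change is not shown here, so the `fetch` and `describe_wget_status` helpers and their message wording are hypothetical. The flags (`-q`, `--inet4-only`, `--continue`, `--tries`, `--waitretry`) are standard GNU wget options, assumed available because the image in the log is Debian-based, and the status descriptions follow the "Exit Status" section of the wget manual, where exit code 4 (seen in the failed build) means a network failure.

```shell
#!/bin/sh
# Hypothetical download helper sketching Actions 1, 2 and 4; the real
# base-setup.sh is not shown in this comment. Assumes GNU wget.

# Map a wget exit status to its description in the wget manual.
describe_wget_status() {
  case "$1" in
    0) echo "no problems occurred" ;;
    1) echo "generic error" ;;
    2) echo "parse error" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "username/password authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server issued an error response" ;;
    *) echo "unknown wget exit status $1" ;;
  esac
}

# Download $1 into the current directory.
#   --inet4-only : skip the unreachable IPv6 addresses (Action 1)
#   --continue   : resume a partial file if the server supports Range (Action 2)
#   --tries / --waitretry : spell out the retry budget explicitly
fetch() {
  url=$1
  if wget -q --inet4-only --continue --tries=20 --waitretry=10 "$url"; then
    echo "wget succeeded: $url"
  else
    status=$?  # exit status of the failed wget call
    echo "wget failed ($(describe_wget_status "$status")): $url" >&2
    return "$status"
  fi
}

# Example (URL taken from the logs above):
# fetch https://downloads.apache.org/zookeeper/zookeeper-3.8.4/apache-zookeeper-3.8.4-bin.tar.gz
```

Keeping `-q` while printing an explicit success or failure line is one way to keep build logs short but still show which download broke and why.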
