[ 
https://issues.apache.org/jira/browse/FLINK-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reopened FLINK-19158:
------------------------------------
      Assignee:     (was: Robert Metzger)

The problem persists. Here is a failure from a PR build that already contains 
my last fix:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12400&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee

I should have added a log statement for each retry, and we should probably 
also print the reason why wget failed (it currently runs with -q). As they 
stand, the logs are not very helpful.

{code}
05:34:30,679 [                main] INFO  
org.apache.flink.tests.util.cache.PersistingDownloadCache    [] - Downloading 
https://archive.apache.org/dist/hbase/1.4.3/hbase-1.4.3-bin.tar.gz.
05:46:30,701 [                main] ERROR 
org.apache.flink.tests.util.hbase.SQLClientHBaseITCase       [] - 
--------------------------------------------------------------------------------
Test testHBase[0: 
hbase-version:1.4.3](org.apache.flink.tests.util.hbase.SQLClientHBaseITCase) 
failed with:
java.io.IOException: Process ([wget, -q, -P, 
/home/vsts/work/1/e2e_cache/downloads/1598516010, --timeout, 240, 
https://archive.apache.org/dist/hbase/1.4.3/hbase-1.4.3-bin.tar.gz]) exceeded 
timeout (600000) or number of retries (3).
{code}
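
A minimal sketch of what per-retry logging could look like. This is not the 
actual AutoClosableProcess code: the class LoggingRetry and the names 
runWithRetry, runOnce, Result and Attempt are hypothetical and exist only for 
illustration. The idea is to log the start and the outcome of every attempt, 
and to keep the process output so that the failure reason ends up in the log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper, not the actual AutoClosableProcess implementation. */
final class LoggingRetry {

    /** Outcome of one attempt: exit code plus whatever the process printed. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** One attempt; may throw if the process cannot be started or times out. */
    interface Attempt {
        Result run() throws Exception;
    }

    /** Runs the attempt up to maxAttempts times and logs every start and outcome. */
    static boolean runWithRetry(int maxAttempts, Attempt attempt, Consumer<String> log) {
        for (int i = 1; i <= maxAttempts; i++) {
            log.accept("Attempt " + i + "/" + maxAttempts + " started.");
            try {
                Result r = attempt.run();
                if (r.exitCode == 0) {
                    log.accept("Attempt " + i + " succeeded.");
                    return true;
                }
                log.accept("Attempt " + i + " failed with exit code " + r.exitCode
                        + ": " + r.output.trim());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                log.accept("Interrupted during attempt " + i + ", giving up.");
                return false;
            } catch (Exception e) {
                log.accept("Attempt " + i + " failed: " + e);
            }
        }
        return false;
    }

    /** Runs a command once, captures stdout and stderr, and kills it after the timeout. */
    static Result runOnce(List<String> command, Duration timeout)
            throws IOException, InterruptedException {
        Path out = Files.createTempFile("process", ".log");
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(out.toFile())
                    .start();
            if (!p.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                p.destroyForcibly();
                throw new IOException("timed out after " + timeout);
            }
            String output = new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
            return new Result(p.exitValue(), output);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

A caller could pass something like 
() -> LoggingRetry.runOnce(command, Duration.ofMinutes(4)) as the attempt and 
a logger method as the log consumer. For wget to report anything useful, the 
command would also need -nv instead of -q, since -q suppresses error messages.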

I'm not sure it makes sense to go down the rabbit hole of fixing this (by 
using fallback mirrors). I'd rather suggest relying on one common method of 
binary distribution (Docker images) and making their distribution as reliable 
as possible.

I'll leave this to the maintainers of this test.

> Revisit java e2e download timeouts
> ----------------------------------
>
>                 Key: FLINK-19158
>                 URL: https://issues.apache.org/jira/browse/FLINK-19158
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System
>    Affects Versions: 1.12.0
>            Reporter: Robert Metzger
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.12.0, 1.13.0
>
>
> Consider this failed test case
> {code}
> Test testHBase(org.apache.flink.tests.util.hbase.SQLClientHBaseITCase) is 
> running.
> --------------------------------------------------------------------------------
> 09:38:38,719 [                main] INFO  
> org.apache.flink.tests.util.cache.PersistingDownloadCache    [] - Downloading 
> https://archive.apache.org/dist/hbase/1.4.3/hbase-1.4.3-bin.tar.gz.
> 09:40:38,732 [                main] ERROR 
> org.apache.flink.tests.util.hbase.SQLClientHBaseITCase       [] - 
> --------------------------------------------------------------------------------
> Test testHBase(org.apache.flink.tests.util.hbase.SQLClientHBaseITCase) failed 
> with:
> java.io.IOException: Process ([wget, -q, -P, 
> /home/vsts/work/1/e2e_cache/downloads/1598516010, 
> https://archive.apache.org/dist/hbase/1.4.3/hbase-1.4.3-bin.tar.gz]) exceeded 
> timeout (120000) or number of retries (3).
>       at 
> org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlockingWithRetry(AutoClosableProcess.java:148)
>       at 
> org.apache.flink.tests.util.cache.AbstractDownloadCache.getOrDownload(AbstractDownloadCache.java:127)
>       at 
> org.apache.flink.tests.util.cache.PersistingDownloadCache.getOrDownload(PersistingDownloadCache.java:36)
>       at 
> org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.setupHBaseDist(LocalStandaloneHBaseResource.java:76)
>       at 
> org.apache.flink.tests.util.hbase.LocalStandaloneHBaseResource.before(LocalStandaloneHBaseResource.java:70)
>       at 
> org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:46)
> {code}
> It seems that the download was not retried. Maybe the download got stuck? 
> I would propose setting a timeout per try and increasing the total time 
> from 2 to 5 minutes.
> This example is from: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=6267&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529
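
The proposal in the issue description (a timeout per try inside a larger 
total limit) can be sketched as follows. This is only an illustration under 
stated assumptions: RetryBudget and nextTimeout are hypothetical names, and 
the 2 minute per-try value in the usage note is an assumption; only the 
5 minute total comes from the description.

```java
import java.time.Duration;

/** Hypothetical helper: splits a total time limit into per-attempt timeouts. */
final class RetryBudget {
    private final long deadlineNanos;
    private final long perAttemptNanos;
    private int attemptsLeft;

    RetryBudget(Duration total, Duration perAttempt, int maxAttempts, long nowNanos) {
        this.deadlineNanos = nowNanos + total.toNanos();
        this.perAttemptNanos = perAttempt.toNanos();
        this.attemptsLeft = maxAttempts;
    }

    /**
     * Returns the timeout for the next attempt, or Duration.ZERO if no attempts
     * or no time are left. The caller passes the current time, for example
     * System.nanoTime(), which keeps the class easy to test.
     */
    Duration nextTimeout(long nowNanos) {
        long remaining = deadlineNanos - nowNanos;
        if (attemptsLeft <= 0 || remaining <= 0) {
            return Duration.ZERO;
        }
        attemptsLeft--;
        // Never let a single attempt run past the overall deadline.
        return Duration.ofNanos(Math.min(perAttemptNanos, remaining));
    }
}
```

With a total of 5 minutes, 2 minutes per try and 3 tries, and every try 
running into its timeout, the tries get 2, 2 and 1 minutes. Without a per-try 
limit, a single stuck try can use up the whole budget before any retry 
starts, which would be consistent with the missing retry in the log above.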



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
