Ideally, that list is updated with each release, yes. Non-current releases will now always download from archive.apache.org, though, and we run into rate-limiting problems if that gets pinged too much. So yes, it's good to keep the list limited to current branches.
It looks like the download is cached in /tmp/test-spark, for what it's worth.

On Thu, Jul 19, 2018 at 11:06 AM Felix Cheung <felixcheun...@hotmail.com> wrote:

> +1 this has been problematic.
>
> Also, this list needs to be updated every time we make a new release?
>
> Plus, can we cache them on Jenkins? Maybe we can avoid downloading the same
> thing from the Apache archive every test run.
>
> ------------------------------
> *From:* Marco Gaido <marcogaid...@gmail.com>
> *Sent:* Monday, July 16, 2018 11:12 PM
> *To:* Hyukjin Kwon
> *Cc:* Sean Owen; dev
> *Subject:* Re: Cleaning Spark releases from mirrors, and the flakiness of
> HiveExternalCatalogVersionsSuite
>
> +1 too
>
> On Tue, 17 Jul 2018, 05:38 Hyukjin Kwon, <gurwls...@gmail.com> wrote:
>
>> +1
>>
>> On Tue, Jul 17, 2018 at 7:34 AM, Sean Owen <sro...@apache.org> wrote:
>>
>>> Fix is committed to branches back through 2.2.x, where this test was
>>> added.
>>>
>>> There is still some issue; I'm seeing that archive.apache.org is
>>> rate-limiting downloads and frequently returning 503 errors.
>>>
>>> We can help, I guess, by avoiding testing against non-current releases.
>>> Right now we should be testing against 2.3.1, 2.2.2, and 2.1.3, right?
>>> 2.0.x is now effectively EOL, right?
>>>
>>> I can make that quick change too, if everyone's amenable, in order to
>>> prevent more failures in this test from master.
>>>
>>> On Sun, Jul 15, 2018 at 3:51 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Yesterday I cleaned out old Spark releases from the mirror system --
>>>> we're supposed to keep only the latest release from active branches out
>>>> on mirrors. (All releases are available from the Apache archive site.)
>>>>
>>>> Having done so, I realized quickly that
>>>> HiveExternalCatalogVersionsSuite relies on the versions it downloads
>>>> being available from mirrors. It has been flaky, as sometimes mirrors
>>>> are unreliable. I think now it will not work for any versions except
>>>> 2.3.1, 2.2.2, and 2.1.3.
>>>>
>>>> Because we do need to clean those releases out of the mirrors soon
>>>> anyway, and because they're flaky sometimes, I propose adding logic to
>>>> the test to fall back on downloading from the Apache archive site.
>>>>
>>>> ... and I'll do that right away to unblock
>>>> HiveExternalCatalogVersionsSuite runs. I think it needs to be backported
>>>> to other branches, as they will still be testing against potentially
>>>> non-current Spark releases.
>>>>
>>>> Sean
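The fallback being discussed amounts to trying an ordered list of URLs per version: a mirror first for releases still carried on mirrors, then archive.apache.org, which hosts every release but rate-limits heavy traffic. A minimal illustrative sketch of that ordering, in Python -- the mirror base URL, the artifact path, and the `candidate_urls` helper are assumptions for illustration, not the actual suite code:

```python
# Sketch of the mirror-then-archive fallback ordering (not the real test code).
# Assumptions: the mirror base URL and the hadoop2.7 artifact path are
# placeholders; the real suite resolves a mirror dynamically.

MIRROR = "https://dlcdn.apache.org"          # assumed mirror base
ARCHIVE = "https://archive.apache.org/dist"  # archive carries all releases

def candidate_urls(version, current_versions):
    """Return download URLs to try, in order, for a given Spark version."""
    path = f"spark/spark-{version}/spark-{version}-bin-hadoop2.7.tgz"
    urls = []
    if version in current_versions:
        # Mirrors only carry the latest release of each active branch.
        urls.append(f"{MIRROR}/{path}")
    # Fall back to (or go straight to) the archive for everything else.
    urls.append(f"{ARCHIVE}/{path}")
    return urls
```

A download loop would then try each URL in turn, keeping the result under a cache directory (e.g. /tmp/test-spark, as noted above) so repeated runs skip the download entirely.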