Yeah, there could be, if the test code keeps around the archive and/or a digest
of what it unpacked. A released artifact should never be modified after the
fact though, so that case would be highly rare.

If the worry is hacked mirrors, then we might have bigger problems, but in
that case the issue is verifying the download signatures in the first place.
Those would have to come from archive.apache.org.

If you're up for it, yes that could be a fine security precaution.
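
For illustration, here's a rough sketch of what such a check could look like
(a hypothetical helper, not code that exists in the suite today). It assumes
the .sha512 file that archive.apache.org publishes next to each release, and
a hadoop2.7 binary package name; both of those are assumptions to confirm:

    import java.io.FileInputStream
    import java.security.{DigestInputStream, MessageDigest}
    import scala.io.Source

    // Hypothetical sketch: compute the SHA-512 of a downloaded Spark archive
    // and compare it to the .sha512 file published on archive.apache.org.
    // The URL layout and package name below are assumptions.
    def sha512Hex(path: String): String = {
      val md = MessageDigest.getInstance("SHA-512")
      val in = new DigestInputStream(new FileInputStream(path), md)
      val buf = new Array[Byte](8192)
      try { while (in.read(buf) != -1) {} } finally { in.close() }
      md.digest().map("%02x".format(_)).mkString
    }

    def matchesPublishedDigest(tarball: String, version: String): Boolean = {
      val url = s"https://archive.apache.org/dist/spark/spark-$version/" +
        s"spark-$version-bin-hadoop2.7.tgz.sha512"
      // The published file mixes the file name, whitespace and the hex digest,
      // so strip everything except hex characters before comparing.
      val published = Source.fromURL(url).mkString
        .replaceAll("[^0-9a-fA-F]", "").toLowerCase
      published.contains(sha512Hex(tarball))
    }

Something like that could run right after the download into /tmp/test-spark
and fail the suite fast on a mismatch.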

On Thu, Jul 19, 2018, 2:11 PM Mark Hamstra <m...@clearstorydata.com> wrote:

> Is there or should there be some checking of digests just to make sure
> that we are really testing against the same thing in /tmp/test-spark that
> we are distributing from the archive?
>
> On Thu, Jul 19, 2018 at 11:15 AM Sean Owen <sro...@apache.org> wrote:
>
>> Ideally, that list is updated with each release, yes. Non-current
>> releases will now always download from archive.apache.org though. But we
>> run into rate-limiting problems if that gets pinged too much. So yes, it's
>> good to keep the list limited to current branches.
>>
>> It looks like the download is cached in /tmp/test-spark, for what it's
>> worth.
>>
>> On Thu, Jul 19, 2018 at 11:06 AM Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>>> +1, this has been problematic.
>>>
>>> Also, this list needs to be updated every time we make a new release?
>>>
>>> Plus, can we cache them on Jenkins? That way we could avoid downloading
>>> the same thing from the Apache archive on every test run.
>>>
>>>
>>> ------------------------------
>>> *From:* Marco Gaido <marcogaid...@gmail.com>
>>> *Sent:* Monday, July 16, 2018 11:12 PM
>>> *To:* Hyukjin Kwon
>>> *Cc:* Sean Owen; dev
>>> *Subject:* Re: Cleaning Spark releases from mirrors, and the flakiness
>>> of HiveExternalCatalogVersionsSuite
>>>
>>> +1 too
>>>
>>> On Tue, 17 Jul 2018, 05:38 Hyukjin Kwon, <gurwls...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, Jul 17, 2018 at 7:34 AM, Sean Owen <sro...@apache.org> wrote:
>>>>
>>>>> The fix is committed to branches back through 2.2.x, where this test was
>>>>> added.
>>>>>
>>>>> There is still an issue, though: I'm seeing that archive.apache.org is
>>>>> rate-limiting downloads and frequently returning 503 errors.
>>>>>
>>>>> We can help, I guess, by avoiding testing against non-current
>>>>> releases. Right now we should be testing against 2.3.1, 2.2.2, 2.1.3,
>>>>> right? 2.0.x is now effectively EOL, right?
>>>>>
>>>>> I can make that quick change too, if everyone's amenable, in order to
>>>>> prevent more failures in this test on master.
>>>>>
>>>>> On Sun, Jul 15, 2018 at 3:51 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>
>>>>>> Yesterday I cleaned out old Spark releases from the mirror system --
>>>>>> we're supposed to only keep the latest release from active branches out
>>>>>> on mirrors. (All releases are available from the Apache archive site.)
>>>>>>
>>>>>> Having done so I realized quickly that the
>>>>>> HiveExternalCatalogVersionsSuite relies on the versions it downloads
>>>>>> being available from mirrors. It has been flaky, as sometimes mirrors are
>>>>>> unreliable. I think now it will not work for any versions except 2.3.1,
>>>>>> 2.2.2, 2.1.3.
>>>>>>
>>>>>> Because we do need to clean those releases out of the mirrors soon
>>>>>> anyway, and because they're flaky sometimes, I propose adding logic to
>>>>>> the test to fall back on downloading from the Apache archive site.
>>>>>>
>>>>>> ... and I'll do that right away to unblock
>>>>>> HiveExternalCatalogVersionsSuite runs. I think it needs to be backported
>>>>>> to other branches as they will still be testing against potentially
>>>>>> non-current Spark releases.
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>
