The problem is that it's not really an "official" download link, but rather just a supplemental convenience. While that may be ok when distributing artifacts, it's more of a problem when actually building and testing artifacts. In the latter case, the download should really only be from an Apache mirror.
On Thu, Sep 14, 2017 at 1:20 AM, Wenchen Fan <cloud0...@gmail.com> wrote:

> That test case is trying to test the backward compatibility of
> `HiveExternalCatalog`. It downloads official Spark releases, creates
> tables with them, and then reads those tables via the current Spark.
>
> About the download link, I just picked it from the Spark website, and this
> link is the default one when you choose "direct download". Do we have a
> better choice?
>
> On Thu, Sep 14, 2017 at 3:05 AM, Shivaram Venkataraman
> <shiva...@eecs.berkeley.edu> wrote:
>
>> Mark, I agree with your point on the risks of using Cloudfront while
>> building Spark. I was only trying to provide background on when we
>> started using Cloudfront.
>>
>> Personally, I don't have enough context about the test case in
>> question (e.g. why are we downloading Spark in a test case?).
>>
>> Thanks
>> Shivaram
>>
>> On Wed, Sep 13, 2017 at 11:50 AM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> Yeah, but that discussion and use case is a bit different -- providing a
>>> different route to download the final released and approved artifacts
>>> that were built using only acceptable artifacts and sources vs. building
>>> and checking prior to release using something that is not from an Apache
>>> mirror. This new use case puts us in the position of approving Spark
>>> artifacts that weren't built entirely from canonical resources located
>>> in presumably secure and monitored repositories. Incorporating something
>>> that is not completely trusted or approved into the process of building
>>> something that we are then going to approve as trusted is different from
>>> the prior use of Cloudfront.
>>>
>>> On Wed, Sep 13, 2017 at 10:26 AM, Shivaram Venkataraman
>>> <shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> The bucket comes from Cloudfront, a CDN that's part of AWS. There was
>>>> a bunch of discussion about this back in 2013:
>>>>
>>>> https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E
>>>>
>>>> Shivaram
>>>>
>>>> On Wed, Sep 13, 2017 at 9:30 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> Not a big deal, but Mark noticed that this test now downloads Spark
>>>>> artifacts from the same 'direct download' link available on the
>>>>> downloads page:
>>>>>
>>>>> https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala#L53
>>>>>
>>>>> https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz
>>>>>
>>>>> I don't know of any particular problem with this, which is a parallel
>>>>> download option in addition to the Apache mirrors. It's also the
>>>>> default.
>>>>>
>>>>> Does anyone know what this bucket is and if there's a strong reason we
>>>>> can't just use mirrors?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
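For illustration, here is a minimal sketch of what mirror-based resolution could look like in place of the hard-coded Cloudfront URL. It assumes the Apache `closer.lua` redirect endpoint and the standard `archive.apache.org/dist` layout; the object and method names (`MirrorDownload`, `candidateUrls`) are hypothetical and are not the suite's actual code.

```scala
// Hypothetical sketch: build download candidates for a Spark release that
// go through Apache infrastructure rather than the Cloudfront bucket.
// Assumptions: the closer.lua redirect endpoint and the archive.apache.org
// directory layout; neither is taken from HiveExternalCatalogVersionsSuite.
object MirrorDownload {
  def candidateUrls(version: String): Seq[String] = {
    val path = s"spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
    Seq(
      // Ask the Apache mirror resolver to redirect to a nearby mirror.
      s"https://www.apache.org/dyn/closer.lua/$path?action=download",
      // Fall back to the canonical archive, which keeps all past releases.
      s"https://archive.apache.org/dist/$path"
    )
  }

  def main(args: Array[String]): Unit = {
    candidateUrls("2.2.0").foreach(println)
  }
}
```

A test would then try each candidate in order, which keeps the build working even when a given release has rotated off the mirrors.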