Re: Downloading Hadoop from s3://spark-related-packages/

Nicholas Chammas Wed, 23 Dec 2015 22:00:40 -0800

FYI: I opened an INFRA ticket with questions about how best to use the
Apache mirror network.


https://issues.apache.org/jira/browse/INFRA-10999

Nick

On Mon, Nov 2, 2015 at 8:00 AM Luciano Resende <[email protected]> wrote:

> I am getting the same results using closer.lua as with closer.cgi: both
> seem to download a page where the user can choose the closest mirror.
> I tried adding parameters to follow the redirect, without much success.
> There already seems to be a JIRA for a similar request with INFRA:
> https://issues.apache.org/jira/browse/INFRA-10240.
>
> A workaround is to use a URL pointing to the mirror directly:
>
> curl -O -L http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
>
> I second the point about the lack of documentation on what is available
> with these scripts. I'll see if I can find the source and look for other
> options.
>
>
> On Sun, Nov 1, 2015 at 8:40 PM, Shivaram Venkataraman <
> [email protected]> wrote:
>
>> I think the lua one at
>>
>> https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/closer.lua
>> has replaced the cgi one from before. Also it looks like the lua one
>> also supports `action=download` with a filename argument. So you could
>> just do something like
>>
>> wget "http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download"
>>
>> Thanks
>> Shivaram
>>
>> On Sun, Nov 1, 2015 at 3:18 PM, Nicholas Chammas
>> <[email protected]> wrote:
>> > Oh, sweet! For example:
>> >
>> > http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1
>> >
>> > Thanks for sharing that tip. Looks like you can also use as_json (vs.
>> > asjson).
>> >
>> > Nick
>> >
>> >
>> > On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman
>> > <[email protected]> wrote:
>> >>
>> >> On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas
>> >> <[email protected]> wrote:
>> >> > OK, I’ll focus on the Apache mirrors going forward.
>> >> >
>> >> > The problem with the Apache mirrors, if I am not mistaken, is that you
>> >> > cannot use a single URL that automatically redirects you to a working
>> >> > mirror to download Hadoop. You have to pick a specific mirror and pray
>> >> > it doesn’t disappear tomorrow.
>> >> >
>> >> > They don’t go away, especially http://mirror.ox.ac.uk , and in the US
>> >> > apache.osuosl.org, OSU being where a lot of the ASF servers are kept.
>> >> >
>> >> > So does Apache offer no way to query a URL and automatically get the
>> >> > closest working mirror? If I’m installing HDFS onto servers in various
>> >> > EC2 regions, the best mirror will vary depending on my location.
>> >> >
>> >> Not sure if this is officially documented somewhere, but if you pass
>> >> '&asjson=1' you will get back JSON with a 'preferred' field set to
>> >> the closest mirror.
>> >>
>> >> Shivaram
>> >> > Nick
>> >> >
>> >> >
>> >> > On Sun, Nov 1, 2015 at 12:25 PM Shivaram Venkataraman
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> I think that getting them from the ASF mirrors is a better strategy in
>> >> >> general as it'll remove the overhead of keeping the S3 bucket up to
>> >> >> date. It works in the spark-ec2 case because we only support a limited
>> >> >> number of Hadoop versions from the tool. FWIW I don't have write
>> >> >> access to the bucket and also haven't heard of any plans to support
>> >> >> newer versions in spark-ec2.
>> >> >>
>> >> >> Thanks
>> >> >> Shivaram
>> >> >>
>> >> >> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran <
>> [email protected]>
>> >> >> wrote:
>> >> >> >
>> >> >> > On 1 Nov 2015, at 03:17, Nicholas Chammas
>> >> >> > <[email protected]>
>> >> >> > wrote:
>> >> >> >
>> >> >> > https://s3.amazonaws.com/spark-related-packages/
>> >> >> >
>> >> >> > spark-ec2 uses this bucket to download and install HDFS on clusters.
>> >> >> > Is it owned by the Spark project or by the AMPLab?
>> >> >> >
>> >> >> > Anyway, it looks like the latest Hadoop install available on there is
>> >> >> > Hadoop 2.4.0.
>> >> >> >
>> >> >> > Are there plans to add newer versions of Hadoop for use by spark-ec2
>> >> >> > and similar tools, or should we just be getting that stuff via an
>> >> >> > Apache mirror? The latest version is 2.7.1, by the way.
>> >> >> >
>> >> >> >
>> >> >> > You should be grabbing the artifacts off the ASF and then verifying
>> >> >> > their SHA1 checksums as published on the ASF HTTPS web site.
>> >> >> >
>> >> >> >
>> >> >> > The problem with the Apache mirrors, if I am not mistaken, is that you
>> >> >> > cannot use a single URL that automatically redirects you to a working
>> >> >> > mirror to download Hadoop. You have to pick a specific mirror and pray
>> >> >> > it doesn't disappear tomorrow.
>> >> >> >
>> >> >> >
>> >> >> > They don't go away, especially http://mirror.ox.ac.uk , and in the US
>> >> >> > apache.osuosl.org, OSU being where a lot of the ASF servers are kept.
>> >> >> >
>> >> >> > Full list with availability stats:
>> >> >> >
>> >> >> > http://www.apache.org/mirrors/
>> >> >> >
>> >> >> >
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
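As a rough sketch of the approach discussed in this thread (ask the mirror script for JSON, read the 'preferred' field, build the download URL from that mirror, then check the published checksum), something like the following Python could work. Only the 'preferred' field and the closer.cgi/<path>?asjson=1 URL form come from the thread. The function names, the sample mirror host, and the assumption that a published checksum is hex that may contain spaces or uppercase letters are illustrative and not taken from the ASF scripts.

```python
import hashlib
import json
import urllib.request

# URL form taken from the thread; whether it is still served is not verified here.
CLOSER_URL = "http://www.apache.org/dyn/closer.cgi/"


def fetch_mirror_info(path):
    """Return the raw JSON text from the mirror script (needs network access)."""
    url = CLOSER_URL + path.lstrip("/") + "?asjson=1"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def mirror_url(mirror_json, path):
    """Join the 'preferred' mirror from the JSON text with the artifact path."""
    preferred = json.loads(mirror_json)["preferred"]
    return preferred.rstrip("/") + "/" + path.lstrip("/")


def file_digest(filename, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(filename, published_hex, algorithm="sha1"):
    """Compare a file's digest with a published hex string.

    Assumption: the published value is plain hex, possibly uppercase and
    possibly split into whitespace-separated groups.
    """
    normalized = "".join(published_hex.split()).lower()
    return file_digest(filename, algorithm) == normalized
```

With the path from the thread, mirror_url(fetch_mirror_info(path), path) would give a URL to pass to curl or wget, and matches_published() would then be run on the downloaded file against the checksum fetched separately from the ASF HTTPS site rather than from the mirror.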
