+1

Ran local tests and tested our Spark apps on a Spark + YARN cluster.

Cheers,

Sean


> On Mar 8, 2015, at 11:51 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> 
> +1 (non-binding, doc and packaging issues aside)
> 
> Built from source, ran jobs and spark-shell against a pseudo-distributed
> YARN cluster.
> 
> On Sun, Mar 8, 2015 at 2:42 PM, Krishna Sankar <ksanka...@gmail.com> wrote:
> 
>> Yep, otherwise this will become an N^2 problem - Scala versions X Hadoop
>> Distributions X ...
>> 
>> Maybe one option is to have a minimal basic set (which I know is what we
>> are discussing) and move the rest to spark-packages.org. There the vendors
>> can add the latest downloads - for example, when 1.4 is released, HDP can
>> build and release an HDP Spark 1.4 bundle.
>> 
>> Cheers
>> <k/>
>> 
>> On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>> 
>>> We probably want to revisit the way we do binaries in general for
>>> 1.4+. IMO, something worth forking a separate thread for.
>>> 
>>> I've been hesitating to add new binaries because people
>>> (understandably) complain if you ever stop packaging older ones, but
>>> on the other hand the ASF has complained that we have too many
>>> binaries already and that we need to pare them down because of the large
>>> volume of files. Doubling the number of binaries we produce for Scala
>>> 2.11 seemed like it would be too much.
>>> 
>>> One potential solution is to package "Hadoop provided" binaries and
>>> encourage users to use those by simply setting HADOOP_HOME, or to
>>> provide instructions for specific distros. I've heard that our
>>> existing packages don't work well on HDP, for instance, since
>>> there are some configuration quirks that differ from the upstream
>>> Hadoop.
>>> 
>>> If we cut down on the cross building for Hadoop versions, then it is
>>> more tenable to cross build for Scala versions without exploding the
>>> number of binaries.
>>> 
>>> - Patrick
>>> 
>>> On Sun, Mar 8, 2015 at 12:46 PM, Sean Owen <so...@cloudera.com> wrote:
>>>> Yeah, it's an interesting question what the better default is for the
>>>> single set of artifacts published to Maven. I think there's an
>>>> argument for Hadoop 2 and perhaps Hive for the 2.10 build too. Pros
>>>> and cons discussed more at
>>>> 
>>>> https://issues.apache.org/jira/browse/SPARK-5134
>>>> https://github.com/apache/spark/pull/3917
>>>> 
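For context on why that default matters, here is a minimal sbt sketch (sbt
settings are Scala) of how a downstream build consumes the single published
2.10 artifact; the coordinates assume the eventual 1.3.0 release and the
Hadoop version shown is only an example for a Hadoop 2.4 cluster:

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // the published pom pulls in a Hadoop 1.x hadoop-client by default
      // at the time of this thread (see SPARK-5134 above) ...
      "org.apache.spark" %% "spark-core" % "1.3.0" % "provided",
      // ... so users on Hadoop 2 typically pin their cluster's version
      "org.apache.hadoop" % "hadoop-client" % "2.4.0" % "provided"
    )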
>>>> On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia <matei.zaha...@gmail.com>
>>>> wrote:
>>>>> +1
>>>>> 
>>>>> Tested it on Mac OS X.
>>>>> 
>>>>> One small issue I noticed is that the Scala 2.11 build is using Hadoop
>>>>> 1 without Hive, which is kind of weird because people will more likely
>>>>> want Hadoop 2 with Hive. So it would be good to publish a build for
>>>>> that configuration instead. We can do it if we do a new RC, or it
>>>>> might be that binary builds may not need to be voted on (I forgot the
>>>>> details there).
>>>>> 
>>>>> Matei
>>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
