Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

Cheng Lian Fri, 29 Aug 2014 11:54:16 -0700

Just noticed one thing: although --with-hive is deprecated in favor of -Phive,
make-distribution.sh still relies on $SPARK_HIVE (which was controlled by
--with-hive) to decide whether to include the datanucleus jar files. This
means we have to run something like SPARK_HIVE=true ./make-distribution.sh
... to enable Hive support. Otherwise the datanucleus jars are not included in
lib/.


This issue is similar to SPARK-3234
<https://issues.apache.org/jira/browse/SPARK-3234>: both
SPARK_HADOOP_VERSION and SPARK_HIVE are controlled by deprecated
command line options.
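
To make the idea concrete, here is a minimal sketch of how the script could
derive the Hive flag from the Maven profile arguments instead of relying on the
deprecated option. This is an assumption for illustration only, not the actual
make-distribution.sh code; the helper name hive_enabled is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from make-distribution.sh): print "true" if
# any Maven-style argument enables the "hive" profile, otherwise "false".
hive_enabled() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      # Match -Phive alone or as one entry of a comma-separated profile list.
      -Phive|-Phive,*|-P*,hive|-P*,hive,*) echo true; return 0 ;;
    esac
  done
  echo false
}

# Keep an explicitly exported SPARK_HIVE; otherwise infer it from the arguments.
SPARK_HIVE="${SPARK_HIVE:-$(hive_enabled "$@")}"
```

With something along these lines, passing -Phive alone would be enough to get
the datanucleus jars into lib/, while SPARK_HIVE=true would keep working for
anyone who already sets it.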



On Fri, Aug 29, 2014 at 11:18 AM, Patrick Wendell <pwend...@gmail.com>
wrote:

> Oh darn - I missed this update. GRR, unfortunately I think this means
> I'll need to cut a new RC. Thanks for catching this Nick.
>
> On Fri, Aug 29, 2014 at 10:18 AM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > [Let me know if I should be posting these comments in a different
> > thread.]
> >
> > Should the default Spark version in spark-ec2 be updated for this
> > release?
> >
> > Nick
> >
> >
> >
> > On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell <pwend...@gmail.com>
> > wrote:
> >>
> >> Hey Nicholas,
> >>
> >> Thanks for this, we can merge in doc changes outside of the actual
> >> release timeline, so we'll make sure to loop those changes in before
> >> we publish the final 1.1 docs.
> >>
> >> - Patrick
> >>
> >> On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
> >> <nicholas.cham...@gmail.com> wrote:
> >> > There were several formatting and typographical errors in the SQL docs
> >> > that I've fixed in this PR. Dunno if we want to roll that into the
> >> > release.
> >> >
> >> >
> >> > On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell <pwend...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Okay I'll plan to add cdh4 binary as well for the final release!
> >> >>
> >> >> ---
> >> >> sent from my phone
> >> >> On Aug 29, 2014 8:26 AM, "Ye Xianjin" <advance...@gmail.com> wrote:
> >> >>
> >> >> > We just used CDH 4.7 for our production cluster. And I believe we
> >> >> > won't use CDH 5 in the next year.
> >> >> >
> >> >> > Sent from my iPhone
> >> >> >
> >> >> > > On Aug 29, 2014, at 14:39, Matei Zaharia <matei.zaha...@gmail.com>
> >> >> > > wrote:
> >> >> > >
> >> >> > > Personally I'd actually consider putting CDH4 back if there are
> >> >> > > still users on it. It's always better to be inclusive, and the
> >> >> > > convenience of a one-click download is high. Do we have a sense
> >> >> > > on what % of CDH users still use CDH4?
> >> >> > >
> >> >> > > Matei
> >> >> > >
> >> >> > > On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
> >> >> > > wrote:
> >> >> > >
> >> >> > > (Copying my reply since I don't know if it goes to the mailing
> >> >> > > list)
> >> >> > >
> >> >> > > Great, thanks for explaining the reasoning. You're saying these
> >> >> > > aren't going into the final release? I think that moots any issue
> >> >> > > surrounding distributing them then.
> >> >> > >
> >> >> > > This is all I know of from the ASF:
> >> >> > > https://community.apache.org/projectIndependence.html I don't read
> >> >> > > it as expressly forbidding this kind of thing although you can see
> >> >> > > how it bumps up against the spirit. There's not a bright line --
> >> >> > > what about Tomcat providing binaries compiled for Windows for
> >> >> > > example? does that favor an OS vendor?
> >> >> > >
> >> >> > > From this technical ASF perspective only the releases matter -- do
> >> >> > > what you want with snapshots and RCs. The only issue there is maybe
> >> >> > > releasing something different than was in the RC; is that at all
> >> >> > > confusing? Just needs a note.
> >> >> > >
> >> >> > > I think this theoretical issue doesn't exist if these binaries
> >> >> > > aren't released, so I see no reason to not proceed.
> >> >> > >
> >> >> > > The rest is a different question about whether you want to spend
> >> >> > > time maintaining this profile and candidate. The vendor already
> >> >> > > manages their build I think and -- and I don't know -- may even
> >> >> > > prefer not to have a different special build floating around.
> >> >> > > There's also the theoretical argument that this turns off other
> >> >> > > vendors from adopting Spark if it's perceived to be too connected
> >> >> > > to other vendors. I'd like to maximize Spark's distribution and
> >> >> > > there's some argument you do this by not making vendor profiles.
> >> >> > > But as I say a different question to just think about over time...
> >> >> > >
> >> >> > > (oh and PS for my part I think it's a good thing that CDH4 binaries
> >> >> > > were removed. I wasn't arguing for resurrecting them)
> >> >> > >
> >> >> > >> On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell
> >> >> > >> <pwend...@gmail.com> wrote:
> >> >> > >> Hey Sean,
> >> >> > >>
> >> >> > >> The reason there are no longer CDH-specific builds is that all
> >> >> > >> newer versions of CDH and HDP work with builds for the upstream
> >> >> > >> Hadoop projects. I dropped CDH4 in favor of a newer Hadoop version
> >> >> > >> (2.4) and the Hadoop-without-Hive (also 2.4) build.
> >> >> > >>
> >> >> > >> For MapR - we can't officially post those artifacts on ASF web
> >> >> > >> space when we make the final release, we can only link to them as
> >> >> > >> being hosted by MapR specifically since they use non-compatible
> >> >> > >> licenses. However, I felt that providing these during a testing
> >> >> > >> period was alright, with the goal of increasing test coverage. I
> >> >> > >> couldn't find any policy against posting these on personal web
> >> >> > >> space during RC voting. However, we can remove them if there is
> >> >> > >> one.
> >> >> > >>
> >> >> > >> Dropping CDH4 was more because it is now pretty old, but we can
> >> >> > >> add it back if people want. The binary packaging is a slightly
> >> >> > >> separate question from release votes, so I can always add more
> >> >> > >> binary packages whenever. And on this, my main concern is covering
> >> >> > >> the most popular Hadoop versions to lower the bar for users to
> >> >> > >> build and test Spark.
> >> >> > >> - Patrick
> >> >> > >>
> >> >> > >>> On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen <so...@cloudera.com>
> >> >> > >>> wrote:
> >> >> > >>> +1 I tested the source and Hadoop 2.4 release. Checksums and
> >> >> > >>> signatures are OK. Compiles fine with Java 8 on OS X. Tests...
> >> >> > >>> don't fail any more than usual.
> >> >> > >>>
> >> >> > >>> FWIW I've also been using the 1.1.0-SNAPSHOT for some time in
> >> >> > >>> another project and have encountered no problems.
> >> >> > >>>
> >> >> > >>>
> >> >> > >>> I notice that the 1.1.0 release removes the CDH4-specific build,
> >> >> > >>> but adds two MapR-specific builds. Compare with
> >> >> > >>> https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/
> >> >> > >>> I commented on the commit:
> >> >> > >>> https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
> >> >> > >>>
> >> >> > >>> I'm in favor of removing all vendor-specific builds. This change
> >> >> > >>> *looks* a bit funny as there was no JIRA (?) and appears to swap
> >> >> > >>> one vendor for another. Of course there's nothing untoward going
> >> >> > >>> on, but what was the reasoning? It's best avoided, and MapR
> >> >> > >>> already distributes Spark just fine, no?
> >> >> > >>>
> >> >> > >>> This is a gray area with ASF projects. I mention it as well
> >> >> > >>> because it came up with Apache Flink recently
> >> >> > >>> (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
> >> >> > >>> Another vendor rightly noted this could look like favoritism.
> >> >> > >>> They changed to remove vendor releases.
> >> >> > >>>
> >> >> > >>>> On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell
> >> >> > >>>> <pwend...@gmail.com> wrote:
> >> >> > >>>> Please vote on releasing the following candidate as Apache Spark
> >> >> > >>>> version 1.1.0!
> >> >> > >>>>
> >> >> > >>>> The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
> >> >> > >>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327
> >> >> > >>>>
> >> >> > >>>> The release files, including signatures, digests, etc. can be
> >> >> > >>>> found at:
> >> >> > >>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2/
> >> >> > >>>>
> >> >> > >>>> Release artifacts are signed with the following key:
> >> >> > >>>> https://people.apache.org/keys/committer/pwendell.asc
> >> >> > >>>>
> >> >> > >>>> The staging repository for this release can be found at:
> >> >> > >>>> https://repository.apache.org/content/repositories/orgapachespark-1029/
> >> >> > >>>>
> >> >> > >>>> The documentation corresponding to this release can be found at:
> >> >> > >>>> http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/
> >> >> > >>>>
> >> >> > >>>> Please vote on releasing this package as Apache Spark 1.1.0!
> >> >> > >>>>
> >> >> > >>>> The vote is open until Monday, September 01, at 03:11 UTC and
> >> >> > >>>> passes if a majority of at least 3 +1 PMC votes are cast.
> >> >> > >>>>
> >> >> > >>>> [ ] +1 Release this package as Apache Spark 1.1.0
> >> >> > >>>> [ ] -1 Do not release this package because ...
> >> >> > >>>>
> >> >> > >>>> To learn more about Apache Spark, please see
> >> >> > >>>> http://spark.apache.org/
> >> >> > >>>>
> >> >> > >>>> == Regressions fixed since RC1 ==
> >> >> > >>>> LZ4 compression issue:
> >> >> > >>>> https://issues.apache.org/jira/browse/SPARK-3277
> >> >> > >>>>
> >> >> > >>>> == What justifies a -1 vote for this release? ==
> >> >> > >>>> This vote is happening very late into the QA period compared
> >> >> > >>>> with previous votes, so -1 votes should only occur for
> >> >> > >>>> significant regressions from 1.0.2. Bugs already present in
> >> >> > >>>> 1.0.X will not block this release.
> >> >> > >>>>
> >> >> > >>>> == What default changes should I be aware of? ==
> >> >> > >>>> 1. The default value of "spark.io.compression.codec" is now
> >> >> > >>>> "snappy"
> >> >> > >>>> --> Old behavior can be restored by switching to "lzf"
> >> >> > >>>>
> >> >> > >>>> 2. PySpark now performs external spilling during aggregations.
> >> >> > >>>> --> Old behavior can be restored by setting
> >> >> > >>>> "spark.shuffle.spill" to "false".
> >> >> > >>>>
> >> >> > >>>>
> >> >> > >>>>
> >> >> > >>>>
> ---------------------------------------------------------------------
> >> >> > >>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> >> > >>>> For additional commands, e-mail: dev-h...@spark.apache.org

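For anyone who wants the pre-1.1 behavior back, the two default changes listed
in the RC2 announcement quoted above can be reverted through configuration. The
sketch below prints the two properties named in the announcement in the
whitespace-separated "key value" form used by conf/spark-defaults.conf. The
function name legacy_defaults and the target path in the comment are
assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: emit the two properties that the announcement says restore the old
# defaults, one "key value" pair per line.
legacy_defaults() {
  # printf reuses the format string for each pair of arguments.
  printf '%s %s\n' \
    spark.io.compression.codec lzf \
    spark.shuffle.spill false
}

# Example use (the path is an assumption about the local install layout):
# legacy_defaults >> "$SPARK_HOME/conf/spark-defaults.conf"
```

The same keys can also be set programmatically on a SparkConf; the properties
file is simply the option that avoids touching application code.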