Trying again now that eec36607 <https://github.com/apache/spark/commit/eec36607f9fc92b6c4d306e3930fcf03961625eb> is merged.
On Thu, Dec 10, 2015 at 6:44 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Cutting RC2 now.
>
> On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> We are getting close to merging patches for SPARK-12155
>> <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253
>> <https://issues.apache.org/jira/browse/SPARK-12253>. I'll be cutting
>> RC2 shortly after that.
>>
>> Michael
>>
>> On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> An update: the vote fails due to the -1. I'll post another RC as soon
>>> as we've resolved these issues. In the meantime I encourage people to
>>> continue testing and post any problems they encounter here.
>>>
>>> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai <yh...@databricks.com> wrote:
>>>
>>>> -1
>>>>
>>>> Two blocker bugs have been found after this RC.
>>>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>>>> corruption when an external sorter spills data.
>>>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks
>>>> from acquiring memory even when the executor can in fact allocate memory
>>>> by evicting storage memory.
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We
>>>> are still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>>>
>>>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> 0
>>>>>
>>>>> Currently figuring out who is responsible for the regression I am
>>>>> seeing in some user-code ScalaUDFs that make use of Timestamps, where
>>>>> NULL values from a CSV file read in via TestHive#registerTestTable are now
>>>>> producing 1969-12-31 23:59:59.999999 instead of null.
>>>>>
>>>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>
>>>>>> Licenses and signatures are all fine.
>>>>>>
>>>>>> Docker integration tests consistently fail for me with Java 7 / Ubuntu
>>>>>> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":
>>>>>>
>>>>>> *** RUN ABORTED ***
>>>>>> java.lang.NoSuchMethodError:
>>>>>> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>>>>> at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>>>>> at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>>>>> at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>>>>> at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>>>>> at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>>>>> at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>>>>
>>>>>> I also get this failure consistently:
>>>>>>
>>>>>> DirectKafkaStreamSuite
>>>>>> - offset recovery *** FAILED ***
>>>>>> recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>>>>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>>>>>> was false Recovered ranges are not the same as the ones generated
>>>>>> (DirectKafkaStreamSuite.scala:301)
>>>>>>
>>>>>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>> > Please vote on releasing the following candidate as Apache Spark
>>>>>> > version 1.6.0!
>>>>>> >
>>>>>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
>>>>>> > passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>> >
>>>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>>>> > [ ] -1 Do not release this package because ...
>>>>>> >
>>>>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>> >
>>>>>> > The tag to be voted on is v1.6.0-rc1
>>>>>> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>>>>>> >
>>>>>> > The release files, including signatures, digests, etc.
>>>>>> > can be found at:
>>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>>>>> >
>>>>>> > Release artifacts are signed with the following key:
>>>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>>>> >
>>>>>> > The staging repository for this release can be found at:
>>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1165/
>>>>>> >
>>>>>> > The test repository (versioned as v1.6.0-rc1) for this release can
>>>>>> > be found at:
>>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1164/
>>>>>> >
>>>>>> > The documentation corresponding to this release can be found at:
>>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>>>>> >
>>>>>> >
>>>>>> > =======================================
>>>>>> > == How can I help test this release? ==
>>>>>> > =======================================
>>>>>> > If you are a Spark user, you can help us test this release by taking an
>>>>>> > existing Spark workload, running it on this release candidate, and then
>>>>>> > reporting any regressions.
>>>>>> >
>>>>>> > ================================================
>>>>>> > == What justifies a -1 vote for this release? ==
>>>>>> > ================================================
>>>>>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>>>>> > should only occur for significant regressions from 1.5. Bugs already present
>>>>>> > in 1.5, minor regressions, or bugs related to new features will not block
>>>>>> > this release.
>>>>>> >
>>>>>> > ===============================================================
>>>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>>> > ===============================================================
>>>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go into
>>>>>> > branch-1.6, since documentation will be published separately from the
>>>>>> > release.
>>>>>> > 2. New features for non-alpha modules should target 1.7+.
>>>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>>>>>> > version.
>>>>>> >
>>>>>> >
>>>>>> > ==================================================
>>>>>> > == Major changes to help you focus your testing ==
>>>>>> > ==================================================
>>>>>> >
>>>>>> > Spark SQL
>>>>>> >
>>>>>> > SPARK-10810 Session Management - The ability to create multiple isolated SQL
>>>>>> > Contexts that have their own configuration and default database. This is
>>>>>> > turned on by default in the thrift server.
>>>>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs
>>>>>> > many operations on serialized binary data and code generation (i.e. Project
>>>>>> > Tungsten).
>>>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution and
>>>>>> > caching instead of exclusive division of the regions.
>>>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
>>>>>> > over files of any supported format without registering a table.
>>>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>>>>>> > non-standard JSON files (e.g. single quotes, unquoted attributes).
>>>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a
>>>>>> > per-operator basis for memory usage and spilled data size.
>>>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>>>>> > nest and unnest arbitrary numbers of columns.
>>>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant
>>>>>> > (up to 14x) speed-up when caching data that contains complex types in
>>>>>> > DataFrames or SQL.
>>>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will
>>>>>> > now execute using SortMergeJoin instead of computing a cartesian product.
>>>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
>>>>>> > query execution to occur using off-heap memory to avoid GC overhead.
>>>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>>>>>> > datasource with filter pushdown, developers can now tell Spark SQL to avoid
>>>>>> > evaluating a pushed-down filter twice.
>>>>>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
>>>>>> > ordering schemes in in-memory table scans, and adding distributeBy and
>>>>>> > localSort to the DataFrame API.
>>>>>> > SPARK-9858 Adaptive query execution - Initial support for automatically
>>>>>> > selecting the number of reducers for joins and aggregations.
>>>>>> >
>>>>>> > Spark Streaming
>>>>>> >
>>>>>> > API Updates
>>>>>> >
>>>>>> > SPARK-2629 New improved state management - trackStateByKey, a DStream
>>>>>> > transformation for stateful stream processing that supersedes updateStateByKey
>>>>>> > in functionality and performance.
>>>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>>>>>> > upgraded to use KCL 1.4.0 and now support transparent deaggregation of
>>>>>> > KPL-aggregated records.
>>>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary function
>>>>>> > to be applied to a Kinesis record in the Kinesis receiver to customize
>>>>>> > what data is stored in memory.
>>>>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
>>>>>> > (scheduling delays, batch processing times, etc.) from Python.
>>>>>> >
>>>>>> > UI Improvements
>>>>>> >
>>>>>> > Made failures visible in the streaming tab, in the timelines, batch list,
>>>>>> > and batch details page.
>>>>>> > Made output operations visible in the streaming tab as progress bars.
>>>>>> >
>>>>>> > MLlib
>>>>>> >
>>>>>> > New algorithms/models
>>>>>> >
>>>>>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis.
>>>>>> > SPARK-9834 Normal equation for least squares - Normal equation solver,
>>>>>> > providing R-like model summary statistics.
>>>>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming
>>>>>> > framework.
>>>>>> > SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer,
>>>>>> > SQL transformer.
>>>>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant
>>>>>> > of K-Means.
>>>>>> >
>>>>>> > API improvements
>>>>>> >
>>>>>> > ML Pipelines
>>>>>> >
>>>>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial
>>>>>> > coverage of spark.ml algorithms.
>>>>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML
>>>>>> > Pipelines.
>>>>>> >
>>>>>> > R API
>>>>>> >
>>>>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary
>>>>>> > least squares via summary(model).
>>>>>> > SPARK-9681 Feature interactions in R formula - Interaction operator ":" in
>>>>>> > R formula.
>>>>>> >
>>>>>> > Python API - Many improvements to the Python API to approach feature parity.
>>>>>> >
>>>>>> > Misc improvements
>>>>>> >
>>>>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear
>>>>>> > Regression can take instance weights.
>>>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames -
>>>>>> > Variance, stddev, correlations, etc.
>>>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
>>>>>> >
>>>>>> > Documentation improvements
>>>>>> >
>>>>>> > SPARK-7751 @since versions - Documentation includes the initial version when
>>>>>> > classes and methods were added.
>>>>>> > SPARK-11337 Testable example code - Automated testing for code in user guide
>>>>>> > examples.
>>>>>> >
>>>>>> > Deprecations
>>>>>> >
>>>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
>>>>>> > In spark.ml.classification.LogisticRegressionModel and
>>>>>> > spark.ml.regression.LinearRegressionModel, the "weights" field has been
>>>>>> > deprecated in favor of the new name "coefficients." This helps disambiguate
>>>>>> > it from instance (row) weights given to algorithms.
>>>>>> >
>>>>>> > Changes of behavior
>>>>>> >
>>>>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in
>>>>>> > 1.6. Previously, it was a threshold for absolute change in error. Now, it
>>>>>> > resembles the behavior of GradientDescent convergenceTol: for large errors,
>>>>>> > it uses relative error (relative to the previous error); for small errors
>>>>>> > (< 0.01), it uses absolute error.
>>>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to
>>>>>> > lowercase before tokenizing. Now, it converts to lowercase by default, with
>>>>>> > an option not to. This matches the behavior of the simpler Tokenizer
>>>>>> > transformer.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>>
>>>>>
>>>>
>>>
>>
>
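For anyone exercising the RegexTokenizer change listed under "Changes of behavior" above, a minimal spark-shell sketch along these lines should show the new default and the opt-out. This is untested against this RC; it assumes the new parameter is exposed as setToLowercase, and the toy data and expected-output comments are mine, not from the release notes:

    import org.apache.spark.ml.feature.RegexTokenizer

    // Toy input; any DataFrame with a string column works.
    val df = sqlContext.createDataFrame(Seq((0, "Spark RC Testing"))).toDF("id", "text")

    // New 1.6 default: tokens are lowercased, matching the simpler Tokenizer.
    val lowered = new RegexTokenizer().setInputCol("text").setOutputCol("words")
    lowered.transform(df).select("words").show()      // expect [spark, rc, testing]

    // Opting out of the new default (assumed param name: toLowercase).
    val preserved = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .setToLowercase(false)
    preserved.transform(df).select("words").show()    // expect [Spark, RC, Testing]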