We are getting close to merging patches for SPARK-12155 <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253 <https://issues.apache.org/jira/browse/SPARK-12253>. I'll be cutting RC2 shortly after that.
Michael

On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust <mich...@databricks.com> wrote:

> An update: the vote fails due to the -1. I'll post another RC as soon as
> we've resolved these issues. In the meantime I encourage people to
> continue testing and to post any problems they encounter here.
>
> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai <yh...@databricks.com> wrote:
>
>> -1
>>
>> Two blocker bugs have been found after this RC.
>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>> corruption when an external sorter spills data.
>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks from
>> acquiring memory even when the executor can in fact allocate memory by
>> evicting storage memory.
>>
>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We are
>> still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>
>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> 0
>>>
>>> Currently figuring out who is responsible for the regression that I am
>>> seeing in some user-code ScalaUDFs that make use of Timestamps, where a
>>> NULL read from a CSV file via TestHive#registerTestTable is now
>>> producing 1969-12-31 23:59:59.999999 instead of null.
>>>
>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> Licenses and signatures are all fine.
>>>>
>>>> Docker integration tests consistently fail for me with Java 7 / Ubuntu
>>>> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":
>>>>
>>>> *** RUN ABORTED ***
>>>> java.lang.NoSuchMethodError:
>>>> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>>>   at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>>>   at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>>>   at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>>>   at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>>>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>>>   at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>>>   at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>>>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>>>   at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>>
>>>> I also get this failure consistently:
>>>>
>>>> DirectKafkaStreamSuite
>>>> - offset recovery *** FAILED ***
>>>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>>>   Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>>   earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>   scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>   scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>>>>   was false: Recovered ranges are not the same as the ones generated
>>>>   (DirectKafkaStreamSuite.scala:301)
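A note on the NoSuchMethodError above: it is the usual sign of an older
httpclient leaking onto the test classpath, since (as far as I know)
HttpClientBuilder.setConnectionManagerShared was only added in HttpClient
4.4. An illustrative way to confirm which jar is actually being loaded,
e.g. from a spark-shell or test JVM on the same classpath:

    // Illustrative probe only: print the jar HttpClientBuilder was loaded from.
    // An httpclient jar older than 4.4 here would explain the error.
    val loc = classOf[org.apache.http.impl.client.HttpClientBuilder]
      .getProtectionDomain.getCodeSource.getLocation
    println(loc)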
>>>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>> > Please vote on releasing the following candidate as Apache Spark
>>>> > version 1.6.0!
>>>> >
>>>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
>>>> > passes if a majority of at least 3 +1 PMC votes are cast.
>>>> >
>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>> > [ ] -1 Do not release this package because ...
>>>> >
>>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>>> >
>>>> > The tag to be voted on is v1.6.0-rc1
>>>> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>>>> >
>>>> > The release files, including signatures, digests, etc. can be found at:
>>>> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>>> >
>>>> > Release artifacts are signed with the following key:
>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>> >
>>>> > The staging repository for this release can be found at:
>>>> > https://repository.apache.org/content/repositories/orgapachespark-1165/
>>>> >
>>>> > The test repository (versioned as v1.6.0-rc1) for this release can be
>>>> > found at:
>>>> > https://repository.apache.org/content/repositories/orgapachespark-1164/
>>>> >
>>>> > The documentation corresponding to this release can be found at:
>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>>> >
>>>> > =======================================
>>>> > == How can I help test this release? ==
>>>> > =======================================
>>>> > If you are a Spark user, you can help us test this release by taking
>>>> > an existing Spark workload, running it on this release candidate, and
>>>> > reporting any regressions.
>>>> >
>>>> > ================================================
>>>> > == What justifies a -1 vote for this release? ==
>>>> > ================================================
>>>> > This vote is happening towards the end of the 1.6 QA period, so -1
>>>> > votes should only occur for significant regressions from 1.5. Bugs
>>>> > already present in 1.5, minor regressions, or bugs related to new
>>>> > features will not block this release.
>>>> >
>>>> > ===============================================================
>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>> > ===============================================================
>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go
>>>> > into branch-1.6, since documentation will be published separately
>>>> > from the release.
>>>> > 2. New features for non-alpha modules should target 1.7+.
>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>>> > target version.
>>>> >
>>>> > ==================================================
>>>> > == Major changes to help you focus your testing ==
>>>> > ==================================================
>>>> >
>>>> > Spark SQL
>>>> >
>>>> > SPARK-10810 Session Management - The ability to create multiple
>>>> > isolated SQLContexts that have their own configuration and default
>>>> > database. This is turned on by default in the Thrift server.
>>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
>>>> > performs many operations on serialized binary data and uses code
>>>> > generation (i.e. Project Tungsten).
>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution
>>>> > and caching instead of an exclusive division of the regions.
>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>>>> > queries over files of any supported format without registering a
>>>> > table (see the sketch after this list).
>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>>>> > non-standard JSON files (e.g. single quotes, unquoted attributes).
>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display
>>>> > statistics on a per-operator basis for memory usage and spilled data
>>>> > size.
>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>>> > nest and unnest arbitrary numbers of columns.
>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>>>> > Significant (up to 14x) speedup when caching data that contains
>>>> > complex types in DataFrames or SQL.
>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality
>>>> > (<=>) will now execute using SortMergeJoin instead of computing a
>>>> > cartesian product.
>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
>>>> > configuring query execution to use off-heap memory and avoid GC
>>>> > overhead.
>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>>>> > datasource with filter pushdown, developers can now tell Spark SQL to
>>>> > avoid double-evaluating a pushed-down filter.
>>>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
>>>> > ordering schemes in the in-memory table scan, and adding distributeBy
>>>> > and localSort to the DataFrame API.
>>>> > SPARK-9858 Adaptive query execution - Initial support for
>>>> > automatically selecting the number of reducers for joins and
>>>> > aggregations.
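For anyone testing the RC, a minimal sketch of the two headline SQL
additions above. The file paths and the Person case class are hypothetical,
and this assumes the sqlContext provided by spark-shell:

    // SPARK-11197: run SQL directly over a file, no table registration needed.
    val events = sqlContext.sql("SELECT * FROM parquet.`/tmp/events.parquet`")

    // SPARK-9999: the new Dataset API - a typed view over the same engine.
    import sqlContext.implicits._ // encoders for case classes and primitives
    case class Person(name: String, age: Long)
    val people = sqlContext.read.json("/tmp/people.json").as[Person]
    val names = people.filter(_.age > 21).map(_.name).collect()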
>>>> >
>>>> > Spark Streaming
>>>> >
>>>> > API Updates
>>>> >
>>>> > SPARK-2629 New improved state management - trackStateByKey, a DStream
>>>> > transformation for stateful stream processing that supersedes
>>>> > updateStateByKey in functionality and performance (see the sketch
>>>> > after this section).
>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>>>> > upgraded to use KCL 1.4.0 and support transparent deaggregation of
>>>> > KPL-aggregated records.
>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
>>>> > function to be applied to a Kinesis record in the Kinesis receiver to
>>>> > customize what data is stored in memory.
>>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
>>>> > (scheduling delays, batch processing times, etc.) in streaming.
>>>> >
>>>> > UI Improvements
>>>> >
>>>> > Made failures visible in the streaming tab, in the timelines, batch
>>>> > list, and batch details page.
>>>> > Made output operations visible in the streaming tab as progress bars.
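A rough sketch of the trackStateByKey shape (SPARK-2629) for a running word
count. The API was still being finalized during the 1.6 RCs, so the names
and signatures here are best-effort, and `wordCounts` is a hypothetical
DStream[(String, Int)] built earlier in the job:

    import org.apache.spark.streaming.{State, StateSpec}

    // `state` carries the running total per word across batches; stateful
    // operations like this also require checkpointing to be enabled.
    val trackingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum) // persist the new running total
      (word, sum)       // emitted downstream each batch
    }
    val stateStream = wordCounts.trackStateByKey(StateSpec.function(trackingFunc))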
>>>> >
>>>> > MLlib
>>>> >
>>>> > New algorithms/models
>>>> >
>>>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis.
>>>> > SPARK-9834 Normal equation for least squares - Normal equation solver,
>>>> > providing R-like model summary statistics.
>>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark
>>>> > Streaming framework.
>>>> > SPARK-9930 New feature transformers - ChiSqSelector,
>>>> > QuantileDiscretizer, SQL transformer.
>>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
>>>> > variant of K-Means.
>>>> >
>>>> > API improvements
>>>> >
>>>> > ML Pipelines
>>>> >
>>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
>>>> > partial coverage of spark.ml algorithms.
>>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation
>>>> > in ML Pipelines.
>>>> >
>>>> > R API
>>>> >
>>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
>>>> > ordinary least squares via summary(model).
>>>> > SPARK-9681 Feature interactions in R formula - Interaction operator
>>>> > ":" in R formula.
>>>> >
>>>> > Python API - Many improvements to the Python API to approach feature
>>>> > parity.
>>>> >
>>>> > Misc improvements
>>>> >
>>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
>>>> > Linear Regression can take instance weights.
>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
>>>> > DataFrames - Variance, stddev, correlations, etc.
>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
>>>> >
>>>> > Documentation improvements
>>>> >
>>>> > SPARK-7751 @since versions - Documentation includes the initial
>>>> > version in which classes and methods were added.
>>>> > SPARK-11337 Testable example code - Automated testing for code in
>>>> > user guide examples.
>>>> >
>>>> > Deprecations
>>>> >
>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>>> > deprecated.
>>>> > In spark.ml.classification.LogisticRegressionModel and
>>>> > spark.ml.regression.LinearRegressionModel, the "weights" field has
>>>> > been deprecated in favor of the new name "coefficients". This helps
>>>> > disambiguate it from the instance (row) weights given to algorithms.
>>>> >
>>>> > Changes of behavior
>>>> >
>>>> > spark.mllib.tree.GradientBoostedTrees: validationTol has changed
>>>> > semantics in 1.6. Previously, it was a threshold for absolute change
>>>> > in error. Now, it resembles the behavior of GradientDescent
>>>> > convergenceTol: for large errors, it uses relative error (relative to
>>>> > the previous error); for small errors (< 0.01), it uses absolute
>>>> > error.
>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>>> > strings to lowercase before tokenizing. Now, it converts to lowercase
>>>> > by default, with an option not to. This matches the behavior of the
>>>> > simpler Tokenizer transformer.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: dev-h...@spark.apache.org