Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

Sean Owen Thu, 03 Dec 2015 14:00:17 -0800

Licenses and signature are all fine.

Docker integration tests consistently fail for me with Java 7 on Ubuntu
and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":


*** RUN ABORTED ***
  java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
  at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
  at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
  at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
  at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
  at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
  at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
  at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
  at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
  at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
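
A NoSuchMethodError of this kind usually means the HttpClientBuilder class
loaded at runtime is older than the one the Jersey connector was compiled
against, which would point to more than one httpclient version on the test
classpath. That is an assumption, not something verified in this run. As a
rough way to check, a small script along these lines lists which jars in a
directory contain the class. The function name and the directory passed in
are placeholders, not part of the Spark build:

```python
import zipfile
from pathlib import Path

# Class named in the NoSuchMethodError, written as an entry path inside a jar.
CLASS_ENTRY = "org/apache/http/impl/client/HttpClientBuilder.class"


def jars_providing(class_entry, jar_dir):
    """Return the sorted file names of jars under jar_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Ignore files that end in .jar but are not valid archives.
            continue
    return hits
```

Calling jars_providing(CLASS_ENTRY, some_dir) on the directory that holds the
test classpath jars and getting more than one hit (including shaded or
assembly jars) would be a hint that two copies of the class are competing.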

I also get this failure consistently:

DirectKafkaStreamSuite
- offset recovery *** FAILED ***
  recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
was false Recovered ranges are not the same as the ones generated
(DirectKafkaStreamSuite.scala:301)
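
For anyone trying to read that expanded assertion: it checks that every
recovered (batch time, offset ranges) pair also appears among the pairs
generated before the restart, comparing the ranges as sets so that order
within a batch does not matter. A minimal sketch of the same check, with
offset ranges modelled as plain tuples (topic, partition, fromOffset,
untilOffset) and a made-up function name, neither of which is the suite's
actual code:

```python
def recovered_ranges_match(recovered, earlier):
    """True if every recovered (time, ranges) pair was generated earlier.

    Both arguments are lists of (time, list_of_offset_ranges) pairs.
    Ranges are compared as sets, so their order within a batch is ignored.
    """
    earlier_as_sets = {(time, frozenset(ranges)) for time, ranges in earlier}
    return all(
        (time, frozenset(ranges)) in earlier_as_sets
        for time, ranges in recovered
    )
```

The failure above corresponds to this function returning False: at least one
recovered batch has a time or a set of ranges that was never generated.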

On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc1
> (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1165/
>
> The test repository (versioned as v1.6.0-rc1) for this release can be found
> at:
> https://repository.apache.org/content/repositories/orgapachespark-1164/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already present
> in 1.5, minor regressions, or bugs related to new features will not block
> this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Spark SQL
>
> SPARK-10810 Session Management - The ability to create multiple isolated SQL
> Contexts that have their own configuration and default database.  This is
> turned on by default in the thrift server.
> SPARK-9999  Dataset API - A type-safe API (similar to RDDs) that performs
> many operations on serialized binary data and code generation (i.e. Project
> Tungsten).
> SPARK-10000 Unified Memory Management - Shared memory for execution and
> caching instead of exclusive division of the regions.
> SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> over files of any supported format without registering a table.
> SPARK-11745 Reading non-standard JSON files - Added options to read
> non-standard JSON files (e.g. single-quotes, unquoted attributes)
> SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a
> per-operator basis for memory usage and spilled data size.
> SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and
> unnest arbitrary numbers of columns
> SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant
> (up to 14x) speed up when caching data that contains complex types in
> DataFrames or SQL.
> SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will
> now execute using SortMergeJoin instead of computing a Cartesian product.
> SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
> query execution to occur using off-heap memory to avoid GC overhead
> SPARK-10978 Datasource API Avoid Double Filter - When implementing a
> datasource with filter pushdown, developers can now tell Spark SQL to avoid
> double evaluating a pushed-down filter.
> SPARK-4849  Advanced Layout of Cached Data - storing partitioning and
> ordering schemes in In-memory table scan, and adding distributeBy and
> localSort to DF API
> SPARK-9858  Adaptive query execution - Initial support for automatically
> selecting the number of reducers for joins and aggregations.
>
> Spark Streaming
>
> API Updates
>
> SPARK-2629  New improved state management - trackStateByKey - a DStream
> transformation for stateful stream processing, supersedes updateStateByKey
> in functionality and performance.
> SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
> upgraded to use KCL 1.4.0 and support transparent deaggregation of
> KPL-aggregated records.
> SPARK-10891 Kinesis message handler function - Allows an arbitrary function
> to be applied to a Kinesis record in the Kinesis receiver, to customize
> what data is stored in memory.
> SPARK-6328  Python Streaming Listener API - Get streaming statistics
> (scheduling delays, batch processing times, etc.) in streaming.
>
> UI Improvements
>
> Made failures visible in the streaming tab, in the timelines, batch list,
> and batch details page.
> Made output operations visible in the streaming tab as progress bars
>
> MLlib
>
> New algorithms/models
>
> SPARK-8518  Survival analysis - Log-linear model for survival analysis
> SPARK-9834  Normal equation for least squares - Normal equation solver,
> providing R-like model summary statistics
> SPARK-3147  Online hypothesis testing - A/B testing in the Spark Streaming
> framework
> SPARK-9930  New feature transformers - ChiSqSelector, QuantileDiscretizer,
> SQL transformer
> SPARK-6517  Bisecting K-Means clustering - Fast top-down clustering variant
> of K-Means
>
> API improvements
>
> ML Pipelines
>
> SPARK-6725  Pipeline persistence - Save/load for ML Pipelines, with partial
> coverage of spark.ml algorithms
> SPARK-5565  LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML
> Pipelines
>
> R API
>
> SPARK-9836  R-like statistics for GLMs - (Partial) R-like stats for ordinary
> least squares via summary(model)
> SPARK-9681  Feature interactions in R formula - Interaction operator ":" in
> R formula
>
> Python API - Many improvements to Python API to approach feature parity
>
> Misc improvements
>
> SPARK-7685, SPARK-9642  Instance weights for GLMs - Logistic and Linear
> Regression can take instance weights
> SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames -
> Variance, stddev, correlations, etc.
> SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
> SPARK-7751  @since versions - Documentation includes initial version when
> classes and methods were added
> SPARK-11337 Testable example code - Automated testing for code in user guide
> examples
>
> Deprecations
>
> In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
> In spark.ml.classification.LogisticRegressionModel and
> spark.ml.regression.LinearRegressionModel, the "weights" field has been
> deprecated, in favor of the new name "coefficients." This helps disambiguate
> from instance (row) weights given to algorithms.
>
> Changes of behavior
>
> spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in
> 1.6. Previously, it was a threshold for absolute change in error. Now, it
> resembles the behavior of GradientDescent convergenceTol: For large errors,
> it uses relative error (relative to the previous error); for small errors (<
> 0.01), it uses absolute error.
> spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to
> lowercase before tokenizing. Now, it converts to lowercase by default, with
> an option not to. This matches the behavior of the simpler Tokenizer
> transformer.
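
To make the validationTol change concrete, here is one possible reading of
the description above as a stopping rule. It is a sketch, not Spark's actual
code: the function name, the parameter names, and the choice to scale by the
previous error with a floor of 0.01 are assumptions taken from the wording of
the release note.

```python
def should_stop(previous_error, current_error, validation_tol, small_error_floor=0.01):
    """Return True when the improvement in validation error is below tolerance.

    For large errors the threshold is relative to the previous error.
    For errors below small_error_floor the scale is clamped to the floor,
    so the threshold becomes the fixed amount validation_tol * small_error_floor.
    """
    improvement = previous_error - current_error
    scale = max(previous_error, small_error_floor)
    return improvement < validation_tol * scale
```

Under the old semantics as described, the comparison would simply have been
improvement < validation_tol, so a tolerance tuned for 1.5 may stop training
at a different iteration in 1.6.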

