Thanks for testing and voting, everyone. The vote passes unanimously with 21 +1 votes and no -1 votes. I'll start finalizing the release now.
+1 votes (* = binding):

Michael Armbrust*
Reynold Xin*
Andrew Or*
Benjamin Fradet
Mark Hamstra*
Jeff Zhang
Josh Rosen*
Aaron Davidson*
Denny Lee
Yin Huai
Jean-Baptiste Onofré
Kousuke Saruta
Zsolt Tóth
Iulian Dragoș
Allen Zhang
Vinay Shukla
Vaquar Khan
Bhupendra Mishra
Krishna Sankar
Ricardo Almeida
Cheng Lian

-1 votes: none

On Sat, Dec 26, 2015 at 6:11 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
> +1

On 12/23/15 12:39 PM, Yin Huai wrote:
> +1

On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee <denny.g....@gmail.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:05 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen <joshro...@databricks.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> +1

On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
> +1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <mich...@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc4 (4062cda3087ae42c6c3cb24508fc1d3a931accdf):
https://github.com/apache/spark/tree/v1.6.0-rc4

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1176/

The test repository (versioned as v1.6.0-rc4) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1175/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/

=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
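One straightforward way to do that is to compile an existing job against the RC artifacts in the staging repository above. A minimal build.sbt sketch (the resolver name is arbitrary; the URL and the 1.6.0 version are taken from this email):

    // build.sbt -- resolve the Spark 1.6.0 RC artifacts from the staging repository
    resolvers += "spark-1.6.0-rc4-staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0",
      "org.apache.spark" %% "spark-sql"  % "1.6.0"
    )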
================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, and bugs related to new features will not block this release.

===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.

==================================================
== Major changes to help you focus your testing ==
==================================================

Notable changes since 1.6 RC3

- SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
- SPARK-12218 - Fix incorrect pushdown of filters to Parquet
- SPARK-12395 - Fix join columns of outer join for DataFrame using columns
- SPARK-12413 - Fix Mesos HA

Notable changes since 1.6 RC2

- SPARK_VERSION has been set correctly
- SPARK-12199 - ML docs are publishing correctly
- SPARK-12345 - Mesos cluster mode has been fixed

Notable changes since 1.6 RC1

Spark Streaming

- SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> trackStateByKey has been renamed to mapWithState

Spark SQL

- SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>, SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix bugs in eviction of storage memory by execution.
- SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258> Correct passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL

- SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787> Parquet Performance - Improve Parquet scan performance when using flat schemas.
- SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810> Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
- SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten); see the sketch after this list.
- SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
- SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table; see the sketch after this list.
- SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single-quotes, unquoted attributes).
- SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412> Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
- SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>, SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory Columnar Cache Performance - Significant (up to 14x) speed up when caching data that contains complex types in DataFrames or SQL.
- SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
- SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
- SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978> Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.
- SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849> Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
- SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
- SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241> Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
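A minimal sketch of the Dataset API (SPARK-9999) and the SQL-queries-on-files syntax (SPARK-11197), assuming a spark-shell session where sqlContext is in scope; the case class and the file path are illustrative:

    case class Person(name: String, age: Long)
    import sqlContext.implicits._

    // Type-safe operations: the lambda below is checked against Person at
    // compile time, unlike the untyped DataFrame API.
    val ds = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()
    val adults = ds.filter(_.age >= 21)

    // SQL directly over a file, without registering a table first.
    val df = sqlContext.sql("SELECT * FROM parquet.`/path/to/people.parquet`")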
Spark Streaming

- API Updates
  - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New improved state management - mapWithState - a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance; see the sketch after this section.
  - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198> Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
  - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891> Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
  - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328> Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) from Python.

- UI Improvements
  - Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
  - Made output operations visible in the streaming tab as progress bars.
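A minimal sketch of the new mapWithState transformation, assuming a DStream[(String, Int)] of per-batch word counts named wordCounts; the names and the Int state type are illustrative:

    import org.apache.spark.streaming.{State, StateSpec}

    // Keep a running total per word; each batch emits (word, total).
    val trackTotals = (word: String, count: Option[Int], state: State[Int]) => {
      val total = state.getOption.getOrElse(0) + count.getOrElse(0)
      state.update(total)  // persist the new total for this key
      (word, total)        // value emitted into the mapped stream
    }

    val totals = wordCounts.mapWithState(StateSpec.function(trackTotals))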
MLlib

New algorithms/models

- SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518> Survival analysis - Log-linear model for survival analysis
- SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
- SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online hypothesis testing - A/B testing in the Spark Streaming framework
- SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
- SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting K-Means clustering - Fast top-down clustering variant of K-Means

API improvements

- ML Pipelines
  - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725> Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms; see the sketch after this list.
  - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
- R API
  - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836> R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
  - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681> Feature interactions in R formula - Interaction operator ":" in R formula
- Python API - Many improvements to Python API to approach feature parity

Misc improvements

- SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>, SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance weights for GLMs - Logistic and Linear Regression can take instance weights
- SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>, SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
- SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM data source - LIBSVM as a SQL data source

Documentation improvements

- SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since versions - Documentation includes the initial version when classes and methods were added
- SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable example code - Automated testing for code in user guide examples

Deprecations

- In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
- In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients". This helps disambiguate from instance (row) weights given to algorithms.
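A minimal sketch of pipeline persistence, assuming model is a fitted PipelineModel whose stages are among those already covered by save/load; the path is illustrative:

    import org.apache.spark.ml.PipelineModel

    model.save("/tmp/spark-1.6-pipeline-model")  // write stages + params to disk
    val restored = PipelineModel.load("/tmp/spark-1.6-pipeline-model")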
Changes of behavior

- spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
- spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
- Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition but only children of x=1). This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678 <https://issues.apache.org/jira/browse/SPARK-11678>); see the sketch below.
- When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724 <https://issues.apache.org/jira/browse/SPARK-11724>); see the sketch below.
- With the improved query planner for queries having distinct aggregations (SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077 <https://issues.apache.org/jira/browse/SPARK-12077>).
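Two of the behavior changes above can be exercised directly. A minimal sketch, assuming sqlContext is in scope; the paths and the literal are illustrative:

    // SPARK-11678: x=1 is discovered as a partition again by anchoring
    // partition discovery at the table root via basePath.
    val df = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")

    // SPARK-11724: an integral value cast to timestamp is now interpreted
    // as seconds since the epoch, not milliseconds.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()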