Thanks for testing and voting, everyone. The vote passes unanimously with 21 +1 votes and no -1 votes. I'll start finalizing the release now.
+1 votes (* = binding):

Michael Armbrust*
Reynold Xin*
Andrew Or*
Benjamin Fradet
Mark Hamstra*
Jeff Zhang
Josh Rosen*
Aaron Davidson*
Denny Lee
Yin Huai
Jean-Baptiste Onofré
Kousuke Saruta
Zsolt Tóth
Iulian Dragoș
Allen Zhang
Vinay Shukla
Vaquar Khan
Bhupendra Mishra
Krishna Sankar
Ricardo Almeida
Cheng Lian

-1 votes: none

On Sat, Dec 26, 2015 at 6:11 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
> +1

On 12/23/15 12:39 PM, Yin Huai wrote:
> +1

On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee <denny.g....@gmail.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:05 PM, Aaron Davidson <ilike...@gmail.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen <joshro...@databricks.com> wrote:
> +1

On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> +1

On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
> +1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust <mich...@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc4 (4062cda3087ae42c6c3cb24508fc1d3a931accdf):
https://github.com/apache/spark/tree/v1.6.0-rc4

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1176/

The test repository (versioned as v1.6.0-rc4) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1175/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/

=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
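One straightforward way to do that is to compile an existing job against the RC artifacts in the staging repository above. A minimal build.sbt sketch (the resolver name is arbitrary; the URL and the 1.6.0 version are taken from this email):

    // build.sbt -- resolve the Spark 1.6.0 RC artifacts from the staging repository
    resolvers += "spark-1.6.0-rc4-staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0",
      "org.apache.spark" %% "spark-sql"  % "1.6.0"
    )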
================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, and bugs related to new features will not block this release.

===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.

==================================================
== Major changes to help you focus your testing ==
==================================================

Notable changes since 1.6 RC3

- SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
- SPARK-12218 - Fix incorrect pushdown of filters to Parquet
- SPARK-12395 - Fix join columns of outer join for DataFrame using columns
- SPARK-12413 - Fix Mesos HA

Notable changes since 1.6 RC2

- SPARK_VERSION has been set correctly
- SPARK-12199 - ML docs are publishing correctly
- SPARK-12345 - Mesos cluster mode has been fixed

Notable changes since 1.6 RC1

Spark Streaming

- SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> trackStateByKey has been renamed to mapWithState

Spark SQL

- SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>, SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix bugs in eviction of storage memory by execution.
- SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258> Correct passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL

- SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787> Parquet Performance - Improve Parquet scan performance when using flat schemas.
- SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810> Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
- SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten); see the sketch after this list.
- SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
- SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table; see the sketch after this list.
- SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single-quotes, unquoted attributes).
- SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412> Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
- SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>, SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory Columnar Cache Performance - Significant (up to 14x) speed up when caching data that contains complex types in DataFrames or SQL.
- SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
- SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
- SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978> Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.
- SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849> Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
- SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
- SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241> Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
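A minimal sketch of the Dataset API (SPARK-9999) and the SQL-queries-on-files syntax (SPARK-11197), assuming a spark-shell session where sqlContext is in scope; the case class and the file path are illustrative:

    case class Person(name: String, age: Long)
    import sqlContext.implicits._

    // Type-safe operations: the lambda below is checked against Person at
    // compile time, unlike the untyped DataFrame API.
    val ds = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()
    val adults = ds.filter(_.age >= 21)

    // SQL directly over a file, without registering a table first.
    val df = sqlContext.sql("SELECT * FROM parquet.`/path/to/people.parquet`")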
Spark Streaming

- API Updates
  - SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> New improved state management - mapWithState - a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance; see the sketch after this section.
  - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198> Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
  - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891> Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
  - SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328> Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) from Python.

- UI Improvements
  - Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
  - Made output operations visible in the streaming tab as progress bars.
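A minimal sketch of the new mapWithState transformation, assuming a DStream[(String, Int)] of per-batch word counts named wordCounts; the names and the Int state type are illustrative:

    import org.apache.spark.streaming.{State, StateSpec}

    // Keep a running total per word; each batch emits (word, total).
    val trackTotals = (word: String, count: Option[Int], state: State[Int]) => {
      val total = state.getOption.getOrElse(0) + count.getOrElse(0)
      state.update(total)  // persist the new total for this key
      (word, total)        // value emitted into the mapped stream
    }

    val totals = wordCounts.mapWithState(StateSpec.function(trackTotals))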
MLlib

New algorithms/models

- SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518> Survival analysis - Log-linear model for survival analysis
- SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834> Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
- SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147> Online hypothesis testing - A/B testing in the Spark Streaming framework
- SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
- SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting K-Means clustering - Fast top-down clustering variant of K-Means

API improvements

- ML Pipelines
  - SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725> Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms; see the sketch after this list.
  - SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565> LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
- R API
  - SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836> R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
  - SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681> Feature interactions in R formula - Interaction operator ":" in R formula
- Python API - Many improvements to Python API to approach feature parity

Misc improvements

- SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>, SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642> Instance weights for GLMs - Logistic and Linear Regression can take instance weights
- SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>, SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
- SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM data source - LIBSVM as a SQL data source

Documentation improvements

- SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751> @since versions - Documentation includes the initial version when classes and methods were added
- SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable example code - Automated testing for code in user guide examples

Deprecations

- In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
- In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients". This helps disambiguate from instance (row) weights given to algorithms.
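A minimal sketch of pipeline persistence, assuming model is a fitted PipelineModel whose stages are among those already covered by save/load; the path is illustrative:

    import org.apache.spark.ml.PipelineModel

    model.save("/tmp/spark-1.6-pipeline-model")  // write stages + params to disk
    val restored = PipelineModel.load("/tmp/spark-1.6-pipeline-model")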
Changes of behavior

- spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
- spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
- Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition but only children of x=1). This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678 <https://issues.apache.org/jira/browse/SPARK-11678>); see the sketch below.
- When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724 <https://issues.apache.org/jira/browse/SPARK-11724>); see the sketch below.
- With the improved query planner for queries having distinct aggregations (SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077 <https://issues.apache.org/jira/browse/SPARK-12077>).
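Two of the behavior changes above can be exercised directly. A minimal sketch, assuming sqlContext is in scope; the paths and the literal are illustrative:

    // SPARK-11678: x=1 is discovered as a partition again by anchoring
    // partition discovery at the table root via basePath.
    val df = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")

    // SPARK-11724: an integral value cast to timestamp is now interpreted
    // as seconds since the epoch, not milliseconds.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()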