+1

On Tue, Dec 22, 2015 at 8:10 PM, Denny Lee
<denny.g....@gmail.com> wrote:

+1

On Tue, Dec 22, 2015 at 7:05 PM Aaron Davidson
<ilike...@gmail.com> wrote:

+1

On Tue, Dec 22, 2015 at 7:01 PM, Josh Rosen
<joshro...@databricks.com> wrote:

+1

On Tue, Dec 22, 2015 at 7:00 PM, Jeff Zhang
<zjf...@gmail.com> wrote:

+1

On Wed, Dec 23, 2015 at 7:36 AM, Mark Hamstra
<m...@clearstorydata.com> wrote:

+1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust
<mich...@databricks.com> wrote:
Please vote on releasing the following candidate
as Apache Spark version 1.6.0!

The vote is open until Friday, December 25, 2015
at 18:00 UTC and passes if a majority of at least
3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...
To learn more about Apache Spark, please see
http://spark.apache.org/
The tag to be voted on is v1.6.0-rc4
(4062cda3087ae42c6c3cb24508fc1d3a931accdf):
https://github.com/apache/spark/tree/v1.6.0-rc4
The release files, including signatures,
digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
Release artifacts are signed with the
following key:
https://people.apache.org/keys/committer/pwendell.asc
The staging repository for this release can be
found at:
https://repository.apache.org/content/repositories/orgapachespark-1176/
The test repository (versioned as v1.6.0-rc4)
for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1175/
The documentation corresponding to this
release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test
this release by taking an existing Spark
workload and running it on this release
candidate, then reporting any regressions.
================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the
1.6 QA period, so -1 votes should only occur
for significant regressions from 1.5. Bugs
already present in 1.5, minor regressions, or
bugs related to new features will not block
this release.
===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to
target 1.6.0 and still go into branch-1.6,
since the documentation is published
separately from the release.
2. New features for non-alpha modules should
target 1.7+.
3. Non-blocker bug fixes should target 1.6.1
or 1.7.0, or drop the target version.
==================================================
== Major changes to help you focus your testing ==
==================================================
Notable changes since 1.6 RC3
- SPARK-12404 - Fix serialization error for
Datasets with Timestamps/Arrays/Decimal
- SPARK-12218 - Fix incorrect pushdown of
filters to parquet
- SPARK-12395 - Fix join columns of outer joins
for DataFrame.join with usingColumns
- SPARK-12413 - Fix Mesos HA
Notable changes since 1.6 RC2
- SPARK_VERSION has been set correctly
- SPARK-12199 ML Docs are publishing correctly
- SPARK-12345 Mesos cluster mode has been fixed
Notable changes since 1.6 RC1
Spark Streaming
* SPARK-2629
<https://issues.apache.org/jira/browse/SPARK-2629>
|trackStateByKey| has been renamed to
|mapWithState|
Spark SQL
* SPARK-12165
<https://issues.apache.org/jira/browse/SPARK-12165>
SPARK-12189
<https://issues.apache.org/jira/browse/SPARK-12189>
Fix bugs in eviction of storage memory
by execution.
* SPARK-12258
<https://issues.apache.org/jira/browse/SPARK-12258>
Correct passing null into ScalaUDF.
Notable Features Since 1.5
Spark SQL
* SPARK-11787
<https://issues.apache.org/jira/browse/SPARK-11787>
Parquet Performance - Improve Parquet scan
performance when using flat schemas.
* SPARK-10810
<https://issues.apache.org/jira/browse/SPARK-10810>
Session Management - Isolated default
database (i.e. |USE mydb|) even on shared
clusters.
* SPARK-9999
<https://issues.apache.org/jira/browse/SPARK-9999>
Dataset API - A type-safe API (similar to
RDDs) that performs many operations on
serialized binary data and uses code
generation (i.e. Project Tungsten); see
the sketch after this list.
* SPARK-10000
<https://issues.apache.org/jira/browse/SPARK-10000>
Unified Memory Management - Shared memory
for execution and caching instead of
exclusive division of the regions.
* SPARK-11197
<https://issues.apache.org/jira/browse/SPARK-11197>
SQL Queries on Files - Concise syntax for
running SQL queries over files of any
supported format without registering a
table; see the sketch after this list.
* SPARK-11745
<https://issues.apache.org/jira/browse/SPARK-11745>
Reading non-standard JSON files - Added
options to read non-standard JSON files
(e.g. single-quotes, unquoted attributes)
* SPARK-10412
<https://issues.apache.org/jira/browse/SPARK-10412>
Per-operator Metrics for SQL Execution -
Display statistics on a per-operator basis
for memory usage and spilled data size.
* SPARK-11329
<https://issues.apache.org/jira/browse/SPARK-11329>
Star (*) expansion for StructTypes - Makes
it easier to nest and unnest arbitrary
numbers of columns.
* SPARK-10917
<https://issues.apache.org/jira/browse/SPARK-10917>,
SPARK-11149
<https://issues.apache.org/jira/browse/SPARK-11149>
In-memory Columnar Cache Performance -
Significant (up to 14x) speed up when
caching data that contains complex types
in DataFrames or SQL.
* SPARK-11111
<https://issues.apache.org/jira/browse/SPARK-11111>
Fast null-safe joins - Joins using
null-safe equality (|<=>|) will now
execute using SortMergeJoin instead of
computing a Cartesian product.
* SPARK-11389
<https://issues.apache.org/jira/browse/SPARK-11389>
SQL Execution Using Off-Heap Memory -
Support for configuring query execution to
occur using off-heap memory to avoid GC
overhead.
* SPARK-10978
<https://issues.apache.org/jira/browse/SPARK-10978>
Datasource API Avoid Double Filter - When
implementing a datasource with filter
pushdown, developers can now tell Spark
SQL to avoid double evaluating a
pushed-down filter.
* SPARK-4849
<https://issues.apache.org/jira/browse/SPARK-4849>
Advanced Layout of Cached Data - Store
partitioning and ordering schemes in the
in-memory table scan, and add distributeBy
and localSort to the DataFrame API.
* SPARK-9858
<https://issues.apache.org/jira/browse/SPARK-9858>
Adaptive query execution - Initial support
for automatically selecting the number of
reducers for joins and aggregations.
* SPARK-9241
<https://issues.apache.org/jira/browse/SPARK-9241>
Improved query planner for queries having
distinct aggregations - Query plans of
distinct aggregations are more robust when
distinct columns have high cardinality.
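
Two of the Spark SQL features above are easiest to see in code. First, a minimal sketch of the Dataset API (SPARK-9999); the case class and values are illustrative, and |sqlContext| is assumed to be an existing SQLContext:

    // Type-safe operations on a Dataset[Person]
    import sqlContext.implicits._

    case class Person(name: String, age: Long)

    val ds = Seq(Person("Ann", 30), Person("Bob", 17)).toDS()
    // Lambdas are checked at compile time, unlike untyped
    // DataFrame expressions
    val adults = ds.filter(_.age >= 18).map(_.name)
    adults.show()

Second, a sketch of the SQL-on-files syntax (SPARK-11197); the path is a placeholder:

    // Query a Parquet file directly, without registering a table
    val df = sqlContext.sql(
      "SELECT * FROM parquet.`/path/to/events.parquet`")
    df.show()
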
Spark Streaming
* API Updates
o SPARK-2629
<https://issues.apache.org/jira/browse/SPARK-2629>
New improved state management -
|mapWithState| - a DStream
transformation for stateful stream
processing that supersedes
|updateStateByKey| in functionality
and performance; see the sketch at
the end of this section.
o SPARK-11198
<https://issues.apache.org/jira/browse/SPARK-11198>
Kinesis record deaggregation - Kinesis
streams have been upgraded to use KCL
1.4.0 and support transparent
deaggregation of KPL-aggregated records.
o SPARK-10891
<https://issues.apache.org/jira/browse/SPARK-10891>
Kinesis message handler function -
Allows an arbitrary function to be
applied to a Kinesis record in the
Kinesis receiver to customize what
data is stored in memory.
o SPARK-6328
<https://issues.apache.org/jira/browse/SPARK-6328>
Python Streaming Listener API - Get
streaming statistics (scheduling
delays, batch processing times, etc.)
from Python.
* UI Improvements
o Made failures visible in the streaming
tab, in the timelines, batch list, and
batch details page.
o Made output operations visible in the
streaming tab as progress bars.
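
A minimal sketch of the new |mapWithState| API; |wordCounts| is a placeholder for an existing DStream[(String, Int)]:

    import org.apache.spark.streaming.{State, StateSpec}

    // Keep a running sum per key in Spark-managed state
    val mappingFunc = (word: String, count: Option[Int], state: State[Int]) => {
      val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    val stateDStream = wordCounts.mapWithState(StateSpec.function(mappingFunc))
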
MLlib
New algorithms/models
* SPARK-8518
<https://issues.apache.org/jira/browse/SPARK-8518>
Survival analysis - Log-linear model for
survival analysis
* SPARK-9834
<https://issues.apache.org/jira/browse/SPARK-9834>
Normal equation for least squares - Normal
equation solver, providing R-like model
summary statistics
* SPARK-3147
<https://issues.apache.org/jira/browse/SPARK-3147>
Online hypothesis testing - A/B testing in
the Spark Streaming framework
* SPARK-9930
<https://issues.apache.org/jira/browse/SPARK-9930>
New feature transformers - ChiSqSelector,
QuantileDiscretizer, SQL transformer
* SPARK-6517
<https://issues.apache.org/jira/browse/SPARK-6517>
Bisecting K-Means clustering - Fast
top-down clustering variant of K-Means;
see the sketch after this list.
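
A minimal sketch of the new bisecting k-means API; |sc| is assumed to be an existing SparkContext and the vectors are illustrative:

    import org.apache.spark.mllib.clustering.BisectingKMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Two well-separated groups of points
    val data = sc.parallelize(Seq(
      Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.2, 8.8)))

    val model = new BisectingKMeans().setK(2).run(data)
    model.clusterCenters.foreach(println)
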
API improvements
* ML Pipelines
o SPARK-6725
<https://issues.apache.org/jira/browse/SPARK-6725>
Pipeline persistence - Save/load for
ML Pipelines, with partial coverage of
spark.ml algorithms; see the sketch
after this list.
o SPARK-5565
<https://issues.apache.org/jira/browse/SPARK-5565>
LDA in ML Pipelines - API for Latent
Dirichlet Allocation in ML Pipelines
* R API
o SPARK-9836
<https://issues.apache.org/jira/browse/SPARK-9836>
R-like statistics for GLMs - (Partial)
R-like stats for ordinary least
squares via summary(model)
o SPARK-9681
<https://issues.apache.org/jira/browse/SPARK-9681>
Feature interactions in R formula -
Interaction operator ":" in R formula
* Python API - Many improvements to Python
API to approach feature parity
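
A minimal sketch of pipeline persistence; the path is a placeholder, and |pipeline| is assumed to be built only from stages whose save/load support is implemented in 1.6:

    import org.apache.spark.ml.Pipeline

    // Persist the pipeline definition, then restore it
    pipeline.write.overwrite().save("/tmp/spark-1.6-pipeline")
    val restored = Pipeline.load("/tmp/spark-1.6-pipeline")
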
Misc improvements
* SPARK-7685
<https://issues.apache.org/jira/browse/SPARK-7685>,
SPARK-9642
<https://issues.apache.org/jira/browse/SPARK-9642>
Instance weights for GLMs - Logistic and
Linear Regression can take instance weights
* SPARK-10384
<https://issues.apache.org/jira/browse/SPARK-10384>,
SPARK-10385
<https://issues.apache.org/jira/browse/SPARK-10385>
Univariate and bivariate statistics in
DataFrames - Variance, stddev,
correlations, etc.
* SPARK-10117
<https://issues.apache.org/jira/browse/SPARK-10117>
LIBSVM data source - LIBSVM as a SQL
data source; see the sketch after
this list.
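
A minimal sketch of the LIBSVM data source; the path is a placeholder, and |sqlContext| is assumed to exist:

    // Produces a DataFrame with "label" and "features" columns
    val df = sqlContext.read.format("libsvm")
      .load("data/mllib/sample_libsvm_data.txt")
    df.show()
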
Documentation improvements
* SPARK-7751
<https://issues.apache.org/jira/browse/SPARK-7751>
@since versions - Documentation includes
the initial version in which classes and
methods were added
* SPARK-11337
<https://issues.apache.org/jira/browse/SPARK-11337>
Testable example code - Automated testing
for code in user guide examples
Deprecations
* In spark.mllib.clustering.KMeans, the
"runs" parameter has been deprecated.
* In
spark.ml.classification.LogisticRegressionModel
and
spark.ml.regression.LinearRegressionModel,
the "weights" field has been deprecated,
in favor of the new name "coefficients."
This helps disambiguate them from the
instance (row) weights given to
algorithms.
Changes of behavior
* spark.mllib.tree.GradientBoostedTrees
validationTol has changed semantics in
1.6. Previously, it was a threshold for
absolute change in error. Now, it
resembles the behavior of GradientDescent
convergenceTol: For large errors, it uses
relative error (relative to the previous
error); for small errors (< 0.01), it uses
absolute error.
* spark.ml.feature.RegexTokenizer:
Previously, it did not convert strings to
lowercase before tokenizing. Now, it
converts to lowercase by default, with an
option not to. This matches the behavior
of the simpler Tokenizer transformer.
* Spark SQL's partition discovery has been
changed to only discover partition
directories that are children of the given
path. (i.e. if |path="/my/data/x=1"| then
|x=1| will no longer be considered a
partition but only children of |x=1|.)
This behavior can be overridden by
manually specifying the |basePath| that
partition discovery should start with
(SPARK-11678
<https://issues.apache.org/jira/browse/SPARK-11678>);
see the sketch after this list.
* When casting a value of an integral type
to timestamp (e.g. casting a long value to
timestamp), the value is treated as being
in seconds instead of milliseconds
(SPARK-11724
<https://issues.apache.org/jira/browse/SPARK-11724>),
also illustrated after this list.
* With the improved query planner for
queries having distinct aggregations
(SPARK-9241
<https://issues.apache.org/jira/browse/SPARK-9241>),
the plan of a query having a single
distinct aggregation has been changed to a
more robust version. To switch back to the
plan generated by Spark 1.5's planner,
please set
|spark.sql.specializeSingleDistinctAggPlanning| to
|true| (SPARK-12077
<https://issues.apache.org/jira/browse/SPARK-12077>).
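
A minimal sketch of the two Spark SQL behavior changes above; paths and values are illustrative, and |sqlContext| is assumed to exist:

    // Partition discovery: pass basePath so that x=1 is still
    // treated as a partition column when reading a subdirectory
    val df = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")

    // Integral-to-timestamp casts now interpret the value
    // as seconds rather than milliseconds
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()
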
--
Best Regards
Jeff Zhang