Trying again now that eec36607 <https://github.com/apache/spark/commit/eec36607f9fc92b6c4d306e3930fcf03961625eb> is merged.
On Thu, Dec 10, 2015 at 6:44 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Cutting RC2 now.
>
> On Thu, Dec 10, 2015 at 12:59 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> We are getting close to merging patches for SPARK-12155
>> <https://issues.apache.org/jira/browse/SPARK-12155> and SPARK-12253
>> <https://issues.apache.org/jira/browse/SPARK-12253>. I'll be cutting
>> RC2 shortly after that.
>>
>> Michael
>>
>> On Tue, Dec 8, 2015 at 10:31 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> An update: the vote fails due to the -1. I'll post another RC as soon
>>> as we've resolved these issues. In the meantime I encourage people to
>>> continue testing and post any problems they encounter here.
>>>
>>> On Sun, Dec 6, 2015 at 6:24 PM, Yin Huai <yh...@databricks.com> wrote:
>>>
>>>> -1
>>>>
>>>> Two blocker bugs have been found after this RC.
>>>> https://issues.apache.org/jira/browse/SPARK-12089 can cause data
>>>> corruption when an external sorter spills data.
>>>> https://issues.apache.org/jira/browse/SPARK-12155 can prevent tasks
>>>> from acquiring memory even when the executor can in fact allocate memory
>>>> by evicting storage memory.
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-12089 has been fixed. We
>>>> are still working on https://issues.apache.org/jira/browse/SPARK-12155.
>>>>
>>>> On Fri, Dec 4, 2015 at 3:04 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> 0
>>>>>
>>>>> Currently figuring out who is responsible for the regression I am
>>>>> seeing in some user-code ScalaUDFs that make use of Timestamps, where
>>>>> NULL values from a CSV file read in via TestHive#registerTestTable are now
>>>>> producing 1969-12-31 23:59:59.999999 instead of null.
>>>>>
>>>>> On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>
>>>>>> Licenses and signatures are all fine.
>>>>>>
>>>>>> Docker integration tests consistently fail for me with Java 7 / Ubuntu
>>>>>> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver":
>>>>>>
>>>>>> *** RUN ABORTED ***
>>>>>> java.lang.NoSuchMethodError:
>>>>>> org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>>>>>> at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>>>>>> at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>>>>>> at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>>>>>> at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>>>>>> at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>>>>>> at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>>>>>> at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>>>>>>
>>>>>> I also get this failure consistently:
>>>>>>
>>>>>> DirectKafkaStreamSuite
>>>>>> - offset recovery *** FAILED ***
>>>>>> recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>>>>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>>>>>> was false Recovered ranges are not the same as the ones generated
>>>>>> (DirectKafkaStreamSuite.scala:301)
>>>>>>
>>>>>> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>> > Please vote on releasing the following candidate as Apache Spark
>>>>>> > version 1.6.0!
>>>>>> >
>>>>>> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and
>>>>>> > passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>> >
>>>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>>>> > [ ] -1 Do not release this package because ...
>>>>>> >
>>>>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>> >
>>>>>> > The tag to be voted on is v1.6.0-rc1
>>>>>> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
>>>>>> >
>>>>>> > The release files, including signatures, digests, etc.
>>>>>> > can be found at:
>>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>>>>>> >
>>>>>> > Release artifacts are signed with the following key:
>>>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>>>> >
>>>>>> > The staging repository for this release can be found at:
>>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1165/
>>>>>> >
>>>>>> > The test repository (versioned as v1.6.0-rc1) for this release can
>>>>>> > be found at:
>>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1164/
>>>>>> >
>>>>>> > The documentation corresponding to this release can be found at:
>>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>>>>>> >
>>>>>> >
>>>>>> > =======================================
>>>>>> > == How can I help test this release? ==
>>>>>> > =======================================
>>>>>> > If you are a Spark user, you can help us test this release by taking an
>>>>>> > existing Spark workload, running it on this release candidate, and then
>>>>>> > reporting any regressions.
>>>>>> >
>>>>>> > ================================================
>>>>>> > == What justifies a -1 vote for this release? ==
>>>>>> > ================================================
>>>>>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
>>>>>> > should only occur for significant regressions from 1.5. Bugs already present
>>>>>> > in 1.5, minor regressions, or bugs related to new features will not block
>>>>>> > this release.
>>>>>> >
>>>>>> > ===============================================================
>>>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>>>> > ===============================================================
>>>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go into
>>>>>> > branch-1.6, since documentation will be published separately from the
>>>>>> > release.
>>>>>> > 2. New features for non-alpha modules should target 1.7+.
>>>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>>>>>> > version.
>>>>>> >
>>>>>> >
>>>>>> > ==================================================
>>>>>> > == Major changes to help you focus your testing ==
>>>>>> > ==================================================
>>>>>> >
>>>>>> > Spark SQL
>>>>>> >
>>>>>> > SPARK-10810 Session Management - The ability to create multiple isolated SQL
>>>>>> > Contexts that have their own configuration and default database. This is
>>>>>> > turned on by default in the thrift server.
>>>>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs
>>>>>> > many operations on serialized binary data and code generation (i.e. Project
>>>>>> > Tungsten).
>>>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution and
>>>>>> > caching instead of exclusive division of the regions.
>>>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
>>>>>> > over files of any supported format without registering a table.
>>>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>>>>>> > non-standard JSON files (e.g. single quotes, unquoted attributes).
>>>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a
>>>>>> > per-operator basis for memory usage and spilled data size.
>>>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>>>>> > nest and unnest arbitrary numbers of columns.
>>>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant
>>>>>> > (up to 14x) speed-up when caching data that contains complex types in
>>>>>> > DataFrames or SQL.
>>>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will
>>>>>> > now execute using SortMergeJoin instead of computing a cartesian product.
>>>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
>>>>>> > query execution to occur using off-heap memory to avoid GC overhead.
>>>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>>>>>> > datasource with filter pushdown, developers can now tell Spark SQL to avoid
>>>>>> > evaluating a pushed-down filter twice.
>>>>>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
>>>>>> > ordering schemes in in-memory table scans, and adding distributeBy and
>>>>>> > localSort to the DataFrame API.
>>>>>> > SPARK-9858 Adaptive query execution - Initial support for automatically
>>>>>> > selecting the number of reducers for joins and aggregations.
>>>>>> >
>>>>>> > Spark Streaming
>>>>>> >
>>>>>> > API Updates
>>>>>> >
>>>>>> > SPARK-2629 New improved state management - trackStateByKey, a DStream
>>>>>> > transformation for stateful stream processing that supersedes updateStateByKey
>>>>>> > in functionality and performance.
>>>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>>>>>> > upgraded to use KCL 1.4.0 and now support transparent deaggregation of
>>>>>> > KPL-aggregated records.
>>>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary function
>>>>>> > to be applied to a Kinesis record in the Kinesis receiver to customize
>>>>>> > what data is stored in memory.
>>>>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
>>>>>> > (scheduling delays, batch processing times, etc.) from Python.
>>>>>> >
>>>>>> > UI Improvements
>>>>>> >
>>>>>> > Made failures visible in the streaming tab, in the timelines, batch list,
>>>>>> > and batch details page.
>>>>>> > Made output operations visible in the streaming tab as progress bars.
>>>>>> >
>>>>>> > MLlib
>>>>>> >
>>>>>> > New algorithms/models
>>>>>> >
>>>>>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis.
>>>>>> > SPARK-9834 Normal equation for least squares - Normal equation solver,
>>>>>> > providing R-like model summary statistics.
>>>>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming
>>>>>> > framework.
>>>>>> > SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer,
>>>>>> > SQL transformer.
>>>>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant
>>>>>> > of K-Means.
>>>>>> >
>>>>>> > API improvements
>>>>>> >
>>>>>> > ML Pipelines
>>>>>> >
>>>>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial
>>>>>> > coverage of spark.ml algorithms.
>>>>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML
>>>>>> > Pipelines.
>>>>>> >
>>>>>> > R API
>>>>>> >
>>>>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary
>>>>>> > least squares via summary(model).
>>>>>> > SPARK-9681 Feature interactions in R formula - Interaction operator ":" in
>>>>>> > R formula.
>>>>>> >
>>>>>> > Python API - Many improvements to the Python API to approach feature parity.
>>>>>> >
>>>>>> > Misc improvements
>>>>>> >
>>>>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear
>>>>>> > Regression can take instance weights.
>>>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames -
>>>>>> > Variance, stddev, correlations, etc.
>>>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
>>>>>> >
>>>>>> > Documentation improvements
>>>>>> >
>>>>>> > SPARK-7751 @since versions - Documentation includes the initial version when
>>>>>> > classes and methods were added.
>>>>>> > SPARK-11337 Testable example code - Automated testing for code in user guide
>>>>>> > examples.
>>>>>> >
>>>>>> > Deprecations
>>>>>> >
>>>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
>>>>>> > In spark.ml.classification.LogisticRegressionModel and
>>>>>> > spark.ml.regression.LinearRegressionModel, the "weights" field has been
>>>>>> > deprecated in favor of the new name "coefficients." This helps disambiguate
>>>>>> > it from instance (row) weights given to algorithms.
>>>>>> >
>>>>>> > Changes of behavior
>>>>>> >
>>>>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in
>>>>>> > 1.6. Previously, it was a threshold for absolute change in error. Now, it
>>>>>> > resembles the behavior of GradientDescent convergenceTol: for large errors,
>>>>>> > it uses relative error (relative to the previous error); for small errors
>>>>>> > (< 0.01), it uses absolute error.
>>>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to
>>>>>> > lowercase before tokenizing. Now, it converts to lowercase by default, with
>>>>>> > an option not to. This matches the behavior of the simpler Tokenizer
>>>>>> > transformer.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>>>
>>>>>
>>>>
>>>
>>
>
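For anyone exercising the RegexTokenizer change listed under "Changes of behavior" above, a minimal spark-shell sketch along these lines should show the new default and the opt-out. This is untested against this RC; it assumes the new parameter is exposed as setToLowercase, and the toy data and expected-output comments are mine, not from the release notes:

    import org.apache.spark.ml.feature.RegexTokenizer

    // Toy input; any DataFrame with a string column works.
    val df = sqlContext.createDataFrame(Seq((0, "Spark RC Testing"))).toDF("id", "text")

    // New 1.6 default: tokens are lowercased, matching the simpler Tokenizer.
    val lowered = new RegexTokenizer().setInputCol("text").setOutputCol("words")
    lowered.transform(df).select("words").show()      // expect [spark, rc, testing]

    // Opting out of the new default (assumed param name: toLowercase).
    val preserved = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .setToLowercase(false)
    preserved.transform(df).select("words").show()    // expect [Spark, RC, Testing]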