The head of branch 1.5 will always be a "1.5.x-SNAPSHOT" version. Technically you would expect it to be 1.5.0-SNAPSHOT until 1.5.0 is released, but in practice I think it's simpler to follow the defaults of the Maven release plugin, which will set this to 1.5.1-SNAPSHOT after any 1.5.0 RC is released. It doesn't affect later RCs. This has nothing to do with which commits go into 1.5.0; it's an ignorable detail of the version in the POMs in the source tree, which doesn't mean much anyway, since the source tree itself is not a released version.
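[Editor's note: the Maven release plugin's default "next development version" is just the release version with its last numeric segment bumped and -SNAPSHOT appended, which is why branch-1.5 shows 1.5.1-SNAPSHOT after the 1.5.0 RC. A toy sketch of that rule, not the plugin's actual code:]

```python
def next_dev_version(release_version: str) -> str:
    """Toy model of the Maven release plugin's default: bump the last
    numeric segment and append -SNAPSHOT (e.g. 1.5.0 -> 1.5.1-SNAPSHOT).
    Real versions with qualifiers (e.g. 1.5.0-rc2) are not handled here."""
    parts = release_version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + "-SNAPSHOT"
```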
On Tue, Sep 1, 2015 at 2:48 PM, <ches...@alpinenow.com> wrote:
> Sorry, I am still not following. I assume the release would build from 1.5.0
> before moving to 1.5.1. Are you saying that 1.5.0 RC3 could build from the
> 1.5.1 snapshot during release? Or would 1.5.0 RC3 build from the last commit
> of 1.5.0 (before the change to the 1.5.1 snapshot)?
>
> Sent from my iPad
>
>> On Sep 1, 2015, at 1:52 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> That's correct for the 1.5 branch, right? This doesn't mean that the
>> next RC would have this value. You choose the release version during
>> the release process.
>>
>>> On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen <ches...@alpinenow.com> wrote:
>>> It seems that the GitHub branch-1.5 has already changed the version to
>>> 1.5.1-SNAPSHOT.
>>>
>>> I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1?
>>>
>>> Chester
>>>
>>>> On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>> I'm going to -1 the release myself, since the issue @yhuai identified is
>>>> pretty serious: it basically OOMs the driver when reading any files with a
>>>> large number of partitions. Looks like the patch for that has already been
>>>> merged.
>>>>
>>>> I'm going to cut rc3 momentarily.
>>>>
>>>> On Sun, Aug 30, 2015 at 11:30 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>>>>
>>>>> +1 (non-binding)
>>>>> Built from source and ran some jobs against YARN.
>>>>>
>>>>> -Sandy
>>>>>
>>>>> On Sat, Aug 29, 2015 at 5:50 AM, vaquar khan <vaquar.k...@gmail.com> wrote:
>>>>>>
>>>>>> +1 (1.5.0 RC2) Compiled on Windows with YARN.
>>>>>>
>>>>>> Regards,
>>>>>> Vaquar khan
>>>>>>
>>>>>> +1 (non-binding, of course)
>>>>>>
>>>>>> 1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 42:36 min
>>>>>>    mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
>>>>>> 2. Tested pyspark, mllib
>>>>>> 2.1. Statistics (min, max, mean, Pearson, Spearman) OK
>>>>>> 2.2. Linear/Ridge/Lasso Regression OK
>>>>>> 2.3. Decision Tree, Naive Bayes OK
>>>>>> 2.4. KMeans OK
>>>>>>      Center and Scale OK
>>>>>> 2.5. RDD operations OK
>>>>>>      State of the Union texts - MapReduce, Filter, sortByKey (word count)
>>>>>> 2.6. Recommendation (MovieLens medium dataset, ~1M ratings) OK
>>>>>>      Model evaluation/optimization (rank, numIter, lambda) with itertools OK
>>>>>> 3. Scala - MLlib
>>>>>> 3.1. Statistics (min, max, mean, Pearson, Spearman) OK
>>>>>> 3.2. LinearRegressionWithSGD OK
>>>>>> 3.3. Decision Tree OK
>>>>>> 3.4. KMeans OK
>>>>>> 3.5. Recommendation (MovieLens medium dataset, ~1M ratings) OK
>>>>>> 3.6. saveAsParquetFile OK
>>>>>> 3.7. Read and verify the 3.6 save (above) - sqlContext.parquetFile,
>>>>>>      registerTempTable, sql OK
>>>>>> 3.8. result = sqlContext.sql("SELECT
>>>>>>      OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>>>>>      JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>>>>> 4.0. Spark SQL from Python OK
>>>>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
>>>>>> 5.0. Packages
>>>>>> 5.1. com.databricks.spark.csv - read/write OK
>>>>>>      (--packages com.databricks:spark-csv_2.11:1.2.0-s_2.11 didn't work, but
>>>>>>      com.databricks:spark-csv_2.11:1.2.0 worked)
>>>>>> 6.0. DataFrames
>>>>>> 6.1. cast, dtypes OK
>>>>>> 6.2. groupBy, avg, crosstab, corr, isNull, na.drop OK
>>>>>> 6.3. joins, sql, set operations, udf OK
>>>>>>
>>>>>> Cheers
>>>>>> <k/>
>>>>>>
>>>>>> On Tue, Aug 25, 2015 at 9:28 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and
>>>>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 1.5.0
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is v1.5.0-rc2:
>>>>>>> https://github.com/apache/spark/tree/727771352855dbb780008c449a877f5aaa5fc27a
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
>>>>>>>
>>>>>>> Release artifacts are signed with the following key:
>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>
>>>>>>> The staging repository for this release (published as 1.5.0-rc2) can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1141/
>>>>>>>
>>>>>>> The staging repository for this release (published as 1.5.0) can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1140/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-docs/
>>>>>>>
>>>>>>> =======================================
>>>>>>> How can I help test this release?
>>>>>>> =======================================
>>>>>>> If you are a Spark user, you can help us test this release by taking an
>>>>>>> existing Spark workload, running it on this release candidate, and
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> ================================================
>>>>>>> What justifies a -1 vote for this release?
>>>>>>> ================================================
>>>>>>> This vote is happening towards the end of the 1.5 QA period, so -1
>>>>>>> votes should only occur for significant regressions from 1.4. Bugs already
>>>>>>> present in 1.4, minor regressions, and bugs related to new features will
>>>>>>> not block this release.
>>>>>>>
>>>>>>> ===============================================================
>>>>>>> What should happen to JIRA tickets still targeting 1.5.0?
>>>>>>> ===============================================================
>>>>>>> 1. It is OK for documentation patches to target 1.5.0 and still go into
>>>>>>> branch-1.5, since documentation will be packaged separately from the release.
>>>>>>> 2. New features for non-alpha modules should target 1.6+.
>>>>>>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the
>>>>>>> target version.
>>>>>>>
>>>>>>> ==================================================
>>>>>>> Major changes to help you focus your testing
>>>>>>> ==================================================
>>>>>>>
>>>>>>> As of today, Spark 1.5 contains more than 1000 commits from 220+
>>>>>>> contributors. I've curated a list of important changes for 1.5. For the
>>>>>>> complete list, please refer to the Apache JIRA changelog.
>>>>>>>
>>>>>>> RDD/DataFrame/SQL APIs
>>>>>>>
>>>>>>> - New UDAF interface
>>>>>>> - DataFrame hints for broadcast join
>>>>>>> - expr function for turning a SQL expression into a DataFrame column
>>>>>>> - Improved support for NaN values
>>>>>>> - StructType now supports ordering
>>>>>>> - TimestampType precision is reduced to 1us
>>>>>>> - 100 new built-in expressions, including date/time, string, math
>>>>>>> - Checkpointing to memory and local disk only
>>>>>>>
>>>>>>> DataFrame/SQL Backend Execution
>>>>>>>
>>>>>>> - Code generation on by default
>>>>>>> - Improved join, aggregation, shuffle, and sorting with cache-friendly
>>>>>>> algorithms and external algorithms
>>>>>>> - Improved window function performance
>>>>>>> - Better metrics instrumentation and reporting for DF/SQL execution plans
>>>>>>>
>>>>>>> Data Sources, Hive, Hadoop, Mesos and Cluster Management
>>>>>>>
>>>>>>> - Dynamic allocation support in all resource managers (Mesos, YARN, Standalone)
>>>>>>> - Improved Mesos support
>>>>>>> (framework authentication, roles, dynamic allocation, constraints)
>>>>>>> - Improved YARN support (dynamic allocation with preferred locations)
>>>>>>> - Improved Hive support (metastore partition pruning, metastore
>>>>>>> connectivity for Hive 0.13 through 1.2, internal Hive upgrade to 1.2)
>>>>>>> - Support for persisting data in a Hive-compatible format in the metastore
>>>>>>> - Support for data partitioning for JSON data sources
>>>>>>> - Parquet improvements (upgrade to 1.7, predicate pushdown, faster
>>>>>>> metadata discovery and schema merging, support for reading non-standard
>>>>>>> legacy Parquet files generated by other libraries)
>>>>>>> - Faster and more robust dynamic partition insert
>>>>>>> - DataSourceRegister interface for external data sources to specify short names
>>>>>>>
>>>>>>> SparkR
>>>>>>>
>>>>>>> - YARN cluster mode in R
>>>>>>> - GLMs with R formula, binomial/Gaussian families, and elastic-net regularization
>>>>>>> - Improved error messages
>>>>>>> - Aliases to make DataFrame functions more R-like
>>>>>>>
>>>>>>> Streaming
>>>>>>>
>>>>>>> - Backpressure for handling bursty input streams
>>>>>>> - Improved Python support for streaming sources (Kafka offsets, Kinesis, MQTT, Flume)
>>>>>>> - Improved Python streaming machine learning algorithms (K-Means,
>>>>>>> linear regression, logistic regression)
>>>>>>> - Native reliable Kinesis stream support
>>>>>>> - Input metadata like Kafka offsets made visible in the batch details UI
>>>>>>> - Better load balancing and scheduling of receivers across the cluster
>>>>>>> - Streaming storage included in the web UI
>>>>>>>
>>>>>>> Machine Learning and Advanced Analytics
>>>>>>>
>>>>>>> - Feature transformers: CountVectorizer, Discrete Cosine Transformation,
>>>>>>> MinMaxScaler, NGram, PCA, RFormula, StopWordsRemover, and VectorSlicer
>>>>>>> - Estimators under the pipeline API: naive Bayes, k-means, and isotonic regression
>>>>>>> - Algorithms: multilayer perceptron classifier, PrefixSpan for
>>>>>>> sequential pattern mining, association rule generation, 1-sample
>>>>>>> Kolmogorov-Smirnov test
>>>>>>> - Improvements to existing algorithms: LDA, trees/ensembles, GMMs
>>>>>>> - More efficient Pregel API implementation for GraphX
>>>>>>> - Model summaries for linear and logistic regression
>>>>>>> - Python API: distributed matrices, streaming k-means and linear
>>>>>>> models, LDA, power iteration clustering, etc.
>>>>>>> - Tuning and evaluation: train-validation split and multiclass
>>>>>>> classification evaluator
>>>>>>> - Documentation: document the release version of public API methods

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
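[Editor's note: the itertools-based model evaluation sweep mentioned in item 2.6 of the test checklist above can be sketched as a grid over (rank, numIter, lambda). The grid values and the `evaluate` function here are hypothetical stand-ins; the actual test trained MLlib ALS models and scored them on MovieLens data.]

```python
import itertools

# Hypothetical hyperparameter grid for an ALS-style recommendation model.
ranks = [8, 12]
num_iters = [10, 20]
lambdas = [0.01, 0.1]

def evaluate(rank, num_iter, lmbda):
    # Stand-in for training a model and computing validation RMSE;
    # a deterministic dummy score keeps this sketch runnable offline.
    return abs(rank - 12) + abs(num_iter - 20) * 0.1 + lmbda

# Score every (rank, numIter, lambda) combination and keep the best one.
best = min(itertools.product(ranks, num_iters, lambdas),
           key=lambda params: evaluate(*params))
```

With a real train/evaluate function in place of the dummy, `best` holds the hyperparameter triple with the lowest validation error.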