spark git commit: [SPARK-6465][SQL] Fix serialization of GenericRowWithSchema using kryo

2015-03-26 Thread lian
Repository: spark Updated Branches: refs/heads/branch-1.3 0ba759985 -> 825499655 [SPARK-6465][SQL] Fix serialization of GenericRowWithSchema using kryo Author: Michael Armbrust mich...@databricks.com Closes #5191 from marmbrus/kryoRowsWithSchema and squashes the following commits: bb83522
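The issue here was that rows carrying a schema did not round-trip through Kryo. As a rough, language-neutral sketch of the idea (plain Python with `pickle` standing in for Kryo; `Schema`, `RowWithSchema`, and both helper functions are hypothetical names, not Spark's classes): write the shared schema once, write only the per-row values, and reattach the schema on read.

```python
import pickle

class Schema:
    def __init__(self, fields):
        self.fields = tuple(fields)

class RowWithSchema:
    """Conceptual analogue of GenericRowWithSchema: values plus a schema reference."""
    def __init__(self, values, schema):
        self.values = tuple(values)
        self.schema = schema

def serialize_rows(rows):
    # Serialize the shared schema once, then only the per-row values,
    # instead of embedding the schema object inside every row.
    schema = rows[0].schema
    return pickle.dumps((schema.fields, [r.values for r in rows]))

def deserialize_rows(blob):
    fields, value_lists = pickle.loads(blob)
    schema = Schema(fields)  # rebuild one schema and share it across rows
    return [RowWithSchema(v, schema) for v in value_lists]
```

The design point is the same one a custom Kryo serializer exploits: the schema is identical for every row in a batch, so it should be written once, not per row.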

spark git commit: [MLlib]remove unused import

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/master 1c05027a1 -> 3ddb975fa [MLlib]remove unused import minor thing. Let me know if jira is required. Author: Yuhao Yang hhb...@gmail.com Closes #5207 from hhbyyh/adjustImport and squashes the following commits: 2240121 [Yuhao Yang] remove

spark git commit: [SPARK-6491] Spark will put the current working dir to the CLASSPATH

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 836c92165 -> 5b5f0e2b0 [SPARK-6491] Spark will put the current working dir to the CLASSPATH When running bin/compute-classpath.sh, the output will be:

spark git commit: [SPARK-6468][Block Manager] Fix the race condition of subDirs in DiskBlockManager

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/master f88f51bbd -> 0c88ce541 [SPARK-6468][Block Manager] Fix the race condition of subDirs in DiskBlockManager There are two race conditions of `subDirs` in `DiskBlockManager`: 1. `getAllFiles` does not use correct locks to read the contents in
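The general shape of this kind of fix is per-slot locking around lazily created sub-directories, so a reader never observes a half-initialized entry. A minimal sketch in plain Python (the class and method names are hypothetical, not Spark's actual code):

```python
import threading

class DiskBlockManagerSketch:
    """Sketch of lock-protected lazy sub-directory slots."""
    def __init__(self, num_dirs):
        self.sub_dirs = [None] * num_dirs
        self.locks = [threading.Lock() for _ in range(num_dirs)]

    def get_dir(self, i):
        # Double-checked locking: cheap unlocked read first, then lock and
        # re-check before creating, so two threads never create slot i twice.
        d = self.sub_dirs[i]
        if d is not None:
            return d
        with self.locks[i]:
            if self.sub_dirs[i] is None:
                self.sub_dirs[i] = f"subdir-{i}"  # stand-in for mkdir
            return self.sub_dirs[i]

    def get_all(self):
        # Read every slot under its lock, so the snapshot only contains
        # fully initialized entries.
        out = []
        for i, lock in enumerate(self.locks):
            with lock:
                if self.sub_dirs[i] is not None:
                    out.append(self.sub_dirs[i])
        return out
```

A lock per slot (rather than one global lock) keeps unrelated directory creations from serializing against each other.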

spark git commit: [SQL][SPARK-6471]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns

2015-03-26 Thread lian
Repository: spark Updated Branches: refs/heads/master 0c88ce541 -> 1c05027a1 [SQL][SPARK-6471]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns Currently in the parquet relation 2 implementation, error is thrown in case merged
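The "subset" rule can be illustrated with a small schema-reconciliation sketch (plain Python; `reconcile_schemas` is a hypothetical helper, not Spark's actual reconciliation code): every metastore column must exist in the Parquet file schema, but the Parquet schema may contain extra columns that a `REPLACE COLUMNS` has since dropped from the metastore.

```python
def reconcile_schemas(metastore_fields, parquet_fields):
    """Accept the metastore schema as long as it is a subset of the
    Parquet schema; extra Parquet columns (dropped via REPLACE COLUMNS)
    are simply ignored. Fields are (name, type) pairs."""
    parquet = dict(parquet_fields)
    missing = [name for name, _ in metastore_fields if name not in parquet]
    if missing:
        raise ValueError(f"metastore columns missing from parquet schema: {missing}")
    # Keep the metastore's column names and order, taking the type
    # recorded in the Parquet files for each surviving column.
    return [(name, parquet[name]) for name, _ in metastore_fields]
```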

spark git commit: [SQL][SPARK-6471]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns

2015-03-26 Thread lian
Repository: spark Updated Branches: refs/heads/branch-1.3 825499655 -> 836c92165 [SQL][SPARK-6471]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns Currently in the parquet relation 2 implementation, error is thrown in case

spark git commit: [SPARK-6405] Limiting the maximum Kryo buffer size to be 2GB.

2015-03-26 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 39fb57968 -> 49d2ec63e [SPARK-6405] Limiting the maximum Kryo buffer size to be 2GB. Kryo buffers are backed by byte arrays, but primitive arrays can only be up to 2GB in size. It is misleading to allow users to set buffers past this size.
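The constraint follows from the JVM: a `byte[]` is indexed by a signed 32-bit int, so no Kryo buffer can exceed roughly 2 GiB regardless of configuration. A sketch of the kind of validation this implies (plain Python; the function name and exact message are illustrative, not Spark's actual check):

```python
MAX_BUFFER_MB = 2048  # a JVM byte[] is indexed by a signed 32-bit int, ~2 GiB max

def check_kryo_buffer_max(mb):
    """Reject buffer sizes that a Java byte array could never hold,
    instead of silently accepting a setting that must fail at runtime."""
    if mb >= MAX_BUFFER_MB:
        raise ValueError(
            f"spark.kryoserializer.buffer.max must be less than {MAX_BUFFER_MB} MiB, got {mb}")
    return mb * 1024 * 1024  # bytes actually allocatable
```

Failing fast at configuration time is kinder than letting the user discover the limit via an array-allocation error mid-job.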

spark git commit: SPARK-6532 [BUILD] LDAModel.scala fails scalastyle on Windows

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master fe15ea976 -> c3a52a082 SPARK-6532 [BUILD] LDAModel.scala fails scalastyle on Windows Use standard UTF-8 source / report encoding for scalastyle Author: Sean Owen so...@cloudera.com Closes #5211 from srowen/SPARK-6532 and squashes the

spark git commit: [SPARK-6554] [SQL] Don't push down predicates which reference partition column(s)

2015-03-26 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 784fcd532 -> 71a0d40eb [SPARK-6554] [SQL] Don't push down predicates which reference partition column(s) There are two cases for the new Parquet data source: 1. Partition columns exist in the Parquet data files We don't need to
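The underlying routing decision is: predicates that reference only partition columns should drive partition pruning, not be handed to the Parquet reader (which may not have those columns in the files at all). A simplified sketch of that split (plain Python; `split_predicates` and its input shape are hypothetical, not Spark's planner code):

```python
def split_predicates(predicates, partition_cols):
    """Route predicates that mention only partition columns to partition
    pruning; everything else is a candidate for Parquet pushdown.
    `predicates` is a list of (referenced_columns, predicate) pairs."""
    partition_set = set(partition_cols)
    pruning, pushdown = [], []
    for cols, pred in predicates:
        if set(cols) <= partition_set:
            pruning.append(pred)    # evaluated against directory values
        else:
            pushdown.append(pred)   # evaluated by the file reader
    return pruning, pushdown
```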

spark git commit: [SPARK-6117] [SQL] Improvements to DataFrame.describe()

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 84735c363 -> 28e3a1e34 [SPARK-6117] [SQL] Improvements to DataFrame.describe() 1. Slight modifications to the code to make it more readable. 2. Added Python implementation. 3. Updated the documentation to state that we don't guarantee

spark git commit: [SPARK-6117] [SQL] Improvements to DataFrame.describe()

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master c3a52a082 -> 784fcd532 [SPARK-6117] [SQL] Improvements to DataFrame.describe() 1. Slight modifications to the code to make it more readable. 2. Added Python implementation. 3. Updated the documentation to state that we don't guarantee the
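`describe()` computes a small fixed set of summary statistics per numeric column. A self-contained sketch of those statistics for a single column (plain Python; the sample standard deviation here is an assumption for illustration, not necessarily the exact definition Spark uses):

```python
import math

def describe(values):
    """Sketch of describe() for one numeric column:
    count, mean, stddev, min, max."""
    n = len(values)
    mean = sum(values) / n
    # sample standard deviation (n - 1 denominator), as an assumption
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return {"count": n, "mean": mean, "stddev": math.sqrt(var),
            "min": min(values), "max": max(values)}
```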

spark git commit: [SPARK-6117] [SQL] add describe function to DataFrame for summary statis...

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 aa2d157c6 -> 84735c363 [SPARK-6117] [SQL] add describe function to DataFrame for summary statis... Please review my solution for SPARK-6117 Author: azagrebin azagre...@gmail.com Closes #5073 from azagrebin/SPARK-6117 and squashes the

spark git commit: [DOCS][SQL] Fix JDBC example

2015-03-26 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.3 3d545782e -> 54d92b542 [DOCS][SQL] Fix JDBC example Author: Michael Armbrust mich...@databricks.com Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits: b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example

spark git commit: [DOCS][SQL] Fix JDBC example

2015-03-26 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 71a0d40eb -> aad003227 [DOCS][SQL] Fix JDBC example Author: Michael Armbrust mich...@databricks.com Closes #5192 from marmbrus/fixJDBCDocs and squashes the following commits: b48a33d [Michael Armbrust] [DOCS][SQL] Fix JDBC example

spark git commit: [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference

2015-03-26 Thread ankurdave
Repository: spark Updated Branches: refs/heads/master aad003227 -> 39fb57968 [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#difference Adds a `Graph#minus` method which will return only unique `VertexId`'s from the calling `VertexRDD`. To demonstrate a basic example with
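The set-difference semantics on vertex ids can be shown with a tiny sketch (plain Python over (id, attribute) pairs; `vertex_minus` is an illustrative name, not GraphX's API):

```python
def vertex_minus(this_vertices, other_vertices):
    """Sketch of Graph#minus semantics: keep only vertices whose ids do
    NOT appear in the other graph, i.e. a set difference on vertex ids.
    Attributes on the other side are irrelevant; only ids are compared."""
    other_ids = {vid for vid, _ in other_vertices}
    return [(vid, attr) for vid, attr in this_vertices if vid not in other_ids]
```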

spark git commit: [SPARK-6536] [PySpark] Column.inSet() in Python

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.3 9edb34fc3 -> 0ba759985 [SPARK-6536] [PySpark] Column.inSet() in Python ``` df[df.name.inSet("Bob", "Mike")].collect() [Row(age=5, name=u'Bob')] df[df.age.inSet([1, 2, 3])].collect() [Row(age=2, name=u'Alice')] ``` Author: Davies Liu

spark git commit: [SPARK-6536] [PySpark] Column.inSet() in Python

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master 276ef1c3c -> f53580297 [SPARK-6536] [PySpark] Column.inSet() in Python ``` df[df.name.inSet("Bob", "Mike")].collect() [Row(age=5, name=u'Bob')] df[df.age.inSet([1, 2, 3])].collect() [Row(age=2, name=u'Alice')] ``` Author: Davies Liu
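For readers without a Spark session handy, the semantics of `inSet` reduce to simple membership filtering, sketched here in plain Python over dict rows (`in_set` is an illustrative helper, not PySpark's API):

```python
def in_set(rows, column, values):
    """Sketch of Column.inSet semantics: keep rows whose value in
    `column` is a member of `values`."""
    allowed = set(values)
    return [row for row in rows if row[column] in allowed]
```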

spark git commit: [SPARK-6117] [SQL] add describe function to DataFrame for summary statis...

2015-03-26 Thread rxin
Repository: spark Updated Branches: refs/heads/master f53580297 -> 5bbcd1304 [SPARK-6117] [SQL] add describe function to DataFrame for summary statis... Please review my solution for SPARK-6117 Author: azagrebin azagre...@gmail.com Closes #5073 from azagrebin/SPARK-6117 and squashes the

spark git commit: SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 5b5f0e2b0 -> aa2d157c6 SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly

spark git commit: SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/master 3ddb975fa -> fe15ea976 SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly Author:
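The classic edge cases for an evenly-spaced bucket function are values exactly on the upper boundary (which must land in the last bucket, not one past it) and values outside the range. A sketch of a correct bucket function (plain Python; this is an illustration of the edge-condition handling, not Spark's actual `fastBucketFunction`):

```python
def bucket_index(value, min_val, max_val, num_buckets):
    """Evenly-spaced bucket assignment with the edge cases handled:
    - out-of-range values return None instead of a bogus index
    - a value exactly equal to max_val belongs to the last bucket"""
    if value < min_val or value > max_val:
        return None
    if value == max_val:
        return num_buckets - 1
    width = (max_val - min_val) / num_buckets
    return int((value - min_val) / width)
```

Without the `value == max_val` branch, the division yields `num_buckets`, an index one past the end of the counts array.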

spark git commit: SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases

2015-03-26 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.2 61c059a4a -> 758ebf77d SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly