(spark) branch master updated (eb8e99721714 -> f0d8f82f8c45)

2024-04-07 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from eb8e99721714 [SPARK-47657][SQL] Implement collation filter push down 
support per file source
 add f0d8f82f8c45 [SPARK-47750][DOCS][SQL] Postgres: Document Mapping Spark 
SQL Data Types to PostgreSQL

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 182 ++
 1 file changed, 182 insertions(+)





(spark) branch master updated: [SPARK-47657][SQL] Implement collation filter push down support per file source

2024-04-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new eb8e99721714 [SPARK-47657][SQL] Implement collation filter push down 
support per file source
eb8e99721714 is described below

commit eb8e99721714eeac14978f0cb6a2dc35251a5d23
Author: Stefan Kandic 
AuthorDate: Mon Apr 8 12:17:38 2024 +0800

[SPARK-47657][SQL] Implement collation filter push down support per file 
source

### What changes were proposed in this pull request?

Previously, in #45262, we completely disabled filter pushdown for any expression referencing non-UTF8-binary collated columns. However, this should be more fine-grained, so that individual data sources can decide to support pushing down these filters if they are able to.

### Why are the changes needed?

To enable collation filter push down for an individual data source.
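
For illustration only, a minimal sketch (not part of this commit) of how a file source could opt in via the new `supportsCollationPushDown` hook added to `FileFormat` in the diff below; the class name here is hypothetical:

```
import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat

// Hypothetical file format that opts in to collation filter push down by
// overriding the new FileFormat hook. The trait default added in this commit
// is `false`, so formats that do not override it keep the previous behavior.
class CollationAwareParquetFormat extends ParquetFileFormat {
  override def supportsCollationPushDown: Boolean = true
}
```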

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

With previously added unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45782 from stefankandic/newPushdownLogic.

Authored-by: Stefan Kandic 
Signed-off-by: Wenchen Fan 
---
 .../execution/datasources/DataSourceUtils.scala|  9 ++-
 .../sql/execution/datasources/FileFormat.scala |  6 ++
 .../execution/datasources/FileSourceStrategy.scala |  3 +-
 .../datasources/PruneFileSourcePartitions.scala|  4 +-
 .../execution/datasources/v2/FileScanBuilder.scala |  9 ++-
 .../spark/sql/FileBasedDataSourceSuite.scala   | 85 --
 6 files changed, 70 insertions(+), 46 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
index 38567c16fd1f..0db5de724340 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
@@ -284,12 +284,15 @@ object DataSourceUtils extends PredicateHelper {
* Determines whether a filter should be pushed down to the data source or 
not.
*
* @param expression The filter expression to be evaluated.
+   * @param isCollationPushDownSupported Whether the data source supports 
collation push down.
* @return A boolean indicating whether the filter should be pushed down or 
not.
*/
-  def shouldPushFilter(expression: Expression): Boolean = {
-expression.deterministic && !expression.exists {
+  def shouldPushFilter(expression: Expression, isCollationPushDownSupported: 
Boolean): Boolean = {
+if (!expression.deterministic) return false
+
+isCollationPushDownSupported || !expression.exists {
   case childExpression @ (_: Attribute | _: GetStructField) =>
-// don't push down filters for types with non-default collation
+// don't push down filters for types with non-binary sortable collation
 // as it could lead to incorrect results
 
SchemaUtils.hasNonBinarySortableCollatedString(childExpression.dataType)
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala
index 36c59950fe20..0785b0cbe9e2 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala
@@ -223,6 +223,12 @@ trait FileFormat {
*/
   def fileConstantMetadataExtractors: Map[String, PartitionedFile => Any] =
 FileFormat.BASE_METADATA_EXTRACTORS
+
+  /**
+   * Returns whether the file format supports filter push down
+   * for non utf8 binary collated columns.
+   */
+  def supportsCollationPushDown: Boolean = false
 }
 
 object FileFormat {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
index e4b66d72eaf8..f2dcbe26104f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
@@ -160,7 +160,8 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
   //  - filters that need to be evaluated again after the scan
   val filterSet = ExpressionSet(filters)
 
-  val filtersToPush = filters.filter(f => 
DataSourceUtils.shouldPushFilter(f))
+  val filtersToPush = filters.filter(f =>
+  DataSourceUtils.shouldPushFilter(f, 

(spark) branch master updated: [SPARK-47713][SQL][CONNECT] Fix a self-join failure

2024-04-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3a39ac231fe4 [SPARK-47713][SQL][CONNECT] Fix a self-join failure
3a39ac231fe4 is described below

commit 3a39ac231fe43332bc242ac582f30bb57c739927
Author: Ruifeng Zheng 
AuthorDate: Mon Apr 8 12:06:03 2024 +0800

[SPARK-47713][SQL][CONNECT] Fix a self-join failure

### What changes were proposed in this pull request?
Update the column-resolution logic in Spark Connect.

### Why are the changes needed?
```
df = spark.createDataFrame([(1, 2), (3, 4)], schema=["a", "b"])
df2 = df.select(df.a.alias("aa"), df.b)
df3 = df2.join(df, df2.b == df.b)

AnalysisException: [AMBIGUOUS_COLUMN_REFERENCE] Column "b" is ambiguous. 
It's because you joined several DataFrame together, and some of these 
DataFrames are the same.
This column points to one of the DataFrames but Spark is unable to figure 
out which one.
Please alias the DataFrames with different names via `DataFrame.alias` 
before joining them,
and specify the column using qualified name, e.g. 
`df.alias("a").join(df.alias("b"), col("a.id") > col("b.id"))`. SQLSTATE: 42702

```

### Does this PR introduce _any_ user-facing change?
Yes, the above query runs successfully after this PR.

This PR only affects Spark Connect, won't affect Classic Spark.
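
For reference, a rough equivalent with the Spark Connect Scala client (mirroring the updated `ClientE2ETestSuite` assertion in the diff below; `spark` is assumed to be a Connect session):

```
// Joining a DataFrame with a DataFrame derived from it previously raised
// AMBIGUOUS_COLUMN_REFERENCE under Spark Connect; it now succeeds and keeps
// the duplicated column names from both sides.
val df1 = spark.createDataFrame(Seq((1, "a"))).toDF("i", "j")
val df1Filter = df1.filter(df1("i") > 0)
val joined = df1.join(df1Filter, df1("i") === 1)
assert(joined.columns.toSeq == Seq("i", "j", "i", "j"))
```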

### How was this patch tested?
added tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45846 from zhengruifeng/fix_connect_self_join_depth.

Authored-by: Ruifeng Zheng 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  | 11 ---
 .../sql/tests/connect/test_connect_basic.py|  9 +
 python/pyspark/sql/tests/test_dataframe.py |  7 
 .../catalyst/analysis/ColumnResolutionHelper.scala | 38 +-
 4 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
index 95ee69d2a47d..a0729adb8960 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala
@@ -940,11 +940,12 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM
 }
 assert(e3.getMessage.contains("AMBIGUOUS_COLUMN_REFERENCE"))
 
-val e4 = intercept[AnalysisException] {
-  // df1("i") is ambiguous as df1 appears in both join sides (df1_filter 
contains df1).
-  df1.join(df1_filter, df1("i") === 1).collect()
-}
-assert(e4.getMessage.contains("AMBIGUOUS_COLUMN_REFERENCE"))
+// TODO(SPARK-47749): Dataframe.collect should accept duplicated column 
names
+assert(
+  // df1.join(df1_filter, df1("i") === 1) fails in classic spark due to:
+  // org.apache.spark.sql.AnalysisException: Column i#24 are ambiguous
+  df1.join(df1_filter, df1("i") === 1).columns ===
+Array("i", "j", "i", "j"))
 
 checkSameResult(
   Seq(Row("a")),
diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py 
b/python/pyspark/sql/tests/connect/test_connect_basic.py
index 3b8e8165b4bf..16e9a577451f 100755
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -1155,6 +1155,15 @@ class SparkConnectBasicTests(SparkConnectSQLTestCase):
 
set(spark_df.select("id").crossJoin(other=spark_df.select("name")).toPandas()),
 )
 
+def test_self_join(self):
+# SPARK-47713: this query fails in classic spark
+df1 = self.connect.createDataFrame([(1, "a")], schema=["i", "j"])
+df1_filter = df1.filter(df1.i > 0)
+df2 = df1.join(df1_filter, df1.i == 1)
+self.assertEqual(df2.count(), 1)
+self.assertEqual(df2.columns, ["i", "j", "i", "j"])
+self.assertEqual(list(df2.first()), [1, "a", 1, "a"])
+
 def test_with_metadata(self):
 cdf = self.connect.createDataFrame(data=[(2, "Alice"), (5, "Bob")], 
schema=["age", "name"])
 self.assertEqual(cdf.schema["age"].metadata, {})
diff --git a/python/pyspark/sql/tests/test_dataframe.py 
b/python/pyspark/sql/tests/test_dataframe.py
index dc7d39155345..1eccb40e709c 100644
--- a/python/pyspark/sql/tests/test_dataframe.py
+++ b/python/pyspark/sql/tests/test_dataframe.py
@@ -123,6 +123,13 @@ class DataFrameTestsMixin:
 df = df2.join(df1, df2["b"] == df1["a"])
 self.assertTrue(df.count() == 100)
 
+def test_self_join_II(self):
+df = 

(spark) branch master updated: [SPARK-47558][SS] State TTL support for ValueState

2024-04-07 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d55bb617a135 [SPARK-47558][SS] State TTL support for ValueState
d55bb617a135 is described below

commit d55bb617a13561f0eb9f301089a4e4fb06e06228
Author: Bhuwan Sahni 
AuthorDate: Mon Apr 8 12:22:04 2024 +0900

[SPARK-47558][SS] State TTL support for ValueState

**Note**: This change has been co-authored by ericm-db  and sahnib

**Authors: ericm-db sahnib**

### What changes were proposed in this pull request?

This PR adds support for expiring state based on TTL for ValueState. Using this functionality, Spark users can specify a TTL mode for the transformWithState operator, and provide a ttlDuration/expirationTimeInMs for each value in ValueState. TTL support for List/Map state will be added in future PRs. Once the ttlDuration has expired, the value will not be returned as part of `get()` and will be cleaned up at the end of the micro-batch.

### Why are the changes needed?

These changes are needed to support TTL for ValueState. The PR supports specifying TTL based on processing time or event time. The processing-time TTL expiration is calculated by adding ttlDuration to `batchTimestamp`, while the event-time TTL is specified as an absolute expiration time (`expirationTimeInMs`).
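
As a rough illustration of the expiration arithmetic described above (the names below are illustrative only, not the Spark API, and the "expired at or after the computed timestamp" boundary is an assumption based on the description of `get()` above):

```
import java.time.Duration

// Processing-time TTL: expiration = batchTimestamp + ttlDuration.
def processingTimeExpirationMs(batchTimestampMs: Long, ttlDuration: Duration): Long =
  batchTimestampMs + ttlDuration.toMillis

// Event-time TTL is provided directly as an absolute expirationTimeInMs.
// A value is treated as expired (not returned by get(), cleaned up at the end
// of the micro-batch) once the expiration timestamp is reached.
def isExpired(expirationTimeMs: Long, currentTimestampMs: Long): Boolean =
  expirationTimeMs <= currentTimestampMs

// A value written at batch timestamp 10000 ms with a 5 s TTL is still visible
// at 14999 ms and evicted from 15000 ms onwards.
assert(!isExpired(processingTimeExpirationMs(10000L, Duration.ofSeconds(5)), 14999L))
assert(isExpired(processingTimeExpirationMs(10000L, Duration.ofSeconds(5)), 15000L))
```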

### Does this PR introduce _any_ user-facing change?

Yes. It modifies the ValueState interface to accept a `ttlDuration`, and adds `ttlMode` to the `transformWithState` API.

### How was this patch tested?

Added unit test cases for both event time and processing time in 
`ValueStateWithTTLSuite`.

```
WARNING: Using incubator modules: jdk.incubator.foreign, 
jdk.incubator.vector
[info] TransformWithStateTTLSuite:
11:56:54.590 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
11:56:56.054 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate state is evicted at ttl expiry - processing time ttl (6 
seconds, 244 milliseconds)
11:57:01.188 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate ttl update updates the expiration timestamp - processing 
time ttl (4 seconds, 465 milliseconds)
11:57:05.641 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate ttl removal keeps value in state - processing time ttl (4 
seconds, 407 milliseconds)
11:57:10.041 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate multiple value states - with and without ttl - processing 
time ttl (3 seconds, 131 milliseconds)
11:57:13.175 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate state is evicted at ttl expiry - event time ttl (4 
seconds, 186 milliseconds)
11:57:17.355 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate ttl update updates the expiration timestamp - event time 
ttl (4 seconds, 28 milliseconds)
11:57:21.391 WARN 
org.apache.spark.sql.execution.streaming.ResolveWriteToStream: 
spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets 
and will be disabled.
[info] - validate ttl removal keeps value in state - event time ttl (4 
seconds, 428 milliseconds)
11:57:25.838 WARN org.apache.spark.sql.streaming.TransformWithStateTTLSuite:

[info] Run completed in 32 seconds, 433 milliseconds.
[info] Total number of tests run: 7
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 7, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45674 from sahnib/state-ttl.

Authored-by: Bhuwan Sahni 
Signed-off-by: Jungtaek Lim 
---
 .../src/main/resources/error/error-classes.json|  17 +
 .../apache/spark/sql/KeyValueGroupedDataset.scala  |  14 +-
 dev/checkstyle-suppressions.xml|   2 +
 ...r-conditions-unsupported-feature-error-class.md |   4 +
 

(spark) branch master updated (ad2367c55aeb -> f576b8542f2e)

2024-04-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ad2367c55aeb [MINOR][PYTHON][SS][TESTS] Drop the tables after being 
used at `test_streaming_foreach_batch`
 add f576b8542f2e [SPARK-47541][SQL] Collated strings in complex types 
supporting operations reverse, array_join, concat, map

No new revisions were added by this update.

Summary of changes:
 .../expressions/collectionOperations.scala | 11 ++---
 .../sql/catalyst/util/ArrayBasedMapBuilder.scala   | 24 ++
 .../org/apache/spark/sql/CollationSuite.scala  | 52 ++
 .../apache/spark/sql/DataFrameFunctionsSuite.scala |  8 ++--
 4 files changed, 78 insertions(+), 17 deletions(-)





(spark) branch master updated: [MINOR][PYTHON][SS][TESTS] Drop the tables after being used at `test_streaming_foreach_batch`

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ad2367c55aeb [MINOR][PYTHON][SS][TESTS] Drop the tables after being 
used at `test_streaming_foreach_batch`
ad2367c55aeb is described below

commit ad2367c55aebf417183eda13e56c55364276f145
Author: Hyukjin Kwon 
AuthorDate: Mon Apr 8 11:00:10 2024 +0900

[MINOR][PYTHON][SS][TESTS] Drop the tables after being used at 
`test_streaming_foreach_batch`

### What changes were proposed in this pull request?

This PR proposes to drop the tables after the tests finish.

### Why are the changes needed?

- To clean up resources properly.
- Leftover tables can affect other test cases when a single session is shared across tests.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Tested in https://github.com/apache/spark/pull/45870

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45920 from HyukjinKwon/minor-cleanup-table.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../streaming/test_streaming_foreach_batch.py  | 140 +++--
 1 file changed, 72 insertions(+), 68 deletions(-)

diff --git a/python/pyspark/sql/tests/streaming/test_streaming_foreach_batch.py 
b/python/pyspark/sql/tests/streaming/test_streaming_foreach_batch.py
index 5d2c1bbbf62c..ef286115a303 100644
--- a/python/pyspark/sql/tests/streaming/test_streaming_foreach_batch.py
+++ b/python/pyspark/sql/tests/streaming/test_streaming_foreach_batch.py
@@ -97,46 +97,48 @@ class StreamingTestsForeachBatchMixin:
 
 def test_streaming_foreach_batch_spark_session(self):
 table_name = "testTable_foreach_batch"
+with self.table(table_name):
 
-def func(df: DataFrame, batch_id: int):
-if batch_id > 0:  # only process once
-return
-spark = df.sparkSession
-df1 = spark.createDataFrame([("structured",), ("streaming",)])
-df1.union(df).write.mode("append").saveAsTable(table_name)
+def func(df: DataFrame, batch_id: int):
+if batch_id > 0:  # only process once
+return
+spark = df.sparkSession
+df1 = spark.createDataFrame([("structured",), ("streaming",)])
+df1.union(df).write.mode("append").saveAsTable(table_name)
 
-df = 
self.spark.readStream.format("text").load("python/test_support/sql/streaming")
-q = df.writeStream.foreachBatch(func).start()
-q.processAllAvailable()
-q.stop()
+df = 
self.spark.readStream.format("text").load("python/test_support/sql/streaming")
+q = df.writeStream.foreachBatch(func).start()
+q.processAllAvailable()
+q.stop()
 
-actual = self.spark.read.table(table_name)
-df = (
-self.spark.read.format("text")
-.load(path="python/test_support/sql/streaming/")
-.union(self.spark.createDataFrame([("structured",), 
("streaming",)]))
-)
-self.assertEqual(sorted(df.collect()), sorted(actual.collect()))
+actual = self.spark.read.table(table_name)
+df = (
+self.spark.read.format("text")
+.load(path="python/test_support/sql/streaming/")
+.union(self.spark.createDataFrame([("structured",), 
("streaming",)]))
+)
+self.assertEqual(sorted(df.collect()), sorted(actual.collect()))
 
 def test_streaming_foreach_batch_path_access(self):
 table_name = "testTable_foreach_batch_path"
+with self.table(table_name):
 
-def func(df: DataFrame, batch_id: int):
-if batch_id > 0:  # only process once
-return
-spark = df.sparkSession
-df1 = 
spark.read.format("text").load("python/test_support/sql/streaming")
-df1.union(df).write.mode("append").saveAsTable(table_name)
+def func(df: DataFrame, batch_id: int):
+if batch_id > 0:  # only process once
+return
+spark = df.sparkSession
+df1 = 
spark.read.format("text").load("python/test_support/sql/streaming")
+df1.union(df).write.mode("append").saveAsTable(table_name)
 
-df = 
self.spark.readStream.format("text").load("python/test_support/sql/streaming")
-q = df.writeStream.foreachBatch(func).start()
-q.processAllAvailable()
-q.stop()
+df = 
self.spark.readStream.format("text").load("python/test_support/sql/streaming")
+q = df.writeStream.foreachBatch(func).start()
+q.processAllAvailable()
+   

(spark) branch master updated (b299b2bc06a9 -> cc6c0eb1bee6)

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b299b2bc06a9 [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` 
in the dropdown of different versions of PySpark documents
 add cc6c0eb1bee6 [MINOR][TESTS] Deduplicate test cases 
`test_parse_datatype_string`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_parity_types.py | 4 
 1 file changed, 4 insertions(+)





Re: [PR] Fix version dropdown issue in version 3.5.1 of pyspark document [spark-website]

2024-04-07 Thread via GitHub


panbingkun commented on PR #507:
URL: https://github.com/apache/spark-website/pull/507#issuecomment-2041706595

   > Also, will the json file be updated automatically, or is the release manager expected to manually add the entry? If it's the latter, we'll need to update the release process page on the Spark website.
   
   Currently, it needs to be updated manually. Let me update the release process page on the Spark website, and then I will think of a way to automate it.





Re: [PR] Fix version dropdown issue in version 3.5.1 of pyspark document [spark-website]

2024-04-07 Thread via GitHub


panbingkun commented on PR #507:
URL: https://github.com/apache/spark-website/pull/507#issuecomment-2041704111

   > Sorry for getting to this so late. We definitely need to fix the version in 3.5.1. It'd be awesome if you could submit a PR for the fix. Thanks!
   
   Sure, let me do it.





(spark) branch branch-3.5 updated: [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents

2024-04-07 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 850ec0b4adcb [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` 
in the dropdown of different versions of PySpark documents
850ec0b4adcb is described below

commit 850ec0b4adcb219d048bed003a7cb42cfc731f33
Author: panbingkun 
AuthorDate: Mon Apr 8 10:19:26 2024 +0900

[SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of 
different versions of PySpark documents

### What changes were proposed in this pull request?
The PR aims to use the same `versions.json` in the version dropdown across different versions of the PySpark documents.

### Why are the changes needed?
As discussed on the mailing list, this approach avoids the maintenance difficulties and inconsistencies that may arise when multiple release lines are active in the future.
(screenshot: https://github.com/apache/spark/assets/15246973/8a08a4fe-e1fb-4334-a3f9-c6dffb01cbd6)

### Does this PR introduce _any_ user-facing change?
Yes, only for pyspark docs.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45400 from panbingkun/SPARK-47299.

Authored-by: panbingkun 
Signed-off-by: Jungtaek Lim 
(cherry picked from commit b299b2bc06a91db630ab39b9c35663342931bb56)
Signed-off-by: Jungtaek Lim 
---
 python/docs/source/_static/versions.json | 22 --
 python/docs/source/conf.py   |  6 +-
 2 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/python/docs/source/_static/versions.json 
b/python/docs/source/_static/versions.json
deleted file mode 100644
index 3d0bd1481806..
--- a/python/docs/source/_static/versions.json
+++ /dev/null
@@ -1,22 +0,0 @@
-[
-{
-"name": "3.4.1",
-"version": "3.4.1"
-},
-{
-"name": "3.4.0",
-"version": "3.4.0"
-},
-{
-"name": "3.3.2",
-"version": "3.3.2"
-},
-{
-"name": "3.3.1",
-"version": "3.3.1"
-},
-{
-"name": "3.3.0",
-"version": "3.3.0"
-}
-]
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 08a25c5dd071..1b5cf3474465 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -182,7 +182,11 @@ autosummary_generate = True
 html_theme = 'pydata_sphinx_theme'
 
 html_context = {
-"switcher_json_url": "_static/versions.json",
+# When releasing a new Spark version, please update the file
+# "site/static/versions.json" under the code repository "spark-website"
+# (item should be added in order), and also set the local environment
+# variable "RELEASE_VERSION".
+"switcher_json_url": "https://spark.apache.org/static/versions.json;,
 "switcher_template_url": 
"https://spark.apache.org/docs/{version}/api/python/index.html;,
 }
 





(spark) branch master updated (0c992b205946 -> b299b2bc06a9)

2024-04-07 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0c992b205946 [SPARK-47755][CONNECT] Pivot should fail when the number 
of distinct values is too large
 add b299b2bc06a9 [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` 
in the dropdown of different versions of PySpark documents

No new revisions were added by this update.

Summary of changes:
 python/docs/source/_static/versions.json | 22 --
 python/docs/source/conf.py   |  6 +-
 2 files changed, 5 insertions(+), 23 deletions(-)
 delete mode 100644 python/docs/source/_static/versions.json





Re: [PR] Fix version dropdown issue in version 3.5.1 of pyspark document [spark-website]

2024-04-07 Thread via GitHub


HeartSaVioR commented on PR #507:
URL: https://github.com/apache/spark-website/pull/507#issuecomment-2041698402

   Also, will the json file be updated automatically, or is the release manager expected to manually add the entry? If it's the latter, we'll need to update the release process page on the Spark website.





(spark) branch master updated (e92e8f5441a7 -> 0c992b205946)

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e92e8f5441a7 [SPARK-47744] Add support for negative-valued bytes in 
range encoder
 add 0c992b205946 [SPARK-47755][CONNECT] Pivot should fail when the number 
of distinct values is too large

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/planner/SparkConnectPlanner.scala  | 23 +++
 python/pyspark/sql/tests/test_group.py |  5 +++
 .../spark/sql/RelationalGroupedDataset.scala   | 47 --
 3 files changed, 36 insertions(+), 39 deletions(-)





Re: [PR] Fix version dropdown issue in version 3.5.1 of pyspark document [spark-website]

2024-04-07 Thread via GitHub


HeartSaVioR commented on PR #507:
URL: https://github.com/apache/spark-website/pull/507#issuecomment-2041626496

   @panbingkun 
   Sorry for getting to this so late. We definitely need to fix the version in 3.5.1. It'd be awesome if you could submit a PR for the fix. Thanks!





(spark) branch branch-3.4 updated: [SPARK-47734][PYTHON][TESTS][3.4] Fix flaky DataFrame.writeStream doctest by stopping streaming query

2024-04-07 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 1f66a40e3b85 [SPARK-47734][PYTHON][TESTS][3.4] Fix flaky 
DataFrame.writeStream doctest by stopping streaming query
1f66a40e3b85 is described below

commit 1f66a40e3b85ea2153c021d65be8124920091fa7
Author: Josh Rosen 
AuthorDate: Mon Apr 8 07:05:45 2024 +0900

[SPARK-47734][PYTHON][TESTS][3.4] Fix flaky DataFrame.writeStream doctest 
by stopping streaming query

### What changes were proposed in this pull request?

Backport of https://github.com/apache/spark/pull/45885.

This PR deflakes the `pyspark.sql.dataframe.DataFrame.writeStream` doctest.

PR https://github.com/apache/spark/pull/45298 aimed to fix that test but misdiagnosed the root issue. The problem is not that concurrent tests were colliding on a temporary directory. Rather, the issue is specific to the `DataFrame.writeStream` test's logic: that test starts a streaming query that writes files to the temporary directory, then exits the temp directory context manager without first stopping the streaming query. That creates a race condition where the context manager [...]

```
File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line ?, in 
pyspark.sql.dataframe.DataFrame.writeStream
Failed example:
with tempfile.TemporaryDirectory() as d:
# Create a table with Rate source.
df.writeStream.toTable(
"my_table", checkpointLocation=d)
Exception raised:
Traceback (most recent call last):
  File "/usr/lib/python3.11/doctest.py", line 1353, in __run
exec(compile(example.source, filename, "single",
  File "", line 
1, in 
with tempfile.TemporaryDirectory() as d:
  File "/usr/lib/python3.11/tempfile.py", line 1043, in __exit__
self.cleanup()
  File "/usr/lib/python3.11/tempfile.py", line 1047, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/usr/lib/python3.11/tempfile.py", line 1029, in _rmtree
_rmtree(name, onerror=onerror)
  File "/usr/lib/python3.11/shutil.py", line 738, in rmtree
onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python3.11/shutil.py", line 736, in rmtree
os.rmdir(path, dir_fd=dir_fd)
OSError: [Errno 39] Directory not empty: 
'/__w/spark/spark/python/target/4f062b09-213f-4ac2-a10a-2d704990141b/tmp29irqweq'
```

In this PR, I update the doctest to properly stop the streaming query.

### Why are the changes needed?

Fix flaky test.

### Does this PR introduce _any_ user-facing change?

No, test-only. Small user-facing doc change, but one that is consistent 
with other doctest examples.

### How was this patch tested?

Manually ran updated test.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45908 from JoshRosen/fix-flaky-writestream-doctest-3.4.

Authored-by: Josh Rosen 
Signed-off-by: Jungtaek Lim 
---
 python/pyspark/sql/dataframe.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 14426c514392..f69d74ad5002 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -527,6 +527,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
 Examples
 
+>>> import time
 >>> import tempfile
 >>> df = spark.readStream.format("rate").load()
 >>> type(df.writeStream)
@@ -534,9 +535,10 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
 >>> with tempfile.TemporaryDirectory() as d:
 ... # Create a table with Rate source.
-... df.writeStream.toTable(
-... "my_table", checkpointLocation=d) # doctest: +ELLIPSIS
-
+... query = df.writeStream.toTable(
+... "my_table", checkpointLocation=d)
+... time.sleep(3)
+... query.stop()
 """
 return DataStreamWriter(self)
 





(spark) branch master updated: [SPARK-47744] Add support for negative-valued bytes in range encoder

2024-04-07 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e92e8f5441a7 [SPARK-47744] Add support for negative-valued bytes in 
range encoder
e92e8f5441a7 is described below

commit e92e8f5441a702021e3cbcb282c172f6697f7118
Author: Neil Ramaswamy 
AuthorDate: Mon Apr 8 05:48:51 2024 +0900

[SPARK-47744] Add support for negative-valued bytes in range encoder

### What changes were proposed in this pull request?

The RocksDBStateEncoder now encodes negative-valued bytes correctly.

### Why are the changes needed?

Components that use the state encoder might want to use the full range of values of the Scala (signed) `Byte` type.
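
To see why the sign marker matters, a small standalone sketch (the marker byte values here are assumed for illustration and are not necessarily the ones used by `RocksDBStateEncoder`):

```
// A byte-wise key comparator (like RocksDB's default) compares bytes as
// unsigned values, so the raw signed byte -1 (0xFF) would sort after +1 (0x01).
// Prefixing a marker byte that is smaller for negative values restores
// numeric ordering.
def encode(b: Byte): Array[Byte] = {
  val marker = (if (b < 0) 0x00 else 0x01).toByte
  Array(marker, b)
}

// Unsigned lexicographic comparison over the encoded bytes.
def unsignedCompare(a: Array[Byte], b: Array[Byte]): Int =
  a.zip(b).collectFirst { case (x, y) if x != y => (x & 0xFF) - (y & 0xFF) }
    .getOrElse(a.length - b.length)

assert(unsignedCompare(encode(-1), encode(1)) < 0)     // -1 sorts before +1
assert(unsignedCompare(encode(-128), encode(-1)) < 0)  // order kept among negatives
assert(unsignedCompare(encode(1), encode(127)) < 0)    // and among positives
```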

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UT was modified. All existing UTs should pass.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45906 from neilramaswamy/spark-47744.

Authored-by: Neil Ramaswamy 
Signed-off-by: Jungtaek Lim 
---
 .../sql/execution/streaming/state/RocksDBStateEncoder.scala| 10 --
 .../sql/execution/streaming/state/RocksDBStateStoreSuite.scala |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala
index 06c3940af127..e9b910a76148 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala
@@ -323,8 +323,14 @@ class RangeKeyScanStateEncoder(
 field.dataType match {
   case BooleanType =>
   case ByteType =>
-bbuf.put(positiveValMarker)
-bbuf.put(value.asInstanceOf[Byte])
+val byteVal = value.asInstanceOf[Byte]
+val signCol = if (byteVal < 0) {
+  negativeValMarker
+} else {
+  positiveValMarker
+}
+bbuf.put(signCol)
+bbuf.put(byteVal)
 writer.write(idx, bbuf.array())
 
   case ShortType =>
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala
index 1e5f664c980c..16a5935e04f4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala
@@ -625,7 +625,7 @@ class RocksDBStateStoreSuite extends StateStoreSuiteBase[RocksDBStateStoreProvid
   val timerTimestamps: Seq[(Byte, Int)] = Seq((0x33, 10), (0x1A, 40), 
(0x1F, 1), (0x01, 68),
 (0x7F, 2000), (0x01, 27), (0x01, 394), (0x01, 5), (0x03, 980), (0x35, 
2112),
 (0x11, -190), (0x1A, -69), (0x01, -344245), (0x31, -901),
-(0x06, 90118), (0x09, 95118), (0x06, 87210))
+(-0x01, 90118), (-0x7F, 95118), (-0x80, 87210))
   timerTimestamps.foreach { ts =>
 // order by byte col first and then by int col
 val keyRow = schemaProj.apply(new GenericInternalRow(Array[Any](ts._1, 
ts._2,





(spark) branch master updated (d7430124191a -> f7dff4aa0c8f)

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d7430124191a [SPARK-47753][PYTHON][CONNECT][TESTS] Make 
pyspark.testing compatible with pyspark-connect
 add f7dff4aa0c8f [SPARK-47752][PS][CONNECT] Make pyspark.pandas compatible 
with pyspark-connect

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/plot/core.py   |  6 --
 python/pyspark/pandas/spark/functions.py | 21 -
 2 files changed, 24 insertions(+), 3 deletions(-)





(spark) branch master updated: [SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with pyspark-connect

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d7430124191a [SPARK-47753][PYTHON][CONNECT][TESTS] Make 
pyspark.testing compatible with pyspark-connect
d7430124191a is described below

commit d7430124191ab1f010b2ac873dbbeee5ff9caf52
Author: Hyukjin Kwon 
AuthorDate: Sun Apr 7 18:34:30 2024 +0900

[SPARK-47753][PYTHON][CONNECT][TESTS] Make pyspark.testing compatible with 
pyspark-connect

### What changes were proposed in this pull request?

This PR proposes to make `pyspark.testing` compatible with `pyspark-connect` by using the no-op context manager `contextlib.nullcontext` instead of `QuietTest`, which requires JVM access.

### Why are the changes needed?

This is needed for `pyspark-connect` to work without the classic PySpark packages and dependencies. Also, the logs are already written to a separate file and hidden, so the tests are effectively quiet without `QuietTest`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Yes, at https://github.com/apache/spark/pull/45870. Once CI is set up there, it will be tested properly.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45916 from HyukjinKwon/SPARK-47753.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/testing/connectutils.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/python/pyspark/testing/connectutils.py 
b/python/pyspark/testing/connectutils.py
index 5cb553c4949a..191505741eb4 100644
--- a/python/pyspark/testing/connectutils.py
+++ b/python/pyspark/testing/connectutils.py
@@ -21,6 +21,7 @@ import os
 import functools
 import unittest
 import uuid
+import contextlib
 
 grpc_requirement_message = None
 try:
@@ -208,3 +209,5 @@ class ReusedConnectTestCase(unittest.TestCase, 
SQLTestUtils, PySparkErrorTestUti
 
 if self._legacy_sc is not None:
 return QuietTest(self._legacy_sc)
+else:
+return contextlib.nullcontext()





(spark) branch master updated: [SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils compatible with pyspark-connect

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c11585ac296e [SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils 
compatible with pyspark-connect
c11585ac296e is described below

commit c11585ac296eb726e6356bfcc7628a2c948e1d2f
Author: Hyukjin Kwon 
AuthorDate: Sun Apr 7 18:11:12 2024 +0900

[SPARK-47751][PYTHON][CONNECT] Make pyspark.worker_utils compatible with 
pyspark-connect

### What changes were proposed in this pull request?

This PR proposes to make `pyspark.worker_utils` compatible with 
`pyspark-connect`.

### Why are the changes needed?

This is needed for `pyspark-connect` to work without the classic PySpark packages and dependencies. Spark Connect does not support `Broadcast` and `Accumulator`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Yes, at https://github.com/apache/spark/pull/45870. Once CI is set up there, it will be tested properly.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45914 from HyukjinKwon/SPARK-47751.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/worker_util.py | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/python/pyspark/worker_util.py b/python/pyspark/worker_util.py
index f3c59c91ea2c..22389decac2f 100644
--- a/python/pyspark/worker_util.py
+++ b/python/pyspark/worker_util.py
@@ -32,10 +32,8 @@ try:
 except ImportError:
 has_resource_module = False
 
-from pyspark.accumulators import _accumulatorRegistry
-from pyspark.core.broadcast import Broadcast, _broadcastRegistry
+from pyspark.util import is_remote_only
 from pyspark.errors import PySparkRuntimeError
-from pyspark.core.files import SparkFiles
 from pyspark.util import local_connect_and_auth
 from pyspark.serializers import (
 read_bool,
@@ -59,8 +57,11 @@ def add_path(path: str) -> None:
 
 
 def read_command(serializer: FramedSerializer, file: IO) -> Any:
+if not is_remote_only():
+from pyspark.core.broadcast import Broadcast
+
 command = serializer._read_with_length(file)
-if isinstance(command, Broadcast):
+if not is_remote_only() and isinstance(command, Broadcast):
 command = serializer.loads(command.value)
 return command
 
@@ -125,8 +126,12 @@ def setup_spark_files(infile: IO) -> None:
 """
 # fetch name of workdir
 spark_files_dir = utf8_deserializer.loads(infile)
-SparkFiles._root_directory = spark_files_dir
-SparkFiles._is_running_on_worker = True
+
+if not is_remote_only():
+from pyspark.core.files import SparkFiles
+
+SparkFiles._root_directory = spark_files_dir
+SparkFiles._is_running_on_worker = True
 
 # fetch names of includes (*.zip and *.egg files) and construct PYTHONPATH
 add_path(spark_files_dir)  # *.py files that were added will be copied here
@@ -142,6 +147,9 @@ def setup_broadcasts(infile: IO) -> None:
 """
 Set up broadcasted variables.
 """
+if not is_remote_only():
+from pyspark.core.broadcast import Broadcast, _broadcastRegistry
+
 # fetch names and values of broadcast variables
 needs_broadcast_decryption_server = read_bool(infile)
 num_broadcast_variables = read_int(infile)
@@ -175,6 +183,11 @@ def send_accumulator_updates(outfile: IO) -> None:
 """
 Send the accumulator updates back to JVM.
 """
-write_int(len(_accumulatorRegistry), outfile)
-for aid, accum in _accumulatorRegistry.items():
-pickleSer._write_with_length((aid, accum._value), outfile)
+if not is_remote_only():
+from pyspark.accumulators import _accumulatorRegistry
+
+write_int(len(_accumulatorRegistry), outfile)
+for aid, accum in _accumulatorRegistry.items():
+pickleSer._write_with_length((aid, accum._value), outfile)
+else:
+write_int(0, outfile)





(spark) branch master updated (644687b66e1a -> 4d9dbb35aacb)

2024-04-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 644687b66e1a [SPARK-47709][BUILD] Upgrade tink to 1.13.0
 add 4d9dbb35aacb [SPARK-46722][CONNECT][SS][TESTS][FOLLOW-UP] Drop the 
tables after tests finished

No new revisions were added by this update.

Summary of changes:
 .../sql/tests/connect/streaming/test_parity_listener.py  | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

