(spark) branch branch-3.5 updated: [SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

2024-03-18 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 3857c16be81f [SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
3857c16be81f is described below

commit 3857c16be81f568cc5caf81f65941109bf5f2939
Author: Kent Yao 
AuthorDate: Tue Mar 19 13:37:28 2024 +0800

[SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

### What changes were proposed in this pull request?

SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in the MySQL dialect, which caused unsigned TINYINT values to overflow: java.sql.Types.TINYINT is reported for MySQL TINYINT regardless of whether the column is signed or unsigned.
In this PR, we put the signedness information into the column metadata so that TINYINT can be mapped to short or byte accordingly.
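
To illustrate the idea (a minimal sketch, not the exact patch; the `isSigned` metadata key and the helper name are assumptions), a dialect-level mapping could consult a signedness flag stored in the column metadata:

```scala
import java.sql.Types
import org.apache.spark.sql.types.{ByteType, DataType, MetadataBuilder, ShortType}

// Sketch only: the JDBC layer records whether the column is signed in the
// column metadata, and the MySQL dialect consults that flag when it sees
// java.sql.Types.TINYINT. The "isSigned" key is an illustrative name.
object TinyIntMappingSketch {
  def catalystTypeFor(sqlType: Int, md: MetadataBuilder): Option[DataType] = sqlType match {
    case Types.TINYINT =>
      // UNSIGNED TINYINT ranges over 0..255, which does not fit in a signed
      // Byte, so only the signed variant can safely map to ByteType.
      if (md.build().getBoolean("isSigned")) Some(ByteType) else Some(ShortType)
    case _ => None
  }
}
```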

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

Users can read MySQL UNSIGNED TINYINT values again after this PR, as in versions up to 3.5.0; this has been broken since 3.5.1.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45579 from yaooqinn/SPARK-47435-B.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  9 ++--
 .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala|  9 ++--
 .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala  |  6 ++-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 14 --
 .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala |  9 ++--
 .../sql/jdbc/v2/PostgresIntegrationSuite.scala |  9 ++--
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 28 +++
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  6 +--
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 10 ++--
 .../v2/jdbc/JDBCTableCatalogSuite.scala| 54 +-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 26 +++
 11 files changed, 115 insertions(+), 65 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 20fdc965874f..dcf4225d522d 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -56,10 +56,11 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits 
BIT(10), "
   + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci 
DECIMAL(40,20), flt FLOAT, "
-  + "dbl DOUBLE, tiny TINYINT)").executeUpdate()
+  + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate()
+
 conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', "
   + "17, 7, 123456789, 123456789012345, 
123456789012345.123456789012345, "
-  + "42.75, 1.0002, -128)").executeUpdate()
+  + "42.75, 1.0002, -128, 255)").executeUpdate()
 
 conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts 
TIMESTAMP, "
   + "yr YEAR)").executeUpdate()
@@ -89,7 +90,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val rows = df.collect()
 assert(rows.length == 1)
 val types = rows(0).toSeq.map(x => x.getClass.toString)
-assert(types.length == 10)
+assert(types.length == 11)
 assert(types(0).equals("class java.lang.Boolean"))
 assert(types(1).equals("class java.lang.Long"))
 assert(types(2).equals("class java.lang.Integer"))
@@ -100,6 +101,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(types(7).equals("class java.lang.Double"))
 assert(types(8).equals("class java.lang.Double"))
 assert(types(9).equals("class java.lang.Byte"))
+assert(types(10).equals("class java.lang.Short"))
 assert(rows(0).getBoolean(0) == false)
 assert(rows(0).getLong(1) == 0x225)
 assert(rows(0).getInt(2) == 17)
@@ -111,6 +113,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(rows(0).getDouble(7) == 42.75)
 assert(rows(0).getDouble(8) == 1.0002)
 assert(rows(0).getByte(9) == 0x80.toByte)
+assert(rows(0).getShort(10) == 0xff.toShort)
   }
 
   test("Date types") {
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala
index 661b1277e9f0..9a78244f5326 

(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array 
operations
 add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL 
docker image version to 8.3.0

No new revisions were added by this update.

Summary of changes:
 ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++--
 .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 ---
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   | 19 ---
 4 files changed, 18 insertions(+), 52 deletions(-)
 copy 
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala
 => MySQLDatabaseOnDocker.scala} (66%)





(spark) branch master updated (e01ed0da22f2 -> 681b41f0808e)

2024-03-18 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML 
within XmlFunctionsSuite
 add 681b41f0808e [SPARK-47422][SQL] Support collated strings in array 
operations

No new revisions were added by this update.

Summary of changes:
 .../expressions/InterpretedUnsafeProjection.scala  |  2 +-
 .../apache/spark/sql/catalyst/util/TypeUtils.scala |  1 +
 .../expressions/CollationExpressionSuite.scala | 84 ++
 .../sql-tests/analyzer-results/collations.sql.out  | 35 +
 .../test/resources/sql-tests/inputs/collations.sql |  7 ++
 .../resources/sql-tests/results/collations.sql.out | 40 +++
 6 files changed, 168 insertions(+), 1 deletion(-)





(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe 
while using foreachbatch and stateful streaming query to prevent state from 
being re-loaded in each batch
 add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML 
within XmlFunctionsSuite

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated (acf17fd67217 -> 9f8147c2a8d2)

2024-03-18 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub 
Action job
 add 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe 
while using foreachbatch and stateful streaming query to prevent state from 
being re-loaded in each batch
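
For context, a hedged Scala sketch of the pattern the new documentation note recommends: persist the micro-batch DataFrame inside `foreachBatch` when it is consumed more than once, so a stateful query does not recompute the batch (and re-load state) for every action. The output path and sinks below are placeholders, not taken from the patch.

```scala
import org.apache.spark.sql.DataFrame

// Persist the batch once, use it several times, then release it.
def processBatch(batchDF: DataFrame, batchId: Long): Unit = {
  batchDF.persist()
  try {
    batchDF.write.mode("append").parquet(s"/tmp/out/batch=$batchId") // first use
    println(s"batch $batchId rows: ${batchDF.count()}")              // second use
  } finally {
    batchDF.unpersist()
  }
}
```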

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-programming-guide.md |  6 +-
 .../sql/execution/streaming/sources/ForeachBatchSink.scala | 10 ++
 2 files changed, 15 insertions(+), 1 deletion(-)





(spark) branch master updated (cb20fcae951d -> acf17fd67217)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cb20fcae951d [SPARK-47448][CORE] Enable 
`spark.shuffle.service.removeShuffle` by default
 add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub 
Action job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_sparkr_window.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)





(spark) branch master updated (51e8634a5883 -> cb20fcae951d)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the 
SparkSession is the same
 add cb20fcae951d [SPARK-47448][CORE] Enable 
`spark.shuffle.service.removeShuffle` by default

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala   | 1 +
 docs/configuration.md  | 2 +-
 docs/core-migration-guide.md   | 2 ++
 4 files changed, 5 insertions(+), 2 deletions(-)





(spark) branch master updated: [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same

2024-03-18 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the 
SparkSession is the same
51e8634a5883 is described below

commit 51e8634a5883f1816bb82c19b6e91c3516eee6c4
Author: Nemanja Boric 
AuthorDate: Mon Mar 18 15:44:29 2024 -0400

[SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is 
the same

### What changes were proposed in this pull request?

In this PR we change the client behaviour to send the previously observed server-side session id, so that the server can validate that the client has been talking to this specific session. Previously this was only validated on the client side, which meant the server actually executed the request against the wrong session before the error was thrown on the client side (once the response from the server was obtained).
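
A minimal, illustrative Scala sketch of the server-side check described above (the names and exception type are placeholders, not the actual Spark Connect code; it mirrors the new SESSION_CHANGED error class added in this change):

```scala
// The client echoes the server-side session id it last observed; the server
// rejects the request if its current session id differs, e.g. after a driver
// restart.
final case class SessionChangedException(observed: String, actual: String)
  extends RuntimeException(
    s"Client observed server session $observed, but the server session is now $actual")

def validateClientObservedSessionId(
    clientObserved: Option[String],
    serverSessionId: String): Unit = {
  clientObserved.foreach { observed =>
    if (observed != serverSessionId) {
      throw SessionChangedException(observed, serverSessionId)
    }
  }
}
```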

### Why are the changes needed?
The server could execute the client command on the wrong Spark session before the client figured out that it was a different session.

### Does this PR introduce _any_ user-facing change?
The error message now pops up differently (it used to be a slightly 
different message when validated on the client).

### How was this patch tested?
Existing unit tests, add new unit test, e2e test added, manual testing

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45499 from nemanja-boric-databricks/workspace.

Authored-by: Nemanja Boric 
Signed-off-by: Herman van Hovell 
---
 .../src/main/resources/error/error-classes.json|   5 +
 .../src/main/protobuf/spark/connect/base.proto |  54 
 .../sql/connect/client/ResponseValidator.scala |  16 +-
 .../sql/connect/client/SparkConnectClient.scala|  45 ++-
 .../service/SparkConnectAddArtifactsHandler.scala  |   7 +-
 .../service/SparkConnectAnalyzeHandler.scala   |   7 +-
 .../SparkConnectArtifactStatusesHandler.scala  |  18 +-
 .../service/SparkConnectConfigHandler.scala|   9 +-
 .../service/SparkConnectExecutionManager.scala |   9 +-
 .../SparkConnectFetchErrorDetailsHandler.scala |   6 +-
 .../service/SparkConnectInterruptHandler.scala |   6 +-
 .../SparkConnectReattachExecuteHandler.scala   |   8 +-
 .../SparkConnectReleaseExecuteHandler.scala|   4 +-
 .../SparkConnectReleaseSessionHandler.scala|   2 +
 .../sql/connect/service/SparkConnectService.scala  |   9 +-
 .../service/SparkConnectSessionManager.scala   |  29 +-
 .../execution/ReattachableExecuteSuite.scala   |   8 +-
 .../connect/planner/SparkConnectServiceSuite.scala |   8 +-
 .../service/ArtifactStatusesHandlerSuite.scala |   1 +
 .../service/FetchErrorDetailsHandlerSuite.scala|  12 +-
 .../service/SparkConnectServiceE2ESuite.scala  |  20 ++
 .../service/SparkConnectSessionManagerSuite.scala  |  38 ++-
 ...-error-conditions-invalid-handle-error-class.md |   4 +
 python/pyspark/sql/connect/client/core.py  |  12 +
 python/pyspark/sql/connect/proto/base_pb2.py   | 328 ++---
 python/pyspark/sql/connect/proto/base_pb2.pyi  | 210 +
 26 files changed, 655 insertions(+), 220 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 9362b8342abf..b5a0089fb2c8 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2083,6 +2083,11 @@
   "Operation not found."
 ]
   },
+  "SESSION_CHANGED" : {
+"message" : [
+  "The existing Spark server driver instance has restarted. Please 
reconnect."
+]
+  },
   "SESSION_CLOSED" : {
 "message" : [
   "Session was closed."
diff --git 
a/connector/connect/common/src/main/protobuf/spark/connect/base.proto 
b/connector/connect/common/src/main/protobuf/spark/connect/base.proto
index cb9dbe62c193..bcc7edc0 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/base.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/base.proto
@@ -66,6 +66,12 @@ message AnalyzePlanRequest {
   // The id should be an UUID string of the format 
`00112233-4455-6677-8899-aabbccddeeff`
   string session_id = 1;
 
+  // (Optional)
+  //
+  // Server-side generated idempotency key from the previous responses (if 
any). Server
+  // can use this to validate that the server side session has not changed.
+  optional string client_observed_server_side_session_id = 17;
+
   // (Required) User context
   UserContext user_context = 2;
 
@@ -281,6 +287,12 @@ message ExecutePlanRequest {
   // The id should be an UUID string of the format 

(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before 
`removeBlockInternal`
a40940a0bc6d is described below

commit a40940a0bc6de58b5c56b8ad918f338c6e70572f
Author: Dongjoon Hyun 
AuthorDate: Mon Mar 18 12:39:44 2024 -0700

[SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

### What changes were proposed in this pull request?

This PR aims to make `BlockManager` warn before invoking 
`removeBlockInternal` by switching the log position. To be clear,
1. For the case where `removeBlockInternal` succeeds, the log messages are 
identical before and after this PR.
2. For the case where `removeBlockInternal` fails, the user will see one additional warning message like the following, which was hidden from users before this PR.
```
logWarning(s"Putting block $blockId failed")
```

### Why are the changes needed?

When `Put` operation fails, Apache Spark currently tries 
`removeBlockInternal` first before logging.


https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567

On top of that, if `removeBlockInternal` then also fails, Spark shows a warning like the following and fails the job.
```
24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due 
to exception java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e.
24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed 
normally.
24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0
24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in 
stage 0: Stage cancelled
24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at 
SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: 
Task serialization failed: java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
```

This is misleading, although the two failures might share the same root cause. Since the `Put` operation fails before the failure above, we had better reorder the WARN message to make that clear.

### Does this PR introduce _any_ user-facing change?

No. This is a warning message change only.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45570 from dongjoon-hyun/SPARK-47446.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index 228ec5752e1b..89b3914e94af 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -1561,8 +1561,8 @@ private[spark] class BlockManager(
   blockInfoManager.unlock(blockId)
 }
   } else {
-removeBlockInternal(blockId, tellMaster = false)
 logWarning(s"Putting block $blockId failed")
+removeBlockInternal(blockId, tellMaster = false)
   }
   res
 } catch {





(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` 
config
ce93c9fd8671 is described below

commit ce93c9fd86715e2479552628398f6fc11e83b2af
Author: Rob Reeves 
AuthorDate: Mon Mar 18 10:36:38 2024 -0700

[SPARK-47383][CORE] Support `spark.shutdown.timeout` config

### What changes were proposed in this pull request?
Make the shutdown hook timeout configurable. If this is not defined, it falls back to the existing behavior, which uses a default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property.
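
As a hedged illustration of how the new config is meant to be supplied (an assumption based on the description below, not part of the patch): because the hook is registered before Spark configs are propagated to system properties, the value has to arrive as a JVM system property, e.g. via `spark.driver.extraJavaOptions`, or be set programmatically before Spark initializes.

```scala
object ShutdownTimeoutExample {
  def main(args: Array[String]): Unit = {
    // Must be a JVM system property before Spark registers its shutdown hook;
    // in practice this is usually passed via
    //   --conf spark.driver.extraJavaOptions=-Dspark.shutdown.timeout=60s
    System.setProperty("spark.shutdown.timeout", "60s")
    // ... build the SparkSession / SparkContext afterwards ...
  }
}
```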

### Why are the changes needed?
Spark sometimes times out during the shutdown process. This can result in data left in the queues being dropped, and causes metadata loss (e.g. event logs, anything written by custom listeners).

Before this change, this was not easily configurable. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in core-site.xml/core-default.xml because a new Hadoop conf object is created and there is no opportunity to modify it.

### Does this PR introduce _any_ user-facing change?
Yes, a new config `spark.shutdown.timeout` is added.

### How was this patch tested?
Manual testing in spark-shell. This behavior is not practical to write a 
unit test for.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/internal/config/package.scala| 10 ++
 .../org/apache/spark/util/ShutdownHookManager.scala   | 19 +--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index aa240b5cc5b5..e72b9cb694eb 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -2683,4 +2683,14 @@ package object config {
   .version("4.0.0")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+ConfigBuilder("spark.shutdown.timeout")
+  .internal()
+  .doc("Defines the timeout period to wait for all shutdown hooks to be 
executed. " +
+"This must be passed as a system property argument in the Java 
options, for example " +
+"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".")
+  .version("4.0.0")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createOptional
 }
diff --git 
a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala 
b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
index 4db268604a3e..c6cad9440168 100644
--- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
+++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
@@ -19,12 +19,16 @@ package org.apache.spark.util
 
 import java.io.File
 import java.util.PriorityQueue
+import java.util.concurrent.TimeUnit
 
 import scala.util.Try
 
 import org.apache.hadoop.fs.FileSystem
 
+import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS
+
 
 /**
  * Various utility methods used by Spark.
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
 val hookTask = new Runnable() {
   override def run(): Unit = runAll()
 }
-org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-  hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+// The timeout property must be passed as a Java system property because 
this
+// is initialized before Spark configurations are registered as system
+// properties later in initialization.
+val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS)
+
+timeout.fold {
+  org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+hookTask, priority)
+} { t =>
+  org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+hookTask, priority, t, TimeUnit.MILLISECONDS)
+}
   }
 
   def runAll(): Unit = {





(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED 
TINYINT  caused by SPARK-45561
8bd42cbdb6bf is described below

commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1
Author: Kent Yao 
AuthorDate: Mon Mar 18 08:56:55 2024 -0700

[SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT  caused by 
SPARK-45561

### What changes were proposed in this pull request?

SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in the MySQL dialect, which caused unsigned TINYINT values to overflow: java.sql.Types.TINYINT is reported for MySQL TINYINT regardless of whether the column is signed or unsigned.

In this PR, we put the signedness information into the column metadata so that TINYINT can be mapped to short or byte accordingly.

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

Users can read MySQL UNSIGNED TINYINT values again after this PR, as in versions up to 3.5.0; this has been broken since 3.5.1.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45556 from yaooqinn/SPARK-47435.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  9 ++--
 .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala|  9 ++--
 .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala  |  6 ++-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 15 --
 .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala |  9 ++--
 .../sql/jdbc/v2/PostgresIntegrationSuite.scala |  9 ++--
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 26 ++
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  5 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 10 ++--
 .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 --
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 24 +
 11 files changed, 114 insertions(+), 68 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index b1d239337aa0..79e88f109534 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits 
BIT(10), "
   + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci 
DECIMAL(40,20), flt FLOAT, "
-  + "dbl DOUBLE, tiny TINYINT)").executeUpdate()
+  + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate()
+
 conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', "
   + "17, 7, 123456789, 123456789012345, 
123456789012345.123456789012345, "
-  + "42.75, 1.0002, -128)").executeUpdate()
+  + "42.75, 1.0002, -128, 255)").executeUpdate()
 
 conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts 
TIMESTAMP, "
   + "yr YEAR)").executeUpdate()
@@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val rows = df.collect()
 assert(rows.length == 1)
 val types = rows(0).toSeq.map(x => x.getClass.toString)
-assert(types.length == 10)
+assert(types.length == 11)
 assert(types(0).equals("class java.lang.Boolean"))
 assert(types(1).equals("class java.lang.Long"))
 assert(types(2).equals("class java.lang.Integer"))
@@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(types(7).equals("class java.lang.Double"))
 assert(types(8).equals("class java.lang.Double"))
 assert(types(9).equals("class java.lang.Byte"))
+assert(types(10).equals("class java.lang.Short"))
 assert(rows(0).getBoolean(0) == false)
 assert(rows(0).getLong(1) == 0x225)
 assert(rows(0).getInt(2) == 17)
@@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(rows(0).getDouble(7) == 42.75)
 assert(rows(0).getDouble(8) == 1.0002)
 assert(rows(0).getByte(9) == 0x80.toByte)
+assert(rows(0).getShort(10) == 0xff.toShort)
   }
 
   test("Date types") {
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala
index c3ec7e1925fa..6c1b7fdd1be5 100644
--- 

(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
 add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker 
servers in MasterSuite

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala| 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
4dc362dbc6c0 is described below

commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c
Author: panbingkun 
AuthorDate: Mon Mar 18 08:25:16 2024 -0700

[SPARK-47438][BUILD] Upgrade jackson to 2.17.0

### What changes were proposed in this pull request?
This PR aims to upgrade Jackson from `2.16.1` to `2.17.0`.

### Why are the changes needed?
The full release notes: 
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45562 from panbingkun/SPARK-47438.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++---
 pom.xml   |  4 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d4b7d38aea22..86da61d89149 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar
 ini4j/0.5.4//ini4j-0.5.4.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.2//ivy-2.5.2.jar
-jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar
+jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.16.1//jackson-core-2.16.1.jar
-jackson-databind/2.16.1//jackson-databind-2.16.1.jar
-jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar
-jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar
-jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar
+jackson-core/2.17.0//jackson-core-2.17.0.jar
+jackson-databind/2.17.0//jackson-databind-2.17.0.jar
+jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar
+jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar
+jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar
+jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar
 jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar
 jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar
 jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar
diff --git a/pom.xml b/pom.xml
index 757d911c1229..5cc56a92999d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -184,8 +184,8 @@
 true
 true
 1.9.13
-2.16.1
-
2.16.1
+2.17.0
+
2.17.0
 2.3.1
 3.0.2
 1.1.10.5





(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` 
section of index.md
57424b92c5b5 is described below

commit 57424b92c5b5e7c3de680a7d8a6b137911f45666
Author: Matt Braymer-Hayes 
AuthorDate: Mon Mar 18 07:53:11 2024 -0700

[MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

### What changes were proposed in this pull request?

Adds the Web UI to the `Other Documents` list on the main page.

### Why are the changes needed?

I found it difficult to find the Web UI docs: it's only linked inside the 
Monitoring docs. Adding it to the main page will make it easier for people to 
find and use the docs.

### Does this PR introduce _any_ user-facing change?

Yes: adds another cross-reference on the main page.

### How was this patch tested?

Visually verified that Markdown still rendered properly.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45534 from mattayes/patch-2.

Authored-by: Matt Braymer-Hayes 
Signed-off-by: Dongjoon Hyun 
---
 docs/index.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 5f3858bec86b..12c53c40c8f7 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -138,6 +138,7 @@ options for deployment:
 
 * [Configuration](configuration.html): customize Spark via its configuration 
system
 * [Monitoring](monitoring.html): track the behavior of your applications
+* [Web UI](web-ui.html): view useful information about your applications
 * [Tuning Guide](tuning.html): best practices to optimize performance and 
memory use
 * [Job Scheduling](job-scheduling.html): scheduling resources across and 
within Spark applications
 * [Security](security.html): Spark security support
@@ -145,7 +146,7 @@ options for deployment:
 * Integration with other storage systems:
   * [Cloud Infrastructures](cloud-integration.html)
   * [OpenStack Swift](storage-openstack-swift.html)
-* [Migration Guide](migration-guide.html): Migration guides for Spark 
components
+* [Migration Guide](migration-guide.html): migration guides for Spark 
components
 * [Building Spark](building-spark.html): build Spark using the Maven system
 * [Contributing to Spark](https://spark.apache.org/contributing.html)
 * [Third Party Projects](https://spark.apache.org/third-party-projects.html): 
related third party Spark projects





(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`
7a899e219f5a is described below

commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c
Author: Huw Campbell 
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

[SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links and to include a trailing slash.

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if this is not included

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy

### How was this patch tested?

With a proxy installed, I went to the location the fixed link would generate and could reach the page, whereas the link as it currently exists redirects.

Edit: further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link:
https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

Goes correctly to:
https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

Before, it would redirect and we'd get a 404:
https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

Closes #45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(
 
   override def row(query: StructuredStreamingRow): Seq[Node] = {
 val streamingQuery = query.streamingUIData
-val statisticsLink = "%s/%s/statistics?id=%s"
+val statisticsLink = "%s/%s/statistics/?id=%s"
   .format(SparkUIUtils.prependBaseUri(request, parent.basePath), 
parent.prefix,
 streamingQuery.summary.runId)
 





(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`
bb7a6138b827 is described below

commit bb7a6138b827975fc827813ab42a2b9074bf8d5e
Author: Huw Campbell 
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

[SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links and to include a trailing slash.

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if this is not included

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy

### How was this patch tested?

With a proxy installed, I went to the location the fixed link would generate and could reach the page, whereas the link as it currently exists redirects.

Edit: further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link:
https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

Goes correctly to:
https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

Before, it would redirect and we'd get a 404:
https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

Closes #45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(
 
   override def row(query: StructuredStreamingRow): Seq[Node] = {
 val streamingQuery = query.streamingUIData
-val statisticsLink = "%s/%s/statistics?id=%s"
+val statisticsLink = "%s/%s/statistics/?id=%s"
   .format(SparkUIUtils.prependBaseUri(request, parent.basePath), 
parent.prefix,
 streamingQuery.summary.runId)
 





(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class 
for `DataFrame.sort*`
 add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated: [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`

2024-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class 
for `DataFrame.sort*`
d3f12df6e09e is described below

commit d3f12df6e09ee47dbd7c9e08c9962430a2601941
Author: Ruifeng Zheng 
AuthorDate: Mon Mar 18 20:24:43 2024 +0900

[SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`

### What changes were proposed in this pull request?
Correct the error class for `DataFrame.sort*`

### Why are the changes needed?
`DataFrame.sort*` supports negative indices, which mean sorting by the corresponding column in descending order; the index only has to be non-zero, so the `INDEX_NOT_POSITIVE` error class was misleading.

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45559 from zhengruifeng/correct_index_error.

Authored-by: Ruifeng Zheng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py  | 5 +
 python/pyspark/sql/connect/dataframe.py | 3 ++-
 python/pyspark/sql/dataframe.py | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index 1e21ad3543e9..6b7f19b44918 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -1162,6 +1162,11 @@ ERROR_CLASSES_JSON = '''
 "message": [
   "Function `` should take at least  columns."
 ]
+  },
+  "ZERO_INDEX": {
+"message": [
+  "Index must be non-zero."
+]
   }
 }
 '''
diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index bbb1be4c4724..f300774b6e57 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -701,7 +701,8 @@ class DataFrame:
 _c = self[-c - 1].desc()
 else:
 raise PySparkIndexError(
-error_class="INDEX_NOT_POSITIVE", 
message_parameters={"index": str(c)}
+error_class="ZERO_INDEX",
+message_parameters={},
 )
 else:
 _c = c  # type: ignore[assignment]
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 5c5c263b8156..d04c35dac5e9 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -3331,7 +3331,6 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 jcols = []
 for c in cols:
 if isinstance(c, int) and not isinstance(c, bool):
-# TODO: should introduce dedicated error class
 # ordinal is 1-based
 if c > 0:
 _c = self[c - 1]
@@ -3340,7 +3339,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 _c = self[-c - 1].desc()
 else:
 raise PySparkIndexError(
-error_class="INDEX_NOT_POSITIVE", 
message_parameters={"index": str(c)}
+error_class="ZERO_INDEX",
+message_parameters={},
 )
 else:
 _c = c  # type: ignore[assignment]





(spark) branch master updated (08866c280f87 -> 42c4dad62dcd)

2024-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 08866c280f87 [SPARK-47439][PYTHON] Document Python Data Source API in 
API reference page
 add 42c4dad62dcd [SPARK-43435][PYTHON][CONNECT][TESTS] Reenable doctest 
`pyspark.sql.connect.dataframe.DataFrame.writeStream`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/dataframe.py | 3 ---
 1 file changed, 3 deletions(-)





(spark) branch master updated: [SPARK-47439][PYTHON] Document Python Data Source API in API reference page

2024-03-18 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 08866c280f87 [SPARK-47439][PYTHON] Document Python Data Source API in 
API reference page
08866c280f87 is described below

commit 08866c280f877ce27d5c5305c7a09add76c86774
Author: Hyukjin Kwon 
AuthorDate: Mon Mar 18 20:08:22 2024 +0900

[SPARK-47439][PYTHON] Document Python Data Source API in API reference page

### What changes were proposed in this pull request?

This PR proposes to document Python Data Source API in Python API reference 
page.

### Why are the changes needed?

For users/developers to know how to use them.

### Does this PR introduce _any_ user-facing change?

Yes, it documents Python Data Source API.

### How was this patch tested?

Manually checked the output from Python API reference build

```bash
cd python/docs
make clean html
open build/html/index.html
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45561 from HyukjinKwon/SPARK-47439.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../source/reference/pyspark.sql/core_classes.rst  |  7 
 .../{core_classes.rst => datasource.rst}   | 44 +++---
 python/docs/source/reference/pyspark.sql/index.rst |  1 +
 .../source/reference/pyspark.sql/spark_session.rst |  1 +
 4 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/python/docs/source/reference/pyspark.sql/core_classes.rst 
b/python/docs/source/reference/pyspark.sql/core_classes.rst
index 3cf19686cdd8..65096da21de5 100644
--- a/python/docs/source/reference/pyspark.sql/core_classes.rst
+++ b/python/docs/source/reference/pyspark.sql/core_classes.rst
@@ -42,3 +42,10 @@ Core Classes
 UDTFRegistration
 udf.UserDefinedFunction
 udtf.UserDefinedTableFunction
+datasource.DataSource
+datasource.DataSourceReader
+datasource.DataSourceStreamReader
+datasource.DataSourceWriter
+datasource.DataSourceRegistration
+datasource.InputPartition
+datasource.WriterCommitMessage
diff --git a/python/docs/source/reference/pyspark.sql/core_classes.rst 
b/python/docs/source/reference/pyspark.sql/datasource.rst
similarity index 58%
copy from python/docs/source/reference/pyspark.sql/core_classes.rst
copy to python/docs/source/reference/pyspark.sql/datasource.rst
index 3cf19686cdd8..b92db7a28858 100644
--- a/python/docs/source/reference/pyspark.sql/core_classes.rst
+++ b/python/docs/source/reference/pyspark.sql/datasource.rst
@@ -16,29 +16,29 @@
 under the License.
 
 
-
-Core Classes
-
-.. currentmodule:: pyspark.sql
+==
+Python Data Source
+==
+
+.. currentmodule:: pyspark.sql.datasource
 
 .. autosummary::
 :toctree: api/
 
-SparkSession
-Catalog
-DataFrame
-Column
-Observation
-Row
-GroupedData
-PandasCogroupedOps
-DataFrameNaFunctions
-DataFrameStatFunctions
-Window
-DataFrameReader
-DataFrameWriter
-DataFrameWriterV2
-UDFRegistration
-UDTFRegistration
-udf.UserDefinedFunction
-udtf.UserDefinedTableFunction
+DataSource.name
+DataSource.reader
+DataSource.schema
+DataSource.streamReader
+DataSource.writer
+DataSourceReader.partitions
+DataSourceReader.read
+DataSourceRegistration.register
+DataSourceStreamReader.commit
+DataSourceStreamReader.initialOffset
+DataSourceStreamReader.latestOffset
+DataSourceStreamReader.partitions
+DataSourceStreamReader.read
+DataSourceStreamReader.stop
+DataSourceWriter.abort
+DataSourceWriter.commit
+DataSourceWriter.write
diff --git a/python/docs/source/reference/pyspark.sql/index.rst 
b/python/docs/source/reference/pyspark.sql/index.rst
index 233c8b238a6d..9322a91fba25 100644
--- a/python/docs/source/reference/pyspark.sql/index.rst
+++ b/python/docs/source/reference/pyspark.sql/index.rst
@@ -42,3 +42,4 @@ This page gives an overview of all public Spark SQL API.
 udf
 udtf
 protobuf
+datasource
diff --git a/python/docs/source/reference/pyspark.sql/spark_session.rst 
b/python/docs/source/reference/pyspark.sql/spark_session.rst
index 4be343c52140..ea71249e292e 100644
--- a/python/docs/source/reference/pyspark.sql/spark_session.rst
+++ b/python/docs/source/reference/pyspark.sql/spark_session.rst
@@ -47,6 +47,7 @@ See also :class:`SparkSession`.
 SparkSession.catalog
 SparkSession.conf
 SparkSession.createDataFrame
+SparkSession.dataSource
 SparkSession.getActiveSession
 SparkSession.newSession
 SparkSession.profile



(spark) branch master updated: [MINOR][TESTS] Collation - extending golden file coverage

2024-03-18 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44a88edc995e [MINOR][TESTS] Collation - extending golden file coverage
44a88edc995e is described below

commit 44a88edc995e1e09adfab80e63a409f8ced3b131
Author: Aleksandar Tomic 
AuthorDate: Mon Mar 18 13:52:48 2024 +0500

[MINOR][TESTS] Collation - extending golden file coverage

### What changes were proposed in this pull request?

This PR adds new golden file tests for the collation feature:
1) DESCRIBE
2) Basic array operations
3) Removing the struct test, since the same case is already covered by the golden files.

### Why are the changes needed?

Extending test coverage for collation feature.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45515 from dbatomic/collation_golden_files_update.

Authored-by: Aleksandar Tomic 
Signed-off-by: Max Gekk 
---
 .../sql-tests/analyzer-results/collations.sql.out  | 38 +-
 .../test/resources/sql-tests/inputs/collations.sql | 15 +++-
 .../resources/sql-tests/results/collations.sql.out | 45 +-
 .../org/apache/spark/sql/CollationSuite.scala  | 22 ---
 4 files changed, 92 insertions(+), 28 deletions(-)

diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
index 6d9bb3470be6..3a0f8eec02ba 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
@@ -37,6 +37,12 @@ InsertIntoHadoopFsRelationCommand file:[not included in 
comparison]/{warehouse_d
+- LocalRelation [col1#x, col2#x]
 
 
+-- !query
+describe table t1
+-- !query analysis
+DescribeTableCommand `spark_catalog`.`default`.`t1`, false, [col_name#x, 
data_type#x, comment#x]
+
+
 -- !query
 select count(*) from t1 group by utf8_binary
 -- !query analysis
@@ -207,7 +213,7 @@ CreateDataSourceTableCommand 
`spark_catalog`.`default`.`t1`, false
 
 
 -- !query
-INSERT INTO t1 VALUES (named_struct('utf8_binary', 'aaa', 'utf8_binary_lcase', 
'aaa'))
+insert into t1 values (named_struct('utf8_binary', 'aaa', 'utf8_binary_lcase', 
'aaa'))
 -- !query analysis
 InsertIntoHadoopFsRelationCommand file:[not included in 
comparison]/{warehouse_dir}/t1, false, Parquet, [path=file:[not included in 
comparison]/{warehouse_dir}/t1], Append, `spark_catalog`.`default`.`t1`, 
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included 
in comparison]/{warehouse_dir}/t1), [c1]
 +- Project [named_struct(utf8_binary, col1#x.utf8_binary, utf8_binary_lcase, 
cast(col1#x.utf8_binary_lcase as string collate UTF8_BINARY_LCASE)) AS c1#x]
@@ -215,7 +221,7 @@ InsertIntoHadoopFsRelationCommand file:[not included in 
comparison]/{warehouse_d
 
 
 -- !query
-INSERT INTO t1 VALUES (named_struct('utf8_binary', 'AAA', 'utf8_binary_lcase', 
'AAA'))
+insert into t1 values (named_struct('utf8_binary', 'AAA', 'utf8_binary_lcase', 
'AAA'))
 -- !query analysis
 InsertIntoHadoopFsRelationCommand file:[not included in 
comparison]/{warehouse_dir}/t1, false, Parquet, [path=file:[not included in 
comparison]/{warehouse_dir}/t1], Append, `spark_catalog`.`default`.`t1`, 
org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included 
in comparison]/{warehouse_dir}/t1), [c1]
 +- Project [named_struct(utf8_binary, col1#x.utf8_binary, utf8_binary_lcase, 
cast(col1#x.utf8_binary_lcase as string collate UTF8_BINARY_LCASE)) AS c1#x]
@@ -243,3 +249,31 @@ drop table t1
 -- !query analysis
 DropTable false, false
 +- ResolvedIdentifier V2SessionCatalog(spark_catalog), default.t1
+
+
+-- !query
+select array_contains(ARRAY('aaa' collate utf8_binary_lcase),'AAA' collate 
utf8_binary_lcase)
+-- !query analysis
+Project [array_contains(array(collate(aaa, utf8_binary_lcase)), collate(AAA, 
utf8_binary_lcase)) AS array_contains(array(collate(aaa)), collate(AAA))#x]
++- OneRowRelation
+
+
+-- !query
+select array_position(ARRAY('aaa' collate utf8_binary_lcase, 'bbb' collate 
utf8_binary_lcase),'BBB' collate utf8_binary_lcase)
+-- !query analysis
+Project [array_position(array(collate(aaa, utf8_binary_lcase), collate(bbb, 
utf8_binary_lcase)), collate(BBB, utf8_binary_lcase)) AS 
array_position(array(collate(aaa), collate(bbb)), collate(BBB))#xL]
++- OneRowRelation
+
+
+-- !query
+select nullif('aaa' COLLATE utf8_binary_lcase, 'AAA' COLLATE utf8_binary_lcase)
+-- !query analysis
+Project [nullif(collate(aaa, utf8_binary_lcase), collate(AAA, 
utf8_binary_lcase)) AS nullif(collate(aaa), collate(AAA))#x]
++- OneRowRelation
+
+
+-- 

(spark) branch master updated (310dd5294267 -> 3f171ce3f43b)

2024-03-18 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 310dd5294267 [SPARK-45891][DOCS] Update README with more details
 add 3f171ce3f43b [SPARK-47436][PYTHON][DOCS] Fix docstring links and type 
hints in Python Data Source

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/datasource.py | 98 
 1 file changed, 48 insertions(+), 50 deletions(-)

