(spark) branch branch-3.5 updated: [SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 3857c16be81f [SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by

3857c16be81f is described below

commit 3857c16be81f568cc5caf81f65941109bf5f2939
Author: Kent Yao
AuthorDate: Tue Mar 19 13:37:28 2024 +0800

    [SPARK-47435][SPARK-45561][SQL][3.5] Fix overflow issue of MySQL UNSIGNED TINYINT caused by

    ### What changes were proposed in this pull request?

    SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in the MySQL dialect, which causes unsigned TINYINT values to overflow, because java.sql.Types.TINYINT is reported for both signed and unsigned TINYINT columns. In this PR, we put the signedness information into the column metadata and use it to map TINYINT to ShortType or ByteType accordingly.

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can read MySQL UNSIGNED TINYINT values after this PR, as they could in versions up to 3.5.0; this has been broken since 3.5.1.

    ### How was this patch tested?

    New tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45579 from yaooqinn/SPARK-47435-B.
Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 9 ++-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 9 ++-- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 ++- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 14 -- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 9 ++-- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 9 ++-- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 28 +++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 6 +-- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++-- .../v2/jdbc/JDBCTableCatalogSuite.scala| 54 +- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 26 +++ 11 files changed, 115 insertions(+), 65 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index 20fdc965874f..dcf4225d522d 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -56,10 +56,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128)").executeUpdate() + + "42.75, 1.0002, -128, 255)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -89,7 +90,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) +assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class java.lang.Long")) assert(types(2).equals("class java.lang.Integer")) @@ -100,6 +101,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) +assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows(0).getLong(1) == 0x225) assert(rows(0).getInt(2) == 17) @@ -111,6 +113,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getDouble(7) == 42.75) assert(rows(0).getDouble(8) == 1.0002) assert(rows(0).getByte(9) == 0x80.toByte) +assert(rows(0).getShort(10) == 0xff.toShort) } test("Date types") { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index 661b1277e9f0..9a78244f5326
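The core of the fix is a range mismatch: MySQL's UNSIGNED TINYINT holds 0..255, while a signed byte holds -128..127, so mapping both to ByteType reinterprets 255 as -1. The sketch below (plain Python, illustrative names only — not Spark's actual `MySQLDialect` code) shows the overflow and the signedness-driven type choice the patch introduces:

```python
# Illustrative sketch, not Spark code: why UNSIGNED TINYINT overflows a
# signed byte, and how an "isSigned" flag (like the metadata added in this
# patch) selects a wide-enough Catalyst type.

def to_signed_byte(value: int) -> int:
    """Reinterpret the low 8 bits of `value` as a signed byte (two's complement)."""
    b = value & 0xFF
    return b - 256 if b >= 128 else b

def catalyst_type_for_tinyint(is_signed: bool) -> str:
    # Signed TINYINT fits in ByteType; unsigned needs ShortType to hold 128..255.
    return "ByteType" if is_signed else "ShortType"

if __name__ == "__main__":
    # 255 read as a signed byte overflows to -1 -- the bug fixed here.
    print(to_signed_byte(255))               # -1
    print(catalyst_type_for_tinyint(False))  # ShortType
    print(catalyst_type_for_tinyint(True))   # ByteType
```

This matches the test expectations above: the unsigned column now arrives as `java.lang.Short` with value `0xff.toShort` (255), while the signed column stays a `java.lang.Byte`.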
(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations
     add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0

No new revisions were added by this update.

Summary of changes:
 ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++--
 .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala     | 19 ---
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala       | 19 ---
 4 files changed, 18 insertions(+), 52 deletions(-)
 copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala => MySQLDatabaseOnDocker.scala} (66%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (e01ed0da22f2 -> 681b41f0808e)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite
     add 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations

No new revisions were added by this update.

Summary of changes:
 .../expressions/InterpretedUnsafeProjection.scala  |  2 +-
 .../apache/spark/sql/catalyst/util/TypeUtils.scala |  1 +
 .../expressions/CollationExpressionSuite.scala     | 84 ++
 .../sql-tests/analyzer-results/collations.sql.out  | 35 +
 .../test/resources/sql-tests/inputs/collations.sql |  7 ++
 .../resources/sql-tests/results/collations.sql.out | 40 +++
 6 files changed, 168 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch
     add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (acf17fd67217 -> 9f8147c2a8d2)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job
     add 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-programming-guide.md             |  6 +-
 .../sql/execution/streaming/sources/ForeachBatchSink.scala | 10 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
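The doc note added by SPARK-47329 rests on one fact about DataFrames: every action re-runs the full lineage unless the DataFrame is persisted, so using the batch DataFrame twice inside `foreachBatch` re-loads state twice. The stub below (plain Python, not PySpark — `FakeBatchDF` is a stand-in that counts recomputations) sketches the recommended persist/unpersist pattern:

```python
# Illustrative sketch: why foreachBatch should persist a DataFrame it uses
# more than once. Without persist(), each action re-runs the lineage
# (re-reading the source / state store); with it, computation happens once.

class FakeBatchDF:
    def __init__(self):
        self.computations = 0   # counts lineage re-executions
        self._cached = None

    def persist(self):
        self._cached = self._compute()
        return self

    def unpersist(self):
        self._cached = None
        return self

    def _compute(self):
        self.computations += 1  # stands in for re-loading state each batch
        return [1, 2, 3]

    def collect(self):
        return self._cached if self._cached is not None else self._compute()

def process_batch(df, batch_id):
    df.persist()            # materialize the micro-batch once
    sink_a = df.collect()   # first "write"
    sink_b = df.collect()   # second "write" -- served from cache
    df.unpersist()          # release the cached batch
    return sink_a, sink_b

if __name__ == "__main__":
    df = FakeBatchDF()
    process_batch(df, 0)
    print(df.computations)  # 1 -- without persist() this would be 2
```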
(spark) branch master updated (cb20fcae951d -> acf17fd67217)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default
     add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_sparkr_window.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (51e8634a5883 -> cb20fcae951d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same
     add cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala   | 1 +
 docs/configuration.md                                              | 2 +-
 docs/core-migration-guide.md                                       | 2 ++
 4 files changed, 5 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same

51e8634a5883 is described below

commit 51e8634a5883f1816bb82c19b6e91c3516eee6c4
Author: Nemanja Boric
AuthorDate: Mon Mar 18 15:44:29 2024 -0400

    [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same

    ### What changes were proposed in this pull request?

    This PR changes the client behaviour to send the previously observed server-side session id, so that the server can validate that the client has been talking to this specific session. Previously this was validated only on the client side, which meant the server could actually execute the request against the wrong session before the client threw an error (once the response from the server was received).

    ### Why are the changes needed?

    Without this change, the server can execute the client command on the wrong Spark session before the client figures out it is a different session.

    ### Does this PR introduce _any_ user-facing change?

    The error message now surfaces differently (it used to be a slightly different message when validated on the client).

    ### How was this patch tested?

    Existing unit tests, a new unit test, a new e2e test, and manual testing.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45499 from nemanja-boric-databricks/workspace.
Authored-by: Nemanja Boric Signed-off-by: Herman van Hovell --- .../src/main/resources/error/error-classes.json| 5 + .../src/main/protobuf/spark/connect/base.proto | 54 .../sql/connect/client/ResponseValidator.scala | 16 +- .../sql/connect/client/SparkConnectClient.scala| 45 ++- .../service/SparkConnectAddArtifactsHandler.scala | 7 +- .../service/SparkConnectAnalyzeHandler.scala | 7 +- .../SparkConnectArtifactStatusesHandler.scala | 18 +- .../service/SparkConnectConfigHandler.scala| 9 +- .../service/SparkConnectExecutionManager.scala | 9 +- .../SparkConnectFetchErrorDetailsHandler.scala | 6 +- .../service/SparkConnectInterruptHandler.scala | 6 +- .../SparkConnectReattachExecuteHandler.scala | 8 +- .../SparkConnectReleaseExecuteHandler.scala| 4 +- .../SparkConnectReleaseSessionHandler.scala| 2 + .../sql/connect/service/SparkConnectService.scala | 9 +- .../service/SparkConnectSessionManager.scala | 29 +- .../execution/ReattachableExecuteSuite.scala | 8 +- .../connect/planner/SparkConnectServiceSuite.scala | 8 +- .../service/ArtifactStatusesHandlerSuite.scala | 1 + .../service/FetchErrorDetailsHandlerSuite.scala| 12 +- .../service/SparkConnectServiceE2ESuite.scala | 20 ++ .../service/SparkConnectSessionManagerSuite.scala | 38 ++- ...-error-conditions-invalid-handle-error-class.md | 4 + python/pyspark/sql/connect/client/core.py | 12 + python/pyspark/sql/connect/proto/base_pb2.py | 328 ++--- python/pyspark/sql/connect/proto/base_pb2.pyi | 210 + 26 files changed, 655 insertions(+), 220 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 9362b8342abf..b5a0089fb2c8 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -2083,6 +2083,11 @@ "Operation not found." ] }, + "SESSION_CHANGED" : { +"message" : [ + "The existing Spark server driver instance has restarted. Please reconnect." 
+] + }, "SESSION_CLOSED" : { "message" : [ "Session was closed." diff --git a/connector/connect/common/src/main/protobuf/spark/connect/base.proto b/connector/connect/common/src/main/protobuf/spark/connect/base.proto index cb9dbe62c193..bcc7edc0 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/base.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/base.proto @@ -66,6 +66,12 @@ message AnalyzePlanRequest { // The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff` string session_id = 1; + // (Optional) + // + // Server-side generated idempotency key from the previous responses (if any). Server + // can use this to validate that the server side session has not changed. + optional string client_observed_server_side_session_id = 17; + // (Required) User context UserContext user_context = 2; @@ -281,6 +287,12 @@ message ExecutePlanRequest { // The id should be an UUID string of the format
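The protocol change above boils down to an echo-and-validate handshake: the client includes the server-side session id it last observed, and the server rejects the request when its current session id differs (e.g. after a driver restart) rather than executing against the wrong session. A minimal sketch of that check, in plain Python with illustrative names (not the actual Spark Connect service code):

```python
# Hedged sketch of server-side session validation: the client echoes back
# the server session id it last observed; a mismatch means the server has
# restarted, so the request is rejected instead of silently re-executed.

import uuid

class SessionChangedError(Exception):
    pass

class Server:
    def __init__(self):
        self.server_session_id = str(uuid.uuid4())

    def restart(self):
        # A restarted driver gets a fresh server-side session id.
        self.server_session_id = str(uuid.uuid4())

    def handle(self, request):
        observed = request.get("client_observed_server_side_session_id")
        if observed is not None and observed != self.server_session_id:
            raise SessionChangedError(
                "The existing Spark server driver instance has restarted. Please reconnect.")
        return {"server_side_session_id": self.server_session_id, "result": "ok"}

if __name__ == "__main__":
    server = Server()
    first = server.handle({})  # first request: client has observed nothing yet
    sid = first["server_side_session_id"]
    server.restart()           # simulate a driver restart
    try:
        server.handle({"client_observed_server_side_session_id": sid})
    except SessionChangedError as e:
        print("rejected:", e)
```

The field is optional in the sketch, mirroring the `optional string client_observed_server_side_session_id` added to the proto: a first request carries no observed id and always passes.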
(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

a40940a0bc6d is described below

commit a40940a0bc6de58b5c56b8ad918f338c6e70572f
Author: Dongjoon Hyun
AuthorDate: Mon Mar 18 12:39:44 2024 -0700

    [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

    ### What changes were proposed in this pull request?

    This PR aims to make `BlockManager` warn before invoking `removeBlockInternal` by switching the log position. To be clear:

    1. When `removeBlockInternal` succeeds, the log messages are identical before and after this PR.
    2. When `removeBlockInternal` fails, the user will see one additional warning message, which was hidden from users before this PR:
    ```
    logWarning(s"Putting block $blockId failed")
    ```

    ### Why are the changes needed?

    When a `Put` operation fails, Apache Spark currently tries `removeBlockInternal` first, before logging.

    https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567

    On top of that, if `removeBlockInternal` then fails as well, Spark shows a warning like the following and fails the job:
    ```
    24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due to exception java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e.
    24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed normally.
    24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0
    24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
    24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: Task serialization failed: java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
    java.nio.file.NoSuchFileException: /data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
    ```
    This is misleading, even though both failures might share the same root cause. Since the `Put` operation fails before the failure above, we had better reorder the WARN message to make that clear.

    ### Does this PR introduce _any_ user-facing change?

    No. This is a warning-message change only.

    ### How was this patch tested?

    Manual review.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45570 from dongjoon-hyun/SPARK-47446.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index 228ec5752e1b..89b3914e94af 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -1561,8 +1561,8 @@ private[spark] class BlockManager(
         blockInfoManager.unlock(blockId)
       }
     } else {
-      removeBlockInternal(blockId, tellMaster = false)
       logWarning(s"Putting block $blockId failed")
+      removeBlockInternal(blockId, tellMaster = false)
     }
     res
   } catch {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
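The one-line reorder encodes a general logging rule: report the primary failure before attempting cleanup, because the cleanup step can itself throw and swallow the root-cause message. A tiny sketch of that ordering in plain Python (illustrative, not the `BlockManager` code):

```python
# Illustrative sketch: log the primary failure *before* cleanup, so that
# even if cleanup raises (e.g. a missing block file), the root-cause
# warning has already been emitted.

warnings = []

def log_warning(msg):
    warnings.append(msg)

def remove_block_internal(block_id):
    # Cleanup that can itself fail, like the NoSuchFileException above.
    raise OSError(f"cannot remove {block_id}")

def on_put_failure(block_id):
    log_warning(f"Putting block {block_id} failed")  # moved before cleanup
    remove_block_internal(block_id)                  # may raise

if __name__ == "__main__":
    try:
        on_put_failure("broadcast_0")
    except OSError:
        pass
    print(warnings)  # ['Putting block broadcast_0 failed']
```

With the original ordering (cleanup first), the warning list would be empty here, which is exactly the hidden message the patch restores.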
(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config

ce93c9fd8671 is described below

commit ce93c9fd86715e2479552628398f6fc11e83b2af
Author: Rob Reeves
AuthorDate: Mon Mar 18 10:36:38 2024 -0700

    [SPARK-47383][CORE] Support `spark.shutdown.timeout` config

    ### What changes were proposed in this pull request?

    Make the shutdown hook timeout configurable. If it is not defined, the behavior falls back to the existing default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property.

    ### Why are the changes needed?

    Spark sometimes times out during the shutdown process. This can cause data left in the queues to be dropped and leads to metadata loss (e.g. event logs, anything written by custom listeners). Before this change, the timeout is not easily configurable: the underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds, which can only be changed by setting hadoop.service.shutdown.timeout in core-site.xml/core-default.xml, because a new Hadoop conf object is created and there is no opportunity to modify it.

    ### Does this PR introduce _any_ user-facing change?

    Yes, a new config `spark.shutdown.timeout` is added.

    ### How was this patch tested?

    Manual testing in spark-shell. This behavior is not practical to write a unit test for.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45504 from robreeves/sc_shutdown_timeout.
Authored-by: Rob Reeves Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/internal/config/package.scala| 10 ++ .../org/apache/spark/util/ShutdownHookManager.scala | 19 +-- 2 files changed, 27 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index aa240b5cc5b5..e72b9cb694eb 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2683,4 +2683,14 @@ package object config { .version("4.0.0") .booleanConf .createWithDefault(false) + + private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS = +ConfigBuilder("spark.shutdown.timeout") + .internal() + .doc("Defines the timeout period to wait for all shutdown hooks to be executed. " + +"This must be passed as a system property argument in the Java options, for example " + +"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".") + .version("4.0.0") + .timeConf(TimeUnit.MILLISECONDS) + .createOptional } diff --git a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala index 4db268604a3e..c6cad9440168 100644 --- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala +++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala @@ -19,12 +19,16 @@ package org.apache.spark.util import java.io.File import java.util.PriorityQueue +import java.util.concurrent.TimeUnit import scala.util.Try import org.apache.hadoop.fs.FileSystem +import org.apache.spark.SparkConf import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS + /** * Various utility methods used by Spark. 
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager { val hookTask = new Runnable() { override def run(): Unit = runAll() } -org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( - hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30) +val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30 +// The timeout property must be passed as a Java system property because this +// is initialized before Spark configurations are registered as system +// properties later in initialization. +val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS) + +timeout.fold { + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority) +} { t => + org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook( +hookTask, priority, t, TimeUnit.MILLISECONDS) +} } def runAll(): Unit = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
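The `timeout.fold` in the patch registers hooks with the manager's default timeout when the config is unset and with the configured bound when it is set. The behavior can be sketched in plain Python (illustrative names, not Spark's `ShutdownHookManager`): run the hooks in a worker thread and wait at most the effective timeout.

```python
# Hedged sketch of an optional shutdown timeout: if no timeout is
# configured, fall back to a 30s default (mirroring Hadoop's
# ShutdownHookManager); otherwise wait only as long as configured.

import threading

DEFAULT_TIMEOUT_S = 30.0

def run_shutdown_hooks(hooks, timeout_s=None):
    """Run all hooks in a worker thread, waiting at most the effective timeout.

    Returns True if every hook finished in time, False on timeout.
    """
    effective = DEFAULT_TIMEOUT_S if timeout_s is None else timeout_s
    worker = threading.Thread(target=lambda: [h() for h in hooks], daemon=True)
    worker.start()
    worker.join(effective)
    return not worker.is_alive()

if __name__ == "__main__":
    done = []
    ok = run_shutdown_hooks([lambda: done.append("flush-event-logs")], timeout_s=5.0)
    print(ok, done)  # True ['flush-event-logs']
```

As the commit notes, the real config must be passed as a JVM system property (`-Dspark.shutdown.timeout=60s`), since the hook manager is initialized before Spark configurations are registered as system properties.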
(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

8bd42cbdb6bf is described below

commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1
Author: Kent Yao
AuthorDate: Mon Mar 18 08:56:55 2024 -0700

    [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

    ### What changes were proposed in this pull request?

    SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in the MySQL dialect, which causes unsigned TINYINT values to overflow, because java.sql.Types.TINYINT is reported for both signed and unsigned TINYINT columns. In this PR, we put the signedness information into the column metadata and use it to map TINYINT to ShortType or ByteType accordingly.

    ### Why are the changes needed?

    Bugfix.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can read MySQL UNSIGNED TINYINT values after this PR, as they could in versions up to 3.5.0; this has been broken since 3.5.1.

    ### How was this patch tested?

    New tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45556 from yaooqinn/SPARK-47435.
Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 9 ++-- .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala| 9 ++-- .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 6 ++- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 15 -- .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala | 9 ++-- .../sql/jdbc/v2/PostgresIntegrationSuite.scala | 9 ++-- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 26 ++ .../sql/execution/datasources/jdbc/JdbcUtils.scala | 5 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 10 ++-- .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 -- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 24 + 11 files changed, 114 insertions(+), 68 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index b1d239337aa0..79e88f109534 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits BIT(10), " + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci DECIMAL(40,20), flt FLOAT, " - + "dbl DOUBLE, tiny TINYINT)").executeUpdate() + + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate() + conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', " + "17, 7, 123456789, 123456789012345, 123456789012345.123456789012345, " - + "42.75, 1.0002, -128)").executeUpdate() + + "42.75, 1.0002, -128, 255)").executeUpdate() conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts TIMESTAMP, " + "yr YEAR)").executeUpdate() @@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite { val rows = df.collect() assert(rows.length == 1) val types = rows(0).toSeq.map(x => x.getClass.toString) -assert(types.length == 10) +assert(types.length == 11) assert(types(0).equals("class java.lang.Boolean")) assert(types(1).equals("class java.lang.Long")) assert(types(2).equals("class java.lang.Integer")) @@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(types(7).equals("class java.lang.Double")) assert(types(8).equals("class java.lang.Double")) assert(types(9).equals("class java.lang.Byte")) +assert(types(10).equals("class java.lang.Short")) assert(rows(0).getBoolean(0) == false) assert(rows(0).getLong(1) == 0x225) assert(rows(0).getInt(2) == 17) @@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getDouble(7) == 42.75) assert(rows(0).getDouble(8) == 1.0002) assert(rows(0).getByte(9) == 0x80.toByte) +assert(rows(0).getShort(10) == 0xff.toShort) } test("Date types") { diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala index c3ec7e1925fa..6c1b7fdd1be5 100644 ---
(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
     add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker servers in MasterSuite

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0

4dc362dbc6c0 is described below

commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c
Author: panbingkun
AuthorDate: Mon Mar 18 08:25:16 2024 -0700

    [SPARK-47438][BUILD] Upgrade jackson to 2.17.0

    ### What changes were proposed in this pull request?

    The pr aims to upgrade jackson from `2.16.1` to `2.17.0`.

    ### Why are the changes needed?

    The full release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GA.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45562 from panbingkun/SPARK-47438.

Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++---
 pom.xml                               |  4 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d4b7d38aea22..86da61d89149 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar
 ini4j/0.5.4//ini4j-0.5.4.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.2//ivy-2.5.2.jar
-jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar
+jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.16.1//jackson-core-2.16.1.jar
-jackson-databind/2.16.1//jackson-databind-2.16.1.jar
-jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar
-jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar
-jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar
+jackson-core/2.17.0//jackson-core-2.17.0.jar
+jackson-databind/2.17.0//jackson-databind-2.17.0.jar
+jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar
+jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar
+jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar
+jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar
 jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar
 jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar
 jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar
diff --git a/pom.xml b/pom.xml
index 757d911c1229..5cc56a92999d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -184,8 +184,8 @@
     true
     true
     1.9.13
-    2.16.1
-    2.16.1
+    2.17.0
+    2.17.0
     2.3.1
     3.0.2
     1.1.10.5

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

57424b92c5b5 is described below

commit 57424b92c5b5e7c3de680a7d8a6b137911f45666
Author: Matt Braymer-Hayes
AuthorDate: Mon Mar 18 07:53:11 2024 -0700

    [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

    ### What changes were proposed in this pull request?

    Adds the Web UI to the `Other Documents` list on the main page.

    ### Why are the changes needed?

    I found it difficult to find the Web UI docs: they're only linked inside the Monitoring docs. Adding them to the main page will make it easier for people to find and use the docs.

    ### Does this PR introduce _any_ user-facing change?

    Yes: adds another cross-reference on the main page.

    ### How was this patch tested?

    Visually verified that Markdown still rendered properly.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45534 from mattayes/patch-2.
Authored-by: Matt Braymer-Hayes Signed-off-by: Dongjoon Hyun --- docs/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 5f3858bec86b..12c53c40c8f7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -138,6 +138,7 @@ options for deployment: * [Configuration](configuration.html): customize Spark via its configuration system * [Monitoring](monitoring.html): track the behavior of your applications +* [Web UI](web-ui.html): view useful information about your applications * [Tuning Guide](tuning.html): best practices to optimize performance and memory use * [Job Scheduling](job-scheduling.html): scheduling resources across and within Spark applications * [Security](security.html): Spark security support @@ -145,7 +146,7 @@ options for deployment: * Integration with other storage systems: * [Cloud Infrastructures](cloud-integration.html) * [OpenStack Swift](storage-openstack-swift.html) -* [Migration Guide](migration-guide.html): Migration guides for Spark components +* [Migration Guide](migration-guide.html): migration guides for Spark components * [Building Spark](building-spark.html): build Spark using the Maven system * [Contributing to Spark](https://spark.apache.org/contributing.html) * [Third Party Projects](https://spark.apache.org/third-party-projects.html): related third party Spark projects - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
7a899e219f5a is described below

commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c
Author: Huw Campbell
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

    [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

    ### What changes were proposed in this pull request?

    Like SPARK-24553, this PR aims to fix redirect issues (an incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links by including a trailing slash.

    ### Why are the changes needed?

    When using a proxy, an invalid redirect is issued if the trailing slash is not included.

    ### Does this PR introduce _any_ user-facing change?

    Only that people will be able to use these links if they are using a proxy.

    ### How was this patch tested?

    With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it currently exists redirects. Edit: further tested by building a version of our application with this patch applied; the links work now.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

    Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

    Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

    Closes #45527 from HuwCampbell/patch-1.

    Authored-by: Huw Campbell
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(

   override def row(query: StructuredStreamingRow): Seq[Node] = {
     val streamingQuery = query.streamingUIData
-    val statisticsLink = "%s/%s/statistics?id=%s"
+    val statisticsLink = "%s/%s/statistics/?id=%s"
       .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix,
         streamingQuery.summary.runId)
(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`
bb7a6138b827 is described below

commit bb7a6138b827975fc827813ab42a2b9074bf8d5e
Author: Huw Campbell
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

    [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

    ### What changes were proposed in this pull request?

    Like SPARK-24553, this PR aims to fix redirect issues (an incorrect 302) when one is using proxy settings. It changes the generated link to be consistent with other links by including a trailing slash.

    ### Why are the changes needed?

    When using a proxy, an invalid redirect is issued if the trailing slash is not included.

    ### Does this PR introduce _any_ user-facing change?

    Only that people will be able to use these links if they are using a proxy.

    ### How was this patch tested?

    With a proxy installed, I went to the location this link would generate and could reach the page, whereas the link as it currently exists redirects. Edit: further tested by building a version of our application with this patch applied; the links work now.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Page with working link: https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

    Goes correctly to: https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

    Before, it would redirect and we'd get a 404: https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

    Closes #45527 from HuwCampbell/patch-1.

    Authored-by: Huw Campbell
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(

   override def row(query: StructuredStreamingRow): Seq[Node] = {
     val streamingQuery = query.streamingUIData
-    val statisticsLink = "%s/%s/statistics?id=%s"
+    val statisticsLink = "%s/%s/statistics/?id=%s"
       .format(SparkUIUtils.prependBaseUri(request, parent.basePath), parent.prefix,
         streamingQuery.summary.runId)
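The one-character change is easy to miss. A minimal sketch of the link construction (a Python stand-in for the Scala `format` call above, with a hypothetical base URI and run id) shows where the trailing slash lands:

```python
def statistics_link(base_uri, prefix, run_id):
    # The trailing slash before the query string avoids a server-side
    # 302 redirect (".../statistics" -> ".../statistics/") that can
    # resolve against the wrong host when Spark sits behind a reverse proxy.
    return "%s/%s/statistics/?id=%s" % (base_uri, prefix, run_id)

print(statistics_link("/proxy/app-1234", "StreamingQuery", "run-42"))
# /proxy/app-1234/StreamingQuery/statistics/?id=run-42
```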
(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`
 add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(spark) branch master updated: [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`
d3f12df6e09e is described below

commit d3f12df6e09ee47dbd7c9e08c9962430a2601941
Author: Ruifeng Zheng
AuthorDate: Mon Mar 18 20:24:43 2024 +0900

    [SPARK-47437][PYTHON][CONNECT] Correct the error class for `DataFrame.sort*`

    ### What changes were proposed in this pull request?

    Correct the error class for `DataFrame.sort*`.

    ### Why are the changes needed?

    `DataFrame.sort*` supports negative indices, which mean sort by that column in descending order, so only a zero index is invalid.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    CI.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45559 from zhengruifeng/correct_index_error.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/errors/error_classes.py  | 5 +
 python/pyspark/sql/connect/dataframe.py | 3 ++-
 python/pyspark/sql/dataframe.py         | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py
index 1e21ad3543e9..6b7f19b44918 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -1162,6 +1162,11 @@ ERROR_CLASSES_JSON = '''
     "message": [
       "Function `` should take at least columns."
     ]
+  },
+  "ZERO_INDEX": {
+    "message": [
+      "Index must be non-zero."
+    ]
   }
 }
 '''
diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index bbb1be4c4724..f300774b6e57 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -701,7 +701,8 @@ class DataFrame:
                     _c = self[-c - 1].desc()
                 else:
                     raise PySparkIndexError(
-                        error_class="INDEX_NOT_POSITIVE", message_parameters={"index": str(c)}
+                        error_class="ZERO_INDEX",
+                        message_parameters={},
                     )
             else:
                 _c = c  # type: ignore[assignment]
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 5c5c263b8156..d04c35dac5e9 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -3331,7 +3331,6 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         jcols = []
         for c in cols:
             if isinstance(c, int) and not isinstance(c, bool):
-                # TODO: should introduce dedicated error class
                 # ordinal is 1-based
                 if c > 0:
                     _c = self[c - 1]
@@ -3340,7 +3339,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
                     _c = self[-c - 1].desc()
                 else:
                     raise PySparkIndexError(
-                        error_class="INDEX_NOT_POSITIVE", message_parameters={"index": str(c)}
+                        error_class="ZERO_INDEX",
+                        message_parameters={},
                     )
             else:
                 _c = c  # type: ignore[assignment]
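The ordinal logic this patch touches can be sketched outside Spark. A minimal stand-alone version of the resolution rule (1-based ordinal, sign selects sort direction, zero rejected — function and variable names here are illustrative, not PySpark's):

```python
def resolve_sort_column(columns, ordinal):
    # Mirrors the 1-based ordinal handling in DataFrame.sort:
    # positive -> ascending on that column, negative -> descending
    # on the same column counted from the left, zero -> error
    # (now reported under the ZERO_INDEX error class).
    if ordinal > 0:
        return columns[ordinal - 1], "asc"
    elif ordinal < 0:
        return columns[-ordinal - 1], "desc"
    raise IndexError("Index must be non-zero.")

cols = ["name", "age", "city"]
print(resolve_sort_column(cols, 2))   # ('age', 'asc')
print(resolve_sort_column(cols, -2))  # ('age', 'desc')
```

This is why `INDEX_NOT_POSITIVE` was the wrong error class: negative ordinals are a supported shorthand for descending sort, and only zero has no meaning.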
(spark) branch master updated (08866c280f87 -> 42c4dad62dcd)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 08866c280f87 [SPARK-47439][PYTHON] Document Python Data Source API in API reference page
 add 42c4dad62dcd [SPARK-43435][PYTHON][CONNECT][TESTS] Reenable doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/dataframe.py | 3 ---
 1 file changed, 3 deletions(-)
(spark) branch master updated: [SPARK-47439][PYTHON] Document Python Data Source API in API reference page
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 08866c280f87 [SPARK-47439][PYTHON] Document Python Data Source API in API reference page
08866c280f87 is described below

commit 08866c280f877ce27d5c5305c7a09add76c86774
Author: Hyukjin Kwon
AuthorDate: Mon Mar 18 20:08:22 2024 +0900

    [SPARK-47439][PYTHON] Document Python Data Source API in API reference page

    ### What changes were proposed in this pull request?

    This PR proposes to document the Python Data Source API in the Python API reference page.

    ### Why are the changes needed?

    For users/developers to know how to use the API.

    ### Does this PR introduce _any_ user-facing change?

    Yes, it documents the Python Data Source API.

    ### How was this patch tested?

    Manually checked the output from the Python API reference build:

    ```bash
    cd python/docs
    make clean html
    open build/html/index.html
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #45561 from HyukjinKwon/SPARK-47439.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 .../source/reference/pyspark.sql/core_classes.rst | 7
 .../{core_classes.rst => datasource.rst}          | 44 +++---
 python/docs/source/reference/pyspark.sql/index.rst | 1 +
 .../source/reference/pyspark.sql/spark_session.rst | 1 +
 4 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/python/docs/source/reference/pyspark.sql/core_classes.rst b/python/docs/source/reference/pyspark.sql/core_classes.rst
index 3cf19686cdd8..65096da21de5 100644
--- a/python/docs/source/reference/pyspark.sql/core_classes.rst
+++ b/python/docs/source/reference/pyspark.sql/core_classes.rst
@@ -42,3 +42,10 @@ Core Classes
     UDTFRegistration
     udf.UserDefinedFunction
     udtf.UserDefinedTableFunction
+    datasource.DataSource
+    datasource.DataSourceReader
+    datasource.DataSourceStreamReader
+    datasource.DataSourceWriter
+    datasource.DataSourceRegistration
+    datasource.InputPartition
+    datasource.WriterCommitMessage
diff --git a/python/docs/source/reference/pyspark.sql/core_classes.rst b/python/docs/source/reference/pyspark.sql/datasource.rst
similarity index 58%
copy from python/docs/source/reference/pyspark.sql/core_classes.rst
copy to python/docs/source/reference/pyspark.sql/datasource.rst
index 3cf19686cdd8..b92db7a28858 100644
--- a/python/docs/source/reference/pyspark.sql/core_classes.rst
+++ b/python/docs/source/reference/pyspark.sql/datasource.rst
@@ -16,29 +16,29 @@
 under the License.

-
-Core Classes
-
-.. currentmodule:: pyspark.sql
+==
+Python Data Source
+==
+
+.. currentmodule:: pyspark.sql.datasource

 .. autosummary::
     :toctree: api/

-    SparkSession
-    Catalog
-    DataFrame
-    Column
-    Observation
-    Row
-    GroupedData
-    PandasCogroupedOps
-    DataFrameNaFunctions
-    DataFrameStatFunctions
-    Window
-    DataFrameReader
-    DataFrameWriter
-    DataFrameWriterV2
-    UDFRegistration
-    UDTFRegistration
-    udf.UserDefinedFunction
-    udtf.UserDefinedTableFunction
+    DataSource.name
+    DataSource.reader
+    DataSource.schema
+    DataSource.streamReader
+    DataSource.writer
+    DataSourceReader.partitions
+    DataSourceReader.read
+    DataSourceRegistration.register
+    DataSourceStreamReader.commit
+    DataSourceStreamReader.initialOffset
+    DataSourceStreamReader.latestOffset
+    DataSourceStreamReader.partitions
+    DataSourceStreamReader.read
+    DataSourceStreamReader.stop
+    DataSourceWriter.abort
+    DataSourceWriter.commit
+    DataSourceWriter.write
diff --git a/python/docs/source/reference/pyspark.sql/index.rst b/python/docs/source/reference/pyspark.sql/index.rst
index 233c8b238a6d..9322a91fba25 100644
--- a/python/docs/source/reference/pyspark.sql/index.rst
+++ b/python/docs/source/reference/pyspark.sql/index.rst
@@ -42,3 +42,4 @@ This page gives an overview of all public Spark SQL API.
     udf
     udtf
     protobuf
+    datasource
diff --git a/python/docs/source/reference/pyspark.sql/spark_session.rst b/python/docs/source/reference/pyspark.sql/spark_session.rst
index 4be343c52140..ea71249e292e 100644
--- a/python/docs/source/reference/pyspark.sql/spark_session.rst
+++ b/python/docs/source/reference/pyspark.sql/spark_session.rst
@@ -47,6 +47,7 @@ See also :class:`SparkSession`.
     SparkSession.catalog
     SparkSession.conf
     SparkSession.createDataFrame
+    SparkSession.dataSource
     SparkSession.getActiveSession
     SparkSession.newSession
     SparkSession.profile
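The classes documented above live in `pyspark.sql.datasource`. A minimal, hypothetical reader following the `DataSourceReader.partitions`/`read` contract can be sketched stand-alone — the base class is stubbed here so the example runs without a Spark session; in real code you would subclass `pyspark.sql.datasource.DataSourceReader` instead:

```python
# Stub of the documented interface so the sketch runs without PySpark.
class DataSourceReader:
    def partitions(self):
        raise NotImplementedError

    def read(self, partition):
        raise NotImplementedError


class RangeReader(DataSourceReader):
    """Hypothetical reader yielding rows 0..n-1 from a single partition."""

    def __init__(self, n):
        self.n = n

    def partitions(self):
        # One partition; real readers return InputPartition objects.
        return [0]

    def read(self, partition):
        # Yield rows as tuples, one per value in the range.
        for i in range(self.n):
            yield (i,)


reader = RangeReader(3)
rows = [row for part in reader.partitions() for row in reader.read(part)]
print(rows)  # [(0,), (1,), (2,)]
```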
(spark) branch master updated: [MINOR][TESTS] Collation - extending golden file coverage
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 44a88edc995e [MINOR][TESTS] Collation - extending golden file coverage
44a88edc995e is described below

commit 44a88edc995e1e09adfab80e63a409f8ced3b131
Author: Aleksandar Tomic
AuthorDate: Mon Mar 18 13:52:48 2024 +0500

    [MINOR][TESTS] Collation - extending golden file coverage

    ### What changes were proposed in this pull request?

    This PR adds new golden file tests for the collation feature:
    1) DESCRIBE
    2) Basic array operations
    3) Removing the struct test, since the same case is already covered in golden files.

    ### Why are the changes needed?

    Extending test coverage for the collation feature.

    ### Does this PR introduce _any_ user-facing change?

    ### How was this patch tested?

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45515 from dbatomic/collation_golden_files_update.

    Authored-by: Aleksandar Tomic
    Signed-off-by: Max Gekk
---
 .../sql-tests/analyzer-results/collations.sql.out  | 38 +-
 .../test/resources/sql-tests/inputs/collations.sql | 15 +++-
 .../resources/sql-tests/results/collations.sql.out | 45 +-
 .../org/apache/spark/sql/CollationSuite.scala      | 22 ---
 4 files changed, 92 insertions(+), 28 deletions(-)

diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
index 6d9bb3470be6..3a0f8eec02ba 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/collations.sql.out
@@ -37,6 +37,12 @@ InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_d
 +- LocalRelation [col1#x, col2#x]


+-- !query
+describe table t1
+-- !query analysis
+DescribeTableCommand `spark_catalog`.`default`.`t1`, false, [col_name#x, data_type#x, comment#x]
+
+
 -- !query
 select count(*) from t1 group by utf8_binary
 -- !query analysis
@@ -207,7 +213,7 @@ CreateDataSourceTableCommand `spark_catalog`.`default`.`t1`, false


 -- !query
-INSERT INTO t1 VALUES (named_struct('utf8_binary', 'aaa', 'utf8_binary_lcase', 'aaa'))
+insert into t1 values (named_struct('utf8_binary', 'aaa', 'utf8_binary_lcase', 'aaa'))
 -- !query analysis
 InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t1, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t1], Append, `spark_catalog`.`default`.`t1`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t1), [c1]
 +- Project [named_struct(utf8_binary, col1#x.utf8_binary, utf8_binary_lcase, cast(col1#x.utf8_binary_lcase as string collate UTF8_BINARY_LCASE)) AS c1#x]
@@ -215,7 +221,7 @@ InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_d


 -- !query
-INSERT INTO t1 VALUES (named_struct('utf8_binary', 'AAA', 'utf8_binary_lcase', 'AAA'))
+insert into t1 values (named_struct('utf8_binary', 'AAA', 'utf8_binary_lcase', 'AAA'))
 -- !query analysis
 InsertIntoHadoopFsRelationCommand file:[not included in comparison]/{warehouse_dir}/t1, false, Parquet, [path=file:[not included in comparison]/{warehouse_dir}/t1], Append, `spark_catalog`.`default`.`t1`, org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:[not included in comparison]/{warehouse_dir}/t1), [c1]
 +- Project [named_struct(utf8_binary, col1#x.utf8_binary, utf8_binary_lcase, cast(col1#x.utf8_binary_lcase as string collate UTF8_BINARY_LCASE)) AS c1#x]
@@ -243,3 +249,31 @@ drop table t1
 -- !query analysis
 DropTable false, false
 +- ResolvedIdentifier V2SessionCatalog(spark_catalog), default.t1
+
+
+-- !query
+select array_contains(ARRAY('aaa' collate utf8_binary_lcase),'AAA' collate utf8_binary_lcase)
+-- !query analysis
+Project [array_contains(array(collate(aaa, utf8_binary_lcase)), collate(AAA, utf8_binary_lcase)) AS array_contains(array(collate(aaa)), collate(AAA))#x]
++- OneRowRelation
+
+
+-- !query
+select array_position(ARRAY('aaa' collate utf8_binary_lcase, 'bbb' collate utf8_binary_lcase),'BBB' collate utf8_binary_lcase)
+-- !query analysis
+Project [array_position(array(collate(aaa, utf8_binary_lcase), collate(bbb, utf8_binary_lcase)), collate(BBB, utf8_binary_lcase)) AS array_position(array(collate(aaa), collate(bbb)), collate(BBB))#xL]
++- OneRowRelation
+
+
+-- !query
+select nullif('aaa' COLLATE utf8_binary_lcase, 'AAA' COLLATE utf8_binary_lcase)
+-- !query analysis
+Project [nullif(collate(aaa, utf8_binary_lcase), collate(AAA, utf8_binary_lcase)) AS nullif(collate(aaa), collate(AAA))#x]
++- OneRowRelation
+
+
--
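The case-insensitive behavior these golden files capture can be illustrated with small Python analogues. These approximate the observable semantics of `UTF8_BINARY_LCASE` with `str.lower()` — they are not Spark's internal implementation, and the function names are ours:

```python
def array_contains_lcase(arr, value):
    # Analogue of array_contains under a lowercase collation.
    return any(s.lower() == value.lower() for s in arr)

def array_position_lcase(arr, value):
    # Analogue of array_position: 1-based index, 0 when absent.
    for i, s in enumerate(arr):
        if s.lower() == value.lower():
            return i + 1
    return 0

def nullif_lcase(a, b):
    # Analogue of nullif: None when the two values compare equal.
    return None if a.lower() == b.lower() else a

print(array_contains_lcase(["aaa"], "AAA"))         # True
print(array_position_lcase(["aaa", "bbb"], "BBB"))  # 2
print(nullif_lcase("aaa", "AAA"))                   # None
```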
(spark) branch master updated (310dd5294267 -> 3f171ce3f43b)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

from 310dd5294267 [SPARK-45891][DOCS] Update README with more details
 add 3f171ce3f43b [SPARK-47436][PYTHON][DOCS] Fix docstring links and type hints in Python Data Source

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/datasource.py | 98
 1 file changed, 48 insertions(+), 50 deletions(-)