[
https://issues.apache.org/jira/browse/SPARK-40485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-40485:
------------------------------------
Assignee: Apache Spark
> Extend the partitioning options of the JDBC data source
> -------------------------------------------------------
>
> Key: SPARK-40485
> URL: https://issues.apache.org/jira/browse/SPARK-40485
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Luca Canali
> Assignee: Apache Spark
> Priority: Minor
>
> This proposes to extend the available partitioning options for the JDBC data
> source.
> Partitioning options allow reading data with multiple workers connected in
> parallel to the target RDBMS. Under the right circumstances, this can
> significantly improve the performance of data extraction.
> Currently, the only partitioning and parallelization option for reading from
> databases is to specify lowerBound and upperBound together with numPartitions
> and partitionColumn. The Spark JDBC data source then uses multiple
> partitions, and thus multiple workers, to read from the RDBMS.
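As background, the existing mechanism can be sketched as follows. This is a simplified Python illustration (the function name `stride_partitions` is made up for this sketch; Spark's actual logic lives in the Scala `JDBCRelation` code and handles more edge cases): lowerBound, upperBound, and numPartitions define a stride, and each partition becomes one WHERE clause, with the first and last partitions left open-ended so rows outside the bounds are still read.

```python
def stride_partitions(column, lower, upper, num_partitions):
    """Sketch of stride-based JDBC partitioning: each partition maps to a
    WHERE predicate over the numeric partition column. The first and last
    predicates are unbounded on one side, so no rows are skipped."""
    if num_partitions <= 1:
        return [None]  # a single partition reads the whole table, no predicate
    stride = (upper - lower) // num_partitions
    bounds = [lower + i * stride for i in range(1, num_partitions)]
    preds = []
    prev = None
    for b in bounds:
        if prev is None:
            preds.append(f"{column} < {b} OR {column} IS NULL")
        else:
            preds.append(f"{column} >= {prev} AND {column} < {b}")
        prev = b
    preds.append(f"{column} >= {prev}")
    return preds

# Example: 3 partitions over id in [0, 30) -> three WHERE clauses
print(stride_partitions("id", 0, 30, 3))
```

This works well when the partition column's values are roughly uniform over [lowerBound, upperBound]; skewed data leads to unbalanced partitions, which is part of the motivation for the value-list mechanism proposed here.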
> This proposes to add a similar, complementary partitioning mechanism, where a
> user-provided list of values is used to compute the target partitions.
> This provides a way to split the data extraction work among workers so that
> it can be aligned with the database's physical (partitioned and/or indexed)
> structure, as in the following example:
> {code:java}
> option("partitionColumn", "region").
> option("numPartitions", 3).
> option("partitionColValues", "'eastern', 'central', 'western'")
> {code}
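One plausible reading of the proposed option is sketched below in Python. This is a guess at the semantics, not the actual patch: the comma-separated value list would be split into one value per partition, and each partition would read with an equality predicate on the partition column.

```python
def value_list_partitions(column, col_values):
    """Hypothetical sketch of the proposed partitionColValues option:
    split the user-supplied list and emit one equality predicate per
    partition. Naive comma splitting; values containing commas would
    need real quoting support in a production implementation."""
    values = [v.strip() for v in col_values.split(",")]
    return [f"{column} = {v}" for v in values]

# Mirrors the example above: 3 partitions, one per region value
print(value_list_partitions("region", "'eastern', 'central', 'western'"))
```

Each returned predicate would become the WHERE clause of one partition's query, letting the optimizer on the database side prune to a single list partition or index range per worker.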
> This feature is motivated by performance: it allows scaling and speeding up
> data extraction from:
> - list-partitioned tables, as available in Oracle and PostgreSQL
> - tables stored in B*Tree index structures, such as Oracle's IOTs
> (Index-Organized Tables) and SQL Server's clustered indexes
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]