[
https://issues.apache.org/jira/browse/SPARK-32632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Liu Dinghua reopened SPARK-32632:
---------------------------------
[~maropu], thanks for your answer! I had already seen these descriptions.
What I want to ask is why it is designed like this. What reasons were
considered in doing so? This design could lead to data skew in the first
partition and the last partition. I look forward to your reply. Thank you
very much again.
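
To make the question concrete, here is a minimal sketch of how I understand
the WHERE clauses to be derived. It is not Spark's actual source: the object
PartitionSketch, the helper whereClauses and the stride formula are my own
assumptions, chosen only because they reproduce the clauses in the log
quoted below.

```scala
// Hypothetical model, NOT Spark's implementation or API.
object PartitionSketch {
  // Builds one WHERE clause per partition from the bounds.
  def whereClauses(column: String, lowerBound: Long, upperBound: Long,
                   numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "this sketch only covers 2 or more partitions")
    require(lowerBound < upperBound, "lowerBound must be less than upperBound")
    // Assumed stride formula (integer division on each bound separately).
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val start = lowerBound + i * stride
      val end = start + stride
      if (i == 0) {
        // First partition: no lower limit, and it also takes the NULL rows.
        s"`$column` < $end or `$column` is null"
      } else if (i == numPartitions - 1) {
        // Last partition: no upper limit.
        s"`$column` >= $start"
      } else {
        s"`$column` >= $start AND `$column` < $end"
      }
    }
  }

  def main(args: Array[String]): Unit =
    whereClauses("id", 2L, 5L, 3).foreach(println)
}
```

In this model the bounds only decide where the partitions are split. They
never filter rows, so every row below the first split point lands in the
first partition and every row at or above the last split point lands in the
last one.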
> Bad partitioning in spark jdbc method with parameter lowerBound and upperBound
> ------------------------------------------------------------------------------
>
> Key: SPARK-32632
> URL: https://issues.apache.org/jira/browse/SPARK-32632
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Liu Dinghua
> Priority: Major
>
> When I use the jdbc method
> {code:java}
> def jdbc( url: String, table: String, columnName: String, lowerBound: Long,
> upperBound: Long, numPartitions: Int, connectionProperties: Properties)
> {code}
>
> I am confused by the partitions generated by this method, because the rows
> of the first partition are not limited by the lowerBound and the rows of the
> last partition are not limited by the upperBound.
>
> For example, I use the method as follows:
>
> {code:java}
> val data = spark.read.jdbc(url, table, "id", 2, 5, 3,buildProperties())
> .selectExpr("id","appkey","funnel_name")
> data.show(100, false)
> {code}
>
> The resulting partition info is:
> 20/08/05 16:58:59 INFO JDBCRelation: Number of partitions: 3, WHERE clauses
> of these partitions: `id` < 3 or `id` is null, `id` >= 3 AND `id` < 4, `id`
> >= 4
> The returned data is:
> ||id||appkey||funnel_name||
> |0|yanshi|test001|
> |1|yanshi|test002|
> |2|yanshi|test003|
> |3|xingkong|test_funnel|
> |4|xingkong|test_funnel2|
> |5|xingkong|test_funnel3|
> |6|donews|test_funnel4|
> |7|donews|test_funnel|
> |8|donews|test_funnel2|
> |9|dami|test_funnel3|
> |13|dami|test_funnel4|
> |15|xiaoai|test_funnel6|
>
> I would expect the clause of the first partition to be "`id` >= 2 and
> `id` < 3" because the lowerBound is 2, and the clause of the last partition
> to be "`id` >= 4 and `id` < 5" because the upperBound is 5, but that is not
> what happens.
>
>
>