Liu Dinghua created SPARK-32632:
-----------------------------------
Summary: Bad partitioning in spark jdbc method with parameter
lowerBound and upperBound
Key: SPARK-32632
URL: https://issues.apache.org/jira/browse/SPARK-32632
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Reporter: Liu Dinghua
When I use the jdbc method
{code:java}
def jdbc( url: String, table: String, columnName: String, lowerBound: Long,
upperBound: Long, numPartitions: Int, connectionProperties: Properties)
{code}
I am confused by the partitions generated by this method: the rows of the
first partition are not limited by lowerBound, and the rows of the last
partition are not limited by upperBound.
For example, I use the method as follows:
{code:java}
val data = spark.read.jdbc(url, table, "id", 2, 5, 3, buildProperties())
  .selectExpr("id", "appkey", "funnel_name")
data.show(100, false)
{code}
The resulting partition info is:
20/08/05 16:58:59 INFO JDBCRelation: Number of partitions: 3, WHERE clauses of
these partitions: `id` < 3 or `id` is null, `id` >= 3 AND `id` < 4, `id` >= 4
The returned data is:
||id||appkey||funnel_name||
|0|yanshi|test001|
|1|yanshi|test002|
|2|yanshi|test003|
|3|xingkong|test_funnel|
|4|xingkong|test_funnel2|
|5|xingkong|test_funnel3|
|6|donews|test_funnel4|
|7|donews|test_funnel|
|8|donews|test_funnel2|
|9|dami|test_funnel3|
|13|dami|test_funnel4|
|15|xiaoai|test_funnel6|
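The logged clauses and the unfiltered rows above are consistent with a
stride-based split in which the two bounds only determine the stride and the
outermost partitions are left open-ended. The sketch below is a simplified
reconstruction under that assumption, not Spark's actual source; the object
and method names are hypothetical. With lowerBound = 2, upperBound = 5 and
numPartitions = 3 the stride is 5/3 - 2/3 = 1 - 0 = 1 (integer division), and
the sketch yields the same three clauses as the log.

```scala
// Simplified sketch (hypothetical names, not Spark's source): derive the
// per-partition WHERE clauses from lowerBound, upperBound and numPartitions.
object PartitionClauseSketch {
  def whereClauses(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Seq[String] = {
    // Assumption: each bound is divided separately using integer division.
    val stride = upperBound / numPartitions - lowerBound / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lowerBound + i * stride
      val hi = lo + stride
      // The first partition gets no lower bound, the last one no upper bound.
      val lBound = if (i != 0) Some(s"$column >= $lo") else None
      val uBound = if (i != numPartitions - 1) Some(s"$column < $hi") else None
      (lBound, uBound) match {
        case (Some(l), Some(u)) => s"$l AND $u"
        case (None, Some(u))    => s"$u or $column is null" // also picks up NULL keys
        case (Some(l), None)    => l
        case (None, None)       => "" // single partition: no WHERE clause
      }
    }
  }
}
```

Under this scheme every row of the table ends up in some partition, which
matches the returned data containing ids below 2 and above 5.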
I would expect the clause of the first partition to be "`id` >= 2 AND `id` < 3",
since the lowerBound is 2, and the clause of the last partition to be limited
by the upperBound rather than being just "`id` >= 4", but that is not what
happens.
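To make the expected behaviour concrete, the following sketch builds one
bounded clause per partition. It is only an illustration: the object and
method names are hypothetical, and treating upperBound as inclusive in the
last clause is an assumption, since the report does not say whether the upper
limit should be open or closed.

```scala
// Hypothetical helper: every clause is bounded by lowerBound / upperBound.
object BoundedPredicatesSketch {
  def boundedPredicates(
      column: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): Array[String] = {
    require(numPartitions >= 1, "numPartitions must be positive")
    require(upperBound - lowerBound >= numPartitions,
      "range must be at least as wide as numPartitions")
    val stride = (upperBound - lowerBound) / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lowerBound + i * stride
      if (i == numPartitions - 1) {
        // Assumption: the last partition includes upperBound itself.
        s"$column >= $lo AND $column <= $upperBound"
      } else {
        s"$column >= $lo AND $column < ${lo + stride}"
      }
    }.toArray
  }
}
```

An array like this could be passed to the predicates overload,
spark.read.jdbc(url, table, predicates, connectionProperties), in which case
rows outside the range, and rows whose key is NULL, would not be read at all.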