Xiao Li created SPARK-12975:
-------------------------------
Summary: Eliminate Bucketing Columns that are part of Partitioning Columns
Key: SPARK-12975
URL: https://issues.apache.org/jira/browse/SPARK-12975
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li
When users call partitionBy and bucketBy on the same write, some bucketing
columns might also be partitioning columns. For example:
{code}
df.write
  .format(source)
  .partitionBy("i")
  .bucketBy(8, "i", "k")
  .sortBy("k")
  .saveAsTable("bucketed_table")
{code}
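In this case, however, including column `i` among the bucketing columns is
redundant. Every row within a partition has the same value of `i`, so it adds
nothing to the bucketing key and only wastes CPU when reading or writing
bucketed tables. Thus, we can automatically remove such overlapping columns
from the bucketing columns.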
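The proposed pruning can be sketched as a plain Scala function. This is a minimal, hypothetical helper (`BucketColumnPruning.pruneBucketColumns` is not part of Spark's API), and it assumes column names are compared case-sensitively; Spark may resolve names case-insensitively depending on `spark.sql.caseSensitive`. Applied to the example above, partition columns `Seq("i")` and bucketing columns `Seq("i", "k")` yield `Seq("k")`.

```scala
// Hypothetical helper, not Spark's API: drops any bucketing column that is
// already a partitioning column, preserving the order of the remaining ones.
object BucketColumnPruning {

  /** Returns the bucketing columns minus any partitioning column.
    * Assumption: names are compared case-sensitively. */
  def pruneBucketColumns(
      partitionColumns: Seq[String],
      bucketColumns: Seq[String]): Seq[String] = {
    val partitionSet = partitionColumns.toSet
    // Keep only bucketing columns that do not appear among partition columns.
    bucketColumns.filterNot(partitionSet.contains)
  }
}
```

Note that if every bucketing column is also a partitioning column, this sketch returns an empty sequence. A real implementation would have to decide whether to reject such a write or to drop bucketing entirely.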
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)