[
https://issues.apache.org/jira/browse/SPARK-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-25774:
------------------------------------
Assignee: (was: Apache Spark)
> Eliminate query anomalies with empty partitions - TRUNCATE, SELECT DISTINCT,
> etc.
> ---------------------------------------------------------------------------------
>
> Key: SPARK-25774
> URL: https://issues.apache.org/jira/browse/SPARK-25774
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Environment: Right now, I'm using Cloudera with Spark 2.2.0, but I
> understand it's a widespread thing.
> Reporter: Steven Cardella
> Priority: Major
>
> If you run a Spark SQL TRUNCATE TABLE command on a managed Hive table, it
> deletes the data files in HDFS but leaves the partitions and the partition
> folder structure in place. If you then run SELECT DISTINCT on the partition
> columns, it returns the values of all the now-empty partitions. So SELECT
> DISTINCT can return rows while SELECT * on the same table returns 0 rows.
> In SQL Server and similar systems, SELECT DISTINCT always reflects the
> rows, and Impala works that way as well.
> I'd like SELECT DISTINCT to reflect rows, not partitions; TRUNCATE TABLE to
> have an option to drop partitions; and MSCK REPAIR TABLE to have an option
> to drop empty partitions.
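
The anomaly described above can be illustrated with a small, self-contained
toy model. This is only a sketch, not Spark's or Hive's implementation: the
class ToyPartitionedTable and all of its methods are hypothetical names, and
the model assumes that a SELECT DISTINCT on partition columns is answered
from the registered partition list rather than from the data files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartitionedTable:
    """Hypothetical stand-in for a partitioned managed table (not Spark code)."""

    # Maps a partition value to the rows stored in that partition's files.
    partitions: dict = field(default_factory=dict)

    def insert(self, part, row):
        # Registers the partition if needed and appends the row to it.
        self.partitions.setdefault(part, []).append(row)

    def truncate(self):
        # Mimics the reported behaviour: data is deleted, but the
        # partition entries themselves stay registered.
        for part in self.partitions:
            self.partitions[part] = []

    def select_all(self):
        # Scans the data, so empty partitions contribute nothing.
        return [(p, r) for p, rows in self.partitions.items() for r in rows]

    def distinct_partitions_from_metadata(self):
        # Assumption: the query is answered from partition metadata only.
        return sorted(self.partitions)

    def distinct_partitions_from_rows(self):
        # The behaviour the reporter asks for: distinct values over rows.
        return sorted({p for p, _ in self.select_all()})

    def drop_empty_partitions(self):
        # The requested option: remove partitions that hold no rows.
        self.partitions = {p: r for p, r in self.partitions.items() if r}


table = ToyPartitionedTable()
table.insert("2018-10-01", "a")
table.insert("2018-10-02", "b")
table.truncate()

rows_after = table.select_all()
metadata_distinct = table.distinct_partitions_from_metadata()
row_distinct = table.distinct_partitions_from_rows()

table.drop_empty_partitions()
metadata_after_drop = table.distinct_partitions_from_metadata()
```

In this model, the metadata-based distinct still lists both partition values
after the truncate, while the row-based distinct and the full scan are empty.
Dropping the empty partitions brings the two views back into agreement.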