[ 
https://issues.apache.org/jira/browse/SPARK-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25774:
------------------------------------

    Assignee:     (was: Apache Spark)

> Eliminate query anomalies with empty partitions - TRUNCATE, SELECT DISTINCT, 
> etc.
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-25774
>                 URL: https://issues.apache.org/jira/browse/SPARK-25774
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: Right now, I'm using Cloudera with Spark 2.2.0, but I 
> understand the behavior is widespread.
>            Reporter: Steven Cardella
>            Priority: Major
>
> If you run a Spark SQL TRUNCATE TABLE command on a managed table in Hive, it 
> deletes the data files in HDFS but leaves the partitions and the partition 
> folder structure in place.  If you then run SELECT DISTINCT on the partition 
> columns, it returns all of the now-empty partition values.  So SELECT 
> DISTINCT can return rows while SELECT * on the same table returns 0 rows.  
> In SQL Server and similar databases, SELECT DISTINCT always reflects the 
> rows, and Impala behaves the same way.  
> I'd like SELECT DISTINCT to reflect rows, not partitions; TRUNCATE TABLE to 
> have an option to drop partitions; and MSCK REPAIR TABLE to have an option 
> to drop empty partitions.
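
The anomaly described above can be illustrated with a small toy model. This is only a sketch, not Spark or Hive code: the class ToyPartitionedTable, its methods, and the drop_partitions flag are hypothetical names made up for illustration. It also assumes that the DISTINCT query is answered from the list of registered partitions rather than from the rows; the report states only the observable result, not the mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartitionedTable:
    """Toy stand-in for a partitioned managed table (not a Spark or Hive API).

    `partitions` maps each registered partition value to the rows stored
    in that partition's data files.
    """
    partitions: dict = field(default_factory=dict)

    def insert(self, part_value, row):
        self.partitions.setdefault(part_value, []).append(row)

    def truncate(self, drop_partitions=False):
        """Delete all rows; optionally drop the partition entries as well.

        drop_partitions=False mimics the reported behaviour: the data is gone
        but the partition entries stay. drop_partitions=True models the option
        the reporter asks for (hypothetical flag).
        """
        if drop_partitions:
            self.partitions.clear()
        else:
            for part_value in self.partitions:
                self.partitions[part_value] = []

    def select_all(self):
        """Return (partition value, row) pairs by reading the stored rows."""
        return [(p, row) for p, rows in self.partitions.items() for row in rows]

    def distinct_partitions_from_metadata(self):
        """Answer from the registered partition list alone, ignoring rows."""
        return sorted(self.partitions)

    def distinct_partitions_from_rows(self):
        """Answer from the rows actually present, as the reporter expects."""
        return sorted({p for p, _ in self.select_all()})

    def repair_drop_empty(self):
        """Drop partitions that hold no rows (the requested repair option)."""
        self.partitions = {p: rows for p, rows in self.partitions.items() if rows}


table = ToyPartitionedTable()
table.insert("2018-10-01", "a")
table.insert("2018-10-02", "b")
table.truncate()

print(table.select_all())                         # []
print(table.distinct_partitions_from_metadata())  # ['2018-10-01', '2018-10-02']
print(table.distinct_partitions_from_rows())      # []
```

In this model, either truncate(drop_partitions=True) or a later repair_drop_empty() call brings the metadata-based answer back in line with the row-based one, which corresponds to the two options requested in the report.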


