peng bo created SPARK-27539:
-------------------------------

             Summary: Inaccurate aggregate outputRows estimation with null value column
                 Key: SPARK-27539
                 URL: https://issues.apache.org/jira/browse/SPARK-27539
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: peng bo
This issue is a follow-up of [https://github.com/apache/spark/pull/24286]. As [~smilegator] pointed out, the estimation is also inaccurate when the column contains null values.

{code:java}
spark-sql> select * from test;
2
NULL
1
spark-sql> desc extended test key;
col_name        key
data_type       int
comment         NULL
min             1
max             2
num_nulls       1
distinct_count  2
{code}

The column statistics report distinct_count = 2, which does not count NULL, but grouping by key produces 3 groups (1, 2 and NULL). When the column contains null values, the distinct count used for the aggregate outputRows estimation should therefore be distinct_count + 1.
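To make the expected estimate concrete, here is a minimal Java sketch. It is not Spark's actual estimation code; the `ColumnStat` record, `groupsFor` and `estimateOutputRows` are hypothetical names. It assumes the aggregate output row count is estimated as the product of the per-grouping-column distinct counts, capped at the child's row count, and applies the proposed `+ 1` for columns with at least one null.

```java
import java.math.BigInteger;
import java.util.List;

public class AggregateRowEstimate {

    // Hypothetical, simplified column statistics. distinctCount excludes NULL,
    // matching the `desc extended` output in the issue.
    record ColumnStat(long distinctCount, long nullCount) {}

    // Number of groups a single grouping column can produce. NULL forms its
    // own group in GROUP BY, so add 1 when the column has any null values.
    static BigInteger groupsFor(ColumnStat stat) {
        BigInteger ndv = BigInteger.valueOf(stat.distinctCount());
        return stat.nullCount() > 0 ? ndv.add(BigInteger.ONE) : ndv;
    }

    // Assumed estimation rule: product of per-column group counts,
    // never exceeding the number of input (child) rows.
    static BigInteger estimateOutputRows(List<ColumnStat> groupingCols, long childRowCount) {
        BigInteger product = BigInteger.ONE;
        for (ColumnStat stat : groupingCols) {
            product = product.multiply(groupsFor(stat));
        }
        return product.min(BigInteger.valueOf(childRowCount));
    }

    public static void main(String[] args) {
        // Stats for test.key from the issue: values 2, NULL, 1.
        ColumnStat key = new ColumnStat(2, 1);
        System.out.println(estimateOutputRows(List.of(key), 3)); // prints 3
    }
}
```

With the statistics of `test.key` (distinct_count 2, num_nulls 1) the sketch estimates 3 output rows, matching the actual groups 1, 2 and NULL; without the `+ 1` it would estimate 2.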