[ https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shahid updated SPARK-22639:
---------------------------
    Affects Version/s: 3.1.1

> no rowcount estimation returned if groupby clause involves substring
> --------------------------------------------------------------------
>
>                 Key: SPARK-22639
>                 URL: https://issues.apache.org/jira/browse/SPARK-22639
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.2.0, 3.1.1
>            Reporter: ey-chih chow
>            Priority: Major
>              Labels: bulk-closed
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> CBO can not estimate rowcount if the groupby clause of a query involves the
> expression substring. For example, we can not estimate the row count of the
> following query, extracted from TPC-DS queries and based on the TPC-DS schema:
>
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
> sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales INNER JOIN date_dim ON (date_dim.`d_date_sk` =
> store_sales.`ss_sold_date_sk`) INNER JOIN item ON (store_sales.`ss_item_sk`
> = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1,
> 30), date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`
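
To illustrate the behaviour reported in this issue, here is a minimal sketch of how a GROUP BY row-count estimate could come back empty. It is a simplified model, not Spark's actual code. It assumes the estimator only handles grouping keys that are plain columns with collected distinct-value counts, and gives up on anything else. The names Column, Call and estimate_group_count are hypothetical, and the statistics are made-up numbers, not TPC-DS values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Column:
    """A plain column reference, e.g. item.i_brand (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Call:
    """A non-column grouping expression, e.g. substring(i_item_desc, 1, 30)."""
    func: str
    arg: Column


Expr = Union[Column, Call]


def estimate_group_count(
    grouping: List[Expr],
    child_rows: Optional[int],
    distinct_counts: Dict[str, int],
) -> Optional[int]:
    """Return an estimated number of groups, or None if no estimate is possible.

    Assumed rule: multiply the distinct counts of the grouping columns and cap
    the product at the input row count. Any grouping key that is not a plain
    column with known statistics makes the whole estimate unavailable.
    """
    if child_rows is None:
        return None  # no row count for the input, nothing to cap against
    if not grouping:
        return 1  # a global aggregate yields a single row
    groups = 1
    for expr in grouping:
        if not isinstance(expr, Column) or expr.name not in distinct_counts:
            return None  # e.g. substring(...): no column statistics apply
        groups *= distinct_counts[expr.name]
    return min(groups, child_rows)


# Made-up statistics for illustration only.
stats = {"i_brand": 700, "d_year": 6}
plain = [Column("i_brand"), Column("d_year")]
with_substring = plain + [Call("substring", Column("i_item_desc"))]

print(estimate_group_count(plain, 1_000_000, stats))           # 4200
print(estimate_group_count(with_substring, 1_000_000, stats))  # None
```

Under this assumed rule, a single substring(...) key is enough to lose the estimate for the whole aggregate, which matches the symptom described above. A more forgiving estimator could fall back to the input row count as an upper bound instead of returning nothing.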