[ https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-22639:
---------------------------------
Labels: bulk-closed (was: )
> no rowcount estimation returned if groupby clause involves substring
> --------------------------------------------------------------------
>
> Key: SPARK-22639
> URL: https://issues.apache.org/jira/browse/SPARK-22639
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 2.2.0
> Reporter: ey-chih chow
> Priority: Major
> Labels: bulk-closed
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> The CBO cannot estimate the row count when the GROUP BY clause of a query
> contains a substring expression. For example, no row count is estimated for the
> following query, extracted from TPC-DS queries and based on the TPC-DS schema:
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
>        sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales
> INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
> INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`,
>          substring(item.`i_item_desc`, 1, 30),
>          date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`
>
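The behaviour reported above is what one would expect from a cardinality
estimator that only handles grouping keys that are plain columns with
collected statistics. The Python sketch below is a simplified model under
that assumption, not Spark's actual code: the names Column, Substring and
estimate_group_by_rows, and all statistics in the usage lines, are
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Column:
    """A plain column reference, e.g. item.i_brand."""
    name: str


@dataclass(frozen=True)
class Substring:
    """A derived expression, e.g. substring(item.i_item_desc, 1, 30)."""
    child: Column
    pos: int
    length: int


GroupExpr = Union[Column, Substring]


def estimate_group_by_rows(
    group_by: List[GroupExpr],
    distinct_counts: Dict[str, int],
    child_rows: int,
) -> Optional[int]:
    """Return an estimated output row count, or None if no estimate is possible."""
    estimate = 1
    for expr in group_by:
        # Assumption: only plain columns with a known distinct count are supported.
        # Any other expression (such as Substring) makes the estimator give up.
        if not isinstance(expr, Column) or expr.name not in distinct_counts:
            return None
        estimate *= distinct_counts[expr.name]
    # Grouping can never produce more rows than its input has.
    return min(estimate, child_rows)


# Hypothetical distinct-value counts, not real TPC-DS statistics.
stats = {"i_brand": 700, "d_year": 200, "i_item_desc": 100_000}
plain = [Column("i_brand"), Column("d_year")]
with_substring = plain + [Substring(Column("i_item_desc"), 1, 30)]

print(estimate_group_by_rows(plain, stats, 1_000_000))           # 140000
print(estimate_group_by_rows(with_substring, stats, 1_000_000))  # None
```

In this model, adding a single derived expression to the grouping keys drops
the whole estimate, even though a conservative bound (for example, the
distinct count of the underlying column) would still be available.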
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)