[ https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233967#comment-17233967 ]

ASF subversion and git services commented on IMPALA-7876:
---------------------------------------------------------

Commit fa525dfdf72f6f612821a14e683cd7f16d2c423a in impala's branch 
refs/heads/master from Abhishek Rawat
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fa525df ]

IMPALA-7876: COMPUTE STATS TABLESAMPLE is not updating number of estimated rows

'COMPUTE STATS TABLESAMPLE' uses a child query with the following function
'ROUND(COUNT(*) / <effective_sample_perc>)' to compute the row count.
The 'ROUND()' fn returns the row count as a DECIMAL type, but
'CatalogOpExecutor' (CatalogOpExecutor::SetTableStats) expects the row
count as a BIGINT type. Due to this data type mismatch, the table stats
(Extrap #Rows) don't get set.

Wrapping the ROUND function in an explicit CAST to BIGINT ensures that the
table stats (Extrap #Rows) are set properly.
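To make the type mismatch concrete, here is a minimal sketch of the row-count
expression before and after the fix. It assumes, for illustration only, that
<effective_sample_perc> is substituted as a fraction (0.05 for a 5% sample),
and it reuses the 'wide' table from the report. It isolates the row-count
expression; it is not the full child query that COMPUTE STATS generates.

{code:sql}
-- Assumption: <effective_sample_perc> rendered as the fraction 0.05 (illustrative value).
-- Before the fix: ROUND() over the DECIMAL quotient returns a DECIMAL,
-- not the BIGINT that CatalogOpExecutor::SetTableStats expects.
SELECT ROUND(COUNT(*) / 0.05) FROM wide TABLESAMPLE SYSTEM(5);

-- After the fix: the explicit CAST yields a BIGINT row count.
SELECT CAST(ROUND(COUNT(*) / 0.05) AS BIGINT) FROM wide TABLESAMPLE SYSTEM(5);

-- The result types can be inspected directly with the built-in typeof() function.
SELECT typeof(ROUND(100 / 0.05)), typeof(CAST(ROUND(100 / 0.05) AS BIGINT));
{code}

With the CAST in place, SHOW TABLE STATS should report a populated
Extrap #Rows after COMPUTE STATS ... TABLESAMPLE, rather than the -1 shown in
the original report.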

Fixed both 'custom_cluster/test_stats_extrapolation.py' and
'metadata/test_stats_extrapolation.py' so that they can catch issues
like this, where table stats are not set when using
'COMPUTE STATS TABLESAMPLE'.

Testing:
- Ran core tests.

Change-Id: I88a0a777c2be9cc18b3ff293cf1c06fb499ca052
Reviewed-on: http://gerrit.cloudera.org:8080/16712
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins


> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7876
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7876
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Andre Araujo
>            Assignee: Abhishek Rawat
>            Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
