[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184701#comment-17184701
 ] 

Vincent Tran commented on IMPALA-7876:
--------------------------------------

 
{noformat}
===Without sampling===
[:21000] default> compute stats one_gram_p1;
Query: compute stats one_gram_p1
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+
Fetched 1 row(s) in 1.51s
[:21000] default> show table stats one_gram_p1;
Query: show table stats one_gram_p1
+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
| year | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
| 2000 | -1 | 19013482 | 3 | 289.07MB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 |
| Total | 19013482 | 19013482 | 3 | 289.07MB | 0B | | | | |
+-------+----------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
Fetched 2 row(s) in 0.01s

===With sampling===
[:21000] default> set compute_stats_min_sample_size=1MB;
COMPUTE_STATS_MIN_SAMPLE_SIZE set to 1MB
[:21000] default> compute stats one_gram_p1 tablesample system(10);
Query: compute stats one_gram_p1 tablesample system(10)
+-----------------------------------------+
| summary                                 |
+-----------------------------------------+
| Updated 1 partition(s) and 3 column(s). |
+-----------------------------------------+
Fetched 1 row(s) in 1.72s
[:21000] default> show table stats one_gram_p1;
Query: show table stats one_gram_p1
+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
| year | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
| 2000 | -1 | -1 | 3 | 289.07MB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://:8020/user/hive/warehouse/one_gram_p1/year=2000 |
| Total | 0 | -1 | 3 | 289.07MB | 0B | | | | |
+-------+-------+--------------+--------+----------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------------------------------------+
Fetched 2 row(s) in 0.01s
{noformat}
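
For context on why the minimum sample size is lowered to 1MB above: the warning quoted in the issue description suggests that the requested sample is raised to at least COMPUTE_STATS_MIN_SAMPLE_SIZE and that TABLESAMPLE is ignored once the effective rate reaches 100%. The sketch below is only an illustration of that arithmetic under this assumption. The function name `effective_sampling_rate` is hypothetical and this is not Impala's actual code.

```python
# Hypothetical helper, NOT Impala's implementation: it only mirrors the
# arithmetic implied by the "effective sampling rate is 100%" warning.
MB = 1024 ** 2
GB = 1024 ** 3


def effective_sampling_rate(table_bytes, sample_percent, min_sample_bytes):
    """Return the assumed fraction of the table that gets sampled (0.0 to 1.0)."""
    if table_bytes <= 0:
        return 1.0
    # Bytes requested by TABLESAMPLE SYSTEM(sample_percent).
    requested = table_bytes * sample_percent / 100.0
    # Assumption: the sample is never smaller than the configured minimum.
    target = max(requested, min_sample_bytes)
    # A target at or above the table size means the whole table is scanned.
    return min(1.0, target / table_bytes)


table = 289.07 * MB  # size of one_gram_p1 as reported by SHOW TABLE STATS

# With the 1.00GB minimum shown in the issue description, sampling is moot.
print(effective_sampling_rate(table, 10, 1 * GB))
# With the minimum lowered to 1MB, roughly 10% of the table is sampled.
print(effective_sampling_rate(table, 10, 1 * MB))
```

Under this assumption, a 289.07MB table would always be scanned in full with a 1.00GB minimum, so the 1MB setting is what makes the sampled run above a meaningful reproduction.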
 

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7876
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7876
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Andre Araujo
>            Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
