[ 
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185433#comment-17185433
 ] 

Tim Armstrong commented on IMPALA-7876:
---------------------------------------

I can reproduce this on master with default configs if I do a second insert, 
i.e.:
{noformat}
CREATE TABLE default.one_gram_p (ngram STRING, match_count INT, volume_count INT)
PARTITIONED BY (year STRING)
STORED AS TEXTFILE
TBLPROPERTIES ('impala.enable.stats.extrapolation'='true');
insert into one_gram_p partition(year) values('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010');
insert into one_gram_p partition(year) values('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010'), ('abc',122,2564,'2010'), 
('abc',122,2564,'2010'), ('abc',122,2564,'2010');
set compute_stats_min_sample_size=1B;
compute stats one_gram_p tablesample system(50);
show table stats one_gram_p;

+-------+-------+--------------+--------+------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------+
| year  | #Rows | Extrap #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                    |
+-------+-------+--------------+--------+------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------+
| 2010  | -1    | -1           | 2      | 780B | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://172.19.0.1:20500/test-warehouse/one_gram_p/year=2010 |
| Total | 0     | -1           | 2      | 780B | 0B           |                   |        |                   |                                                             |
+-------+-------+--------------+--------+------+--------------+-------------------+--------+-------------------+-------------------------------------------------------------+

{noformat}

> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7876
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7876
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Andre Araujo
>            Assignee: Tim Armstrong
>            Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}


