[
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184697#comment-17184697
]
Vincent Tran commented on IMPALA-7876:
--------------------------------------
I can reproduce this on approximately 3.2.0, and I think it may be unrelated
to the width of the table.
The specs for my table are below:
{noformat}
default> show create table one_gram_p;
Query: show create table one_gram_p
CREATE TABLE default.one_gram_p (
ngram STRING,
match_count INT,
volume_count INT
)
PARTITIONED BY (
year STRING
)
STORED AS TEXTFILE
LOCATION 'hdfs:////user/hive/warehouse/one_gram_p'
TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'STATS_GENERATED'='TASK',
'impala.enable.stats.extrapolation'='true',
'impala.lastComputeStatsTime'='1598383227', 'numRows'='1430731493',
'totalSize'='22081529047')
{noformat}
I need to check against the master branch next.
> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}
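
The warning in the quoted output ties the effective sampling rate to COMPUTE_STATS_MIN_SAMPLE_SIZE and the table size. The sketch below is a rough way to sanity-check that arithmetic. It assumes the effective rate is the larger of the requested percentage and the ratio of minimum sample size to table size, capped at 100%. That formula and the function name `effective_sample_rate` are illustrative assumptions, not taken from the Impala source.

```python
def effective_sample_rate(sample_pct: float, table_gb: float,
                          min_sample_gb: float = 1.0) -> float:
    """Hypothetical model of the effective sampling rate (assumption, not Impala code).

    sample_pct    -- percentage passed to TABLESAMPLE SYSTEM(...)
    table_gb      -- table size in GB, as reported by SHOW TABLE STATS
    min_sample_gb -- value of COMPUTE_STATS_MIN_SAMPLE_SIZE in GB
    """
    if table_gb <= 0:
        # An empty table cannot be sampled below 100%.
        return 1.0
    requested = sample_pct / 100.0
    # Smallest fraction of the table that still meets the minimum sample size.
    floor = min_sample_gb / table_gb
    return min(1.0, max(requested, floor))


if __name__ == "__main__":
    # Sizes taken from the quoted output: 20.35GB table, 1.00GB minimum sample.
    for pct in (1, 5, 100):
        print(pct, effective_sample_rate(pct, 20.35))
```

Under this assumed model, a 5% sample of a 20.35GB table stays at 5%, and only the SYSTEM(100) shown in the echoed query would yield an effective rate of 100%. The typed command and the echoed query in the quoted output disagree on that percentage, so it is worth confirming which one was actually run.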