[
https://issues.apache.org/jira/browse/IMPALA-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224873#comment-17224873
]
Abhishek Rawat edited comment on IMPALA-7876 at 11/2/20, 6:29 PM:
------------------------------------------------------------------
The core issue is that the child query that computes num_rows (a table stat)
uses the ROUND function, which returns its result as a *DECIMAL*. For example:
{code:sql}
SELECT ROUND(COUNT(*) / 0.8935390115) FROM t1 TABLESAMPLE SYSTEM(10)
REPEATABLE(1598511315168)
{code}
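To make the type issue concrete, here is a toy Java analogue (not Impala code; `SampledRowCount` and `extrapolate` are hypothetical names). Assuming the literal 0.8935390115 is the effective fraction of the table that was sampled, dividing an integer count by it and rounding still leaves a decimal value (a scale-0 `BigDecimal`), much as ROUND over a DECIMAL expression yields a DECIMAL:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Toy illustration, NOT Impala code: the names are hypothetical.
// Dividing an integer count by a decimal sampling fraction and rounding
// keeps the value in a decimal type, analogous to
// ROUND(COUNT(*) / 0.8935390115) producing a DECIMAL in SQL.
final class SampledRowCount {
  // Extrapolates a full-table row count from a sampled count.
  // The result is a BigDecimal with scale 0: rounded, but still decimal.
  static BigDecimal extrapolate(long sampledCount, BigDecimal sampleFraction) {
    return new BigDecimal(sampledCount)
        .divide(sampleFraction, 10, RoundingMode.HALF_UP)
        .setScale(0, RoundingMode.HALF_UP);
  }

  public static void main(String[] args) {
    BigDecimal estimate = extrapolate(1000L, new BigDecimal("0.8935390115"));
    System.out.println(estimate);  // prints 1119
  }
}
```

Getting a plain `long` out of this still takes an explicit conversion step, which is the role a CAST plays on the SQL side.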
When setting the table stats, the CatalogOpExecutor expects num_rows to be of
type *BIGINT*:
[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L243]
[https://github.com/apache/impala/blob/master/be/src/exec/catalog-op-executor.cc#L255]
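The consequence can be sketched with a toy model (hypothetical names, not Impala's actual code). It assumes a reader that only looks at a BIGINT field, so a value that arrives as DECIMAL is silently read as 0. That would be consistent with the #Rows of 0 in the report below, but the exact mechanism in catalog-op-executor.cc is an assumption here:

```java
// Toy model, NOT Impala's CatalogOpExecutor: StatsValue, ofBigint,
// ofDecimal and numRowsOrZero are hypothetical names. It only shows how a
// reader that expects a BIGINT can lose a value that arrives as a DECIMAL.
final class StatsValue {
  enum Type { BIGINT, DECIMAL }

  final Type type;
  final long bigintVal;      // meaningful only when type == BIGINT
  final String decimalVal;   // meaningful only when type == DECIMAL

  private StatsValue(Type type, long bigintVal, String decimalVal) {
    this.type = type;
    this.bigintVal = bigintVal;
    this.decimalVal = decimalVal;
  }

  static StatsValue ofBigint(long v) {
    return new StatsValue(Type.BIGINT, v, null);
  }

  static StatsValue ofDecimal(String v) {
    return new StatsValue(Type.DECIMAL, 0L, v);
  }

  // Assumed behavior: only the BIGINT field is read, so a DECIMAL value
  // falls through to the default of 0.
  static long numRowsOrZero(StatsValue v) {
    return v.type == Type.BIGINT ? v.bigintVal : 0L;
  }

  public static void main(String[] args) {
    System.out.println(numRowsOrZero(ofBigint(1119L)));    // prints 1119
    System.out.println(numRowsOrZero(ofDecimal("1119")));  // prints 0
  }
}
```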
This used to work because ROUND previously returned a BIGINT. That behavior
was later changed, for the better, in this
[commit|https://github.com/apache/impala/commit/8fec1911e52e40aff4cc1de17265bd6803cb13f5]
(IMPALA-6230, IMPALA-6468: Fix the output type of round() and related fns).
There are a couple of ways to fix this issue. I am leaning towards a fix that
adds a *CAST(... AS BIGINT)* to the generated SQL of the child query, since
num_rows should be a BIGINT:
[https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java#L548]
It is probably best to fix this in the child query's SQL rather than adding
implicit casts elsewhere in the code.
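A minimal sketch of the proposed fix, assuming the child query is built as a string. The class `ChildQuerySql`, the method `sampledRowCountSql` and its parameters are hypothetical, not the real ComputeStatsStmt API, and `COUNT(*)` is assumed as the counted expression. The point is only that wrapping ROUND in CAST(... AS BIGINT) makes the result column a BIGINT:

```java
import java.util.Locale;

// Hypothetical sketch, NOT the actual ComputeStatsStmt code: it builds the
// sampled row-count child query with the proposed CAST(... AS BIGINT)
// around ROUND so that the result column is BIGINT rather than DECIMAL.
final class ChildQuerySql {
  static String sampledRowCountSql(String table, double sampleFraction,
      int samplePercent, long repeatableSeed) {
    // Extrapolate COUNT(*) by the sampling fraction, round, then cast.
    // Locale.ROOT keeps '.' as the decimal separator in the SQL literal.
    String numRowsExpr = String.format(Locale.ROOT,
        "CAST(ROUND(COUNT(*) / %.10f) AS BIGINT)", sampleFraction);
    return String.format(Locale.ROOT,
        "SELECT %s FROM %s TABLESAMPLE SYSTEM(%d) REPEATABLE(%d)",
        numRowsExpr, table, samplePercent, repeatableSeed);
  }

  public static void main(String[] args) {
    // Prints:
    // SELECT CAST(ROUND(COUNT(*) / 0.8935390115) AS BIGINT) FROM t1 TABLESAMPLE SYSTEM(10) REPEATABLE(1598511315168)
    System.out.println(
        sampledRowCountSql("t1", 0.8935390115, 10, 1598511315168L));
  }
}
```

Doing the cast in the generated SQL keeps the type fix next to the query that produces num_rows, in line with the preference above for not adding implicit casts elsewhere.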
> COMPUTE STATS TABLESAMPLE is not updating number of estimated rows
> ------------------------------------------------------------------
>
> Key: IMPALA-7876
> URL: https://issues.apache.org/jira/browse/IMPALA-7876
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.0
> Reporter: Andre Araujo
> Assignee: Abhishek Rawat
> Priority: Critical
>
> Running the command below seems to have no impact on the #rows stats.
> {code}
> [host:21000] default> COMPUTE STATS wide TABLESAMPLE SYSTEM(5);
> Query: COMPUTE STATS wide TABLESAMPLE SYSTEM(100)
> +-------------------------------------------+
> | summary |
> +-------------------------------------------+
> | Updated 1 partition(s) and 103 column(s). |
> +-------------------------------------------+
> WARNINGS: Ignoring TABLESAMPLE because the effective sampling rate is 100%.
> The minimum sample size is COMPUTE_STATS_MIN_SAMPLE_SIZE=1.00GB and the table
> size 20.35GB
> Fetched 1 row(s) in 43.67s
> [host:21000] default> show table stats wide;
> Query: show table stats wide
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | #Rows | Extrap #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                            |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> | 0     | -1           | 84     | 20.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://ns1/user/hive/warehouse/wide |
> +-------+--------------+--------+---------+--------------+-------------------+---------+-------------------+-------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}