[
https://issues.apache.org/jira/browse/IMPALA-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Maslov updated IMPALA-14993:
------------------------------------
Description:
On Iceberg V2 tables that contain delete files, queries without {{count(*)}} in
the select list (e.g. {{{}SELECT 1 FROM tbl{}}}) silently return fewer rows
than they should.
h3. Steps to reproduce
{{CREATE TABLE ice1 (id INT, c1 INT)}}
{{STORED AS ICEBERG TBLPROPERTIES ('format-version' = '2');}}
{{INSERT INTO ice1 SELECT 1, 10;}}
{{INSERT INTO ice1 SELECT 2, 20;}}
{{DELETE FROM ice1 WHERE id = 1;}}
{{SELECT 1 FROM ice1; -- expected: 1 row, actual: 0 rows}}
h3. Root cause
{{SelectStmt.optimizePlainCountStarQueryV2()}} decides to enable the
optimization based on a loop that _rejects_ anything that is not {{count(*)}}
or a constant - but never checks that at least one {{count(*)}} is actually
present. For {{SELECT 1 FROM ice1}} the loop accepts the constant and falls
through to call {{{}tableRef.setOptimizeCountStarForIcebergV2(true){}}}.
h3. Proposed fix
Add a guard to the V2 method, mirroring the V1 method: track a
{{hasCountStarFunc}} flag in {{optimizePlainCountStarQueryV2()}} (file
fe/src/main/java/org/apache/impala/analysis/SelectStmt.java) and return early
unless a {{count(*)}} was seen or the statement was already optimized:
{{boolean hasCountStarFunc = false;}}
{{boolean alreadyOptimized = false;}}
{{for (SelectListItem selectItem : getSelectList().getItems()) {}}
{{ Expr expr = selectItem.getExpr();}}
{{ if (expr == null) return;}}
{{ if (expr.isConstant()) continue;}}
{{ if (expr instanceof IcebergV2CountStarAccumulator) {}}
{{ alreadyOptimized = true;}}
{{ continue;}}
{{ }}}
{{ if (!FunctionCallExpr.isCountStarFunctionCallExpr(expr)) return;}}
{{ hasCountStarFunc = true;}}
{{}}}
{{if (!hasCountStarFunc && !alreadyOptimized) return;}}
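To make the difference between the current and proposed checks easier to see, here is a minimal standalone sketch of the decision logic. It is not Impala code: the {{Kind}} enum, the class name and both method names are hypothetical stand-ins for the real {{Expr}} checks, and the "current" variant only models the behavior described under Root cause (how it treats an already rewritten expression is an assumption).

```java
import java.util.List;

class CountStarGuardSketch {
  // Hypothetical stand-in for the expression categories the loop distinguishes.
  enum Kind { CONSTANT, COUNT_STAR, ALREADY_OPTIMIZED, OTHER }

  // Models the current behavior: anything that is not count(*) or a constant
  // is rejected, but nothing verifies that a count(*) was actually present.
  static boolean enablesOptimizationBuggy(List<Kind> selectList) {
    for (Kind kind : selectList) {
      if (kind == Kind.CONSTANT || kind == Kind.ALREADY_OPTIMIZED) continue;
      if (kind != Kind.COUNT_STAR) return false;
    }
    return true;
  }

  // Models the proposed behavior: the optimization is enabled only if at least
  // one count(*) was seen, or the statement was already rewritten.
  static boolean enablesOptimizationFixed(List<Kind> selectList) {
    boolean hasCountStarFunc = false;
    boolean alreadyOptimized = false;
    for (Kind kind : selectList) {
      if (kind == Kind.CONSTANT) continue;
      if (kind == Kind.ALREADY_OPTIMIZED) {
        alreadyOptimized = true;
        continue;
      }
      if (kind != Kind.COUNT_STAR) return false;
      hasCountStarFunc = true;
    }
    return hasCountStarFunc || alreadyOptimized;
  }

  public static void main(String[] args) {
    List<Kind> selectOne = List.of(Kind.CONSTANT);                    // SELECT 1
    List<Kind> countAndOne = List.of(Kind.COUNT_STAR, Kind.CONSTANT); // SELECT count(*), 1
    List<Kind> plainColumn = List.of(Kind.OTHER);                     // SELECT id

    System.out.println("SELECT 1           current=" + enablesOptimizationBuggy(selectOne)
        + " proposed=" + enablesOptimizationFixed(selectOne));
    System.out.println("SELECT count(*), 1 current=" + enablesOptimizationBuggy(countAndOne)
        + " proposed=" + enablesOptimizationFixed(countAndOne));
    System.out.println("SELECT id          current=" + enablesOptimizationBuggy(plainColumn)
        + " proposed=" + enablesOptimizationFixed(plainColumn));
  }
}
```

In this model only the constant-only select list changes outcome between the two variants, which matches the {{SELECT 1 FROM ice1}} reproduction above.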
was:
On Iceberg V2 tables that contain delete files, queries without {{count(*)}} in
the select list (e.g. {{{}SELECT 1 FROM tbl{}}}) silently return fewer rows
than they should.
h3. Steps to reproduce
{{CREATE TABLE ice1 (id INT, c1 INT)}}
{{STORED AS ICEBERG TBLPROPERTIES ('format-version' = '2');}}
{{INSERT INTO ice1 SELECT 1, 10;}}
{{INSERT INTO ice1 SELECT 2, 20;}}
{{DELETE FROM ice1 WHERE id = 1;}}
{{SELECT 1 FROM ice1; -- expected: 1 row, actual: 0 rows}}
h3. Root cause
{{SelectStmt.optimizePlainCountStarQueryV2()}} decides to enable the
optimization based on a loop that _rejects_ anything that is not {{count(*)}}
or a constant - but never checks that at least one {{count(*)}} is actually
present. For {{SELECT 1 FROM ice1}} the loop accepts the constant and falls
through, setting {{{}tableRef.setOptimizeCountStarForIcebergV2(true){}}}.
h3. Proposed fix
Implement the protection in method V2 in a similar way to method V1, by adding
the hasCountStarFunc flag in file
fe/src/main/java/org/apache/impala/analysis/SelectStmt.java -
optimizePlainCountStarQueryV2() :
{{boolean hasCountStarFunc = false;}}
{{boolean alreadyOptimized = false;}}
{{for (SelectListItem selectItem : getSelectList().getItems()) {}}
{{ Expr expr = selectItem.getExpr();}}
{{ if (expr == null) return;}}
{{ if (expr.isConstant()) continue;}}
{{ if (expr instanceof IcebergV2CountStarAccumulator) {}}
{{ alreadyOptimized = true;}}
{{ continue;}}
{{ }}}
{{ if (!FunctionCallExpr.isCountStarFunctionCallExpr(expr)) return;}}
{{ hasCountStarFunc = true;}}
{{}}}
{{if (!hasCountStarFunc && !alreadyOptimized) return;}}
> Iceberg V2 count(*) optimization is incorrectly applied to queries without
> count(*), causing row loss
> -----------------------------------------------------------------------------------------------------
>
> Key: IMPALA-14993
> URL: https://issues.apache.org/jira/browse/IMPALA-14993
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.5.0
> Reporter: Dmitriy Maslov
> Priority: Major
> Labels: iceberg
>