Dmitriy Maslov created IMPALA-14993:
---------------------------------------

             Summary: Iceberg V2 count(*) optimization is incorrectly applied 
to queries without count(*), causing row loss
                 Key: IMPALA-14993
                 URL: https://issues.apache.org/jira/browse/IMPALA-14993
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.5.0
            Reporter: Dmitriy Maslov


On Iceberg V2 tables that contain delete files, queries whose select list 
contains only constants and no {{count(*)}} (e.g. {{SELECT 1 FROM tbl}}) 
silently return fewer rows than they should.
h3. Steps to reproduce

{{CREATE TABLE ice1 (id INT, c1 INT)}}
{{STORED AS ICEBERG TBLPROPERTIES ('format-version' = '2');}}
{{INSERT INTO ice1 SELECT 1, 10;}}
{{INSERT INTO ice1 SELECT 2, 20;}}

{{DELETE FROM ice1 WHERE id = 1;}}
{{SELECT 1 FROM ice1;  -- expected: 1 row, actual: 0 rows}}
h3. Root cause

{{SelectStmt.optimizePlainCountStarQueryV2()}} decides to enable the 
optimization based on a loop that _rejects_ anything that is not {{count(*)}} 
or a constant - but never checks that at least one {{count(*)}} is actually 
present. For {{SELECT 1 FROM ice1}} the loop accepts the constant and falls 
through, setting {{tableRef.setOptimizeCountStarForIcebergV2(true)}}.
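
As a minimal sketch of the flawed decision, the loop can be modeled as below. This is not the Impala source: {{FlawedCountStarCheck}}, {{ItemKind}} and {{enablesOptimization}} are hypothetical names, and returning {{true}} stands in for reaching the {{setOptimizeCountStarForIcebergV2(true)}} call.

```java
import java.util.List;

// Simplified model of the check described above; not Impala source code.
class FlawedCountStarCheck {
  // Hypothetical classification of a select-list expression.
  enum ItemKind { CONSTANT, COUNT_STAR, OTHER }

  // Returns true when the modeled loop falls through and enables the optimization.
  static boolean enablesOptimization(List<ItemKind> selectList) {
    for (ItemKind kind : selectList) {
      if (kind == ItemKind.CONSTANT) continue;        // constants are accepted
      if (kind != ItemKind.COUNT_STAR) return false;  // anything else is rejected
    }
    // Bug: nothing verifies that a count(*) was actually seen.
    return true;
  }

  public static void main(String[] args) {
    // SELECT 1 FROM ice1: constant only, yet the optimization is enabled.
    System.out.println(enablesOptimization(List.of(ItemKind.CONSTANT)));   // true
    // SELECT count(*) FROM ice1: the intended case.
    System.out.println(enablesOptimization(List.of(ItemKind.COUNT_STAR))); // true
    // SELECT id FROM ice1: correctly rejected.
    System.out.println(enablesOptimization(List.of(ItemKind.OTHER)));      // false
  }
}
```

The first call is the problematic one: a select list with no {{count(*)}} at all is treated the same as a plain {{count(*)}} query.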
h3. Proposed fix

Add the same guard to the V2 method that the V1 method already has: introduce 
a {{hasCountStarFunc}} flag in {{optimizePlainCountStarQueryV2()}} in 
fe/src/main/java/org/apache/impala/analysis/SelectStmt.java and return early 
when no {{count(*)}} was found:

{{boolean hasCountStarFunc = false;}}
{{boolean alreadyOptimized = false;}}
{{for (SelectListItem selectItem : getSelectList().getItems()) {}}
{{  Expr expr = selectItem.getExpr();}}
{{  if (expr == null) return;}}
{{  if (expr.isConstant()) continue;}}
{{  if (expr instanceof IcebergV2CountStarAccumulator) {}}
{{    alreadyOptimized = true;}}
{{    continue;}}
{{  }}}
{{  if (!FunctionCallExpr.isCountStarFunctionCallExpr(expr)) return;}}
{{  hasCountStarFunc = true;}}
{{}}}
{{if (!hasCountStarFunc && !alreadyOptimized) return;}}
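
The effect of the proposed guard can be checked with a small stand-alone model. This is only a sketch under assumptions: {{GuardedCountStarCheck}}, {{ItemKind}} and {{shouldOptimize}} are hypothetical names, {{ACCUMULATOR}} stands in for an {{IcebergV2CountStarAccumulator}} expression, the {{null}} expression case is omitted, and a {{true}} result stands in for continuing past the guard.

```java
import java.util.List;

// Simplified model of the proposed guard; not Impala source code.
class GuardedCountStarCheck {
  // Hypothetical classification of a select-list expression.
  enum ItemKind { CONSTANT, COUNT_STAR, ACCUMULATOR, OTHER }

  // Returns true only when the select list qualifies for the optimization.
  static boolean shouldOptimize(List<ItemKind> selectList) {
    boolean hasCountStarFunc = false;
    boolean alreadyOptimized = false;
    for (ItemKind kind : selectList) {
      if (kind == ItemKind.CONSTANT) continue;
      if (kind == ItemKind.ACCUMULATOR) {
        alreadyOptimized = true;  // statement was rewritten on an earlier pass
        continue;
      }
      if (kind != ItemKind.COUNT_STAR) return false;
      hasCountStarFunc = true;
    }
    // New guard: require at least one count(*) or an earlier rewrite.
    return hasCountStarFunc || alreadyOptimized;
  }

  public static void main(String[] args) {
    System.out.println(shouldOptimize(List.of(ItemKind.CONSTANT)));                      // false
    System.out.println(shouldOptimize(List.of(ItemKind.COUNT_STAR)));                    // true
    System.out.println(shouldOptimize(List.of(ItemKind.COUNT_STAR, ItemKind.CONSTANT))); // true
    System.out.println(shouldOptimize(List.of(ItemKind.ACCUMULATOR)));                   // true
    System.out.println(shouldOptimize(List.of(ItemKind.OTHER)));                         // false
  }
}
```

With the guard in place, the constant-only select list from the reproduction steps no longer qualifies, while plain {{count(*)}} queries and already-rewritten statements still do.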


