[ 
https://issues.apache.org/jira/browse/IMPALA-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796816#comment-17796816
 ] 

Riza Suminto commented on IMPALA-12630:
---------------------------------------

My machine has an old dataset, and there is only 1 file in lineitem:
{code:java}
00:SCAN HDFS [tpch_orc_def.lineitem]
   HDFS partitions=1/1 files=1 size=142.85MB
   predicates: l_orderkey = CAST(1609411 AS BIGINT)
   stored statistics:
     table: rows=6.00M size=142.85MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=6.00M
   orc statistics predicates: l_orderkey = CAST(1609411 AS BIGINT)
   mem-estimate=176.00MB mem-reservation=4.00MB thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=4
   in pipelines: 00(GETNEXT) {code}
A recent dataload creates 4 files instead:
{code:java}
00:SCAN HDFS [tpch_orc_def.lineitem]
   HDFS partitions=1/1 files=4 size=142.76MB
   predicates: l_orderkey = CAST(1609411 AS BIGINT)
   stored statistics:
     table: rows=6.00M size=142.76MB
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=1.50M
   orc statistics predicates: l_orderkey = CAST(1609411 AS BIGINT)
   mem-estimate=96.00MB mem-reservation=4.00MB thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=4
   in pipelines: 00(GETNEXT) {code}
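
To make the difference between the two plans easier to spot, here is a minimal sketch that pulls a few key=value fields out of the SCAN HDFS node and reports the ones that changed. The helper names (scan_fields, diff_plans) are hypothetical and not part of the Impala test framework, and the plan fragments are just the relevant lines copied from the two outputs above.

```python
import re

# Relevant lines copied from the two explain outputs above.
OLD_PLAN = """\
   HDFS partitions=1/1 files=1 size=142.85MB
   extrapolated-rows=disabled max-scan-range-rows=6.00M
   mem-estimate=176.00MB mem-reservation=4.00MB thread-reservation=1
"""
NEW_PLAN = """\
   HDFS partitions=1/1 files=4 size=142.76MB
   extrapolated-rows=disabled max-scan-range-rows=1.50M
   mem-estimate=96.00MB mem-reservation=4.00MB thread-reservation=1
"""

# Fields to compare. This selection is an assumption made for illustration.
FIELDS = ("files", "size", "max-scan-range-rows", "mem-estimate")


def scan_fields(plan_text):
    """Return {field: raw string value} for each known field found in the plan."""
    found = {}
    for name in FIELDS:
        # The lookbehind keeps "size" from matching inside "row-size".
        match = re.search(r"(?<![\w-])" + re.escape(name) + r"=(\S+)", plan_text)
        if match:
            found[name] = match.group(1)
    return found


def diff_plans(old_plan, new_plan):
    """Return {field: (old, new)} for every field whose value differs."""
    old_fields = scan_fields(old_plan)
    new_fields = scan_fields(new_plan)
    return {
        name: (old_fields.get(name), new_fields.get(name))
        for name in FIELDS
        if old_fields.get(name) != new_fields.get(name)
    }


if __name__ == "__main__":
    for name, (old, new) in diff_plans(OLD_PLAN, NEW_PLAN).items():
        print(f"{name}: {old} -> {new}")
```

Run against the two fragments, this reports files, size, max-scan-range-rows and mem-estimate as changed. The max-scan-range-rows values also line up with the 6.00M table rows divided by the file count (6.00M for 1 file, 1.50M for 4 files).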

> TestOrcStats.test_orc_stats fails in count-start on lineitem with filter
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-12630
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12630
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: profile_1134.txt, profile_949.txt
>
>
> Saw the test failed several times recently:
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/949
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1134
> {noformat}
> query_test/test_orc_stats.py:41: in test_orc_stats
>     self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:776: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:683: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected results.
> E   EXPECTED VALUE:
> E   13501
> E   
> E   
> E   ACTUAL VALUE:
> E   20000
> E   
> E   OP:
> E   : {noformat}
> The query is
> {code:sql}
> select count(*) from tpch_orc_def.lineitem where l_orderkey = 1609411
> {code}


