[
https://issues.apache.org/jira/browse/IMPALA-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796887#comment-17796887
]
Riza Suminto commented on IMPALA-12630:
---------------------------------------
I compared the data-loading logs from a passing run and a failed run.
In the passing run
([^ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log]),
lineitem is loaded with 12 Map and 1 File Merge.
In the failed run
([^ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log]),
lineitem is loaded with 4 Map and no File Merge.
Using orc-tools, I inspected the file metadata of the lineitem table from my
local dataset. [^meta-lineitem.txt] shows:
{code:java}
Stripe 3:
Column 0: count: 533501 hasNull: false
Column 1: count: 533501 hasNull: false bytesOnDisk: 379090 min: 1075520 max: 1609411 sum: 716213269480
...
Stripe 4:
Column 0: count: 533455 hasNull: false
Column 1: count: 533455 hasNull: false bytesOnDisk: 378202 min: 1609411 max: 2142211 sum: 1000577484397
{code}
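l_orderkey = 1609411, the value used by the test case, sits exactly on the
boundary between Stripe 3 and Stripe 4: it is both the max of Stripe 3 and the
min of Stripe 4. That makes the expected RowsRead sensitive to how the data
load lays out the stripes.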
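To illustrate the boundary problem, here is a minimal Python sketch (not Impala code) of min/max-based stripe selection. The ranges are copied from the [^meta-lineitem.txt] excerpt above; the function name and the dictionary layout are my own and purely illustrative. It assumes a reader can skip a stripe only when the key falls outside that stripe's [min, max] range.

```python
# Column 1 (l_orderkey) min/max per stripe, copied from the
# meta-lineitem.txt excerpt. Only stripes 3 and 4 are listed there.
LINEITEM_STRIPE_RANGES = {
    3: (1075520, 1609411),
    4: (1609411, 2142211),
}


def stripes_possibly_containing(key, stripe_ranges):
    """Return the ids of stripes whose [min, max] range includes key.

    Hypothetical helper: a reader that prunes on min/max statistics
    can skip only the stripes that are NOT returned here.
    """
    return sorted(
        stripe for stripe, (lo, hi) in stripe_ranges.items() if lo <= key <= hi
    )


# The key used by the test case cannot be ruled out of either stripe.
print(stripes_possibly_containing(1609411, LINEITEM_STRIPE_RANGES))
```

A key that falls strictly inside one range, such as 1609412, selects a single stripe in this sketch, which is the property a deterministic test would want.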
One way to make this test more deterministic is to change it to count rows in
the orders table where o_orderkey = 1. tpch_orc_def.orders is loaded as a
single file with 3 stripes in both the 1134 and 1158 runs, and o_orderkey = 1
lies only in the first stripe, as shown by [^meta-orders.txt].
{code:java}
Stripe Statistics:
Stripe 1:
Column 0: count: 591839 hasNull: false
Column 1: count: 591839 hasNull: false bytesOnDisk: 4629 min: 1 max: 2367335 sum: 700541773200
{code}
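For anyone who wants to check a candidate key against a full metadata dump rather than by eye, the following is a rough Python sketch that extracts per-stripe min/max for one column. It assumes the dump looks like the excerpt above (a "Stripe N:" header followed by unwrapped "Column N: ... min: X max: Y ..." lines) and that other sections are introduced by a line ending in "Statistics:". The regular expressions and the function name are my own assumptions, not part of orc-tools.

```python
import re

# Assumed line shapes, inferred from the excerpt only.
STRIPE_RE = re.compile(r"^\s*Stripe (\d+):")
COLUMN_RE = re.compile(r"^\s*Column (\d+):.*?\bmin: (-?\d+) max: (-?\d+)")


def parse_stripe_ranges(dump_text, column=1):
    """Map stripe id -> (min, max) for the given integer column."""
    ranges = {}
    stripe = None
    for line in dump_text.splitlines():
        # A section header such as "File Statistics:" ends the current stripe.
        if line.strip().endswith("Statistics:"):
            stripe = None
            continue
        m = STRIPE_RE.match(line)
        if m:
            stripe = int(m.group(1))
            continue
        m = COLUMN_RE.match(line)
        if m and stripe is not None and int(m.group(1)) == column:
            ranges[stripe] = (int(m.group(2)), int(m.group(3)))
    return ranges


# The orders excerpt, with the mail-wrapped line joined back together.
SAMPLE = """\
Stripe Statistics:
Stripe 1:
Column 0: count: 591839 hasNull: false
Column 1: count: 591839 hasNull: false bytesOnDisk: 4629 min: 1 max: 2367335 sum: 700541773200
"""

print(parse_stripe_ranges(SAMPLE))
```

Lines without min/max (such as Column 0 here) are ignored, so the result only contains stripes for which the chosen column reports a range.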
> TestOrcStats.test_orc_stats fails in count-start on lineitem with filter
> ------------------------------------------------------------------------
>
> Key: IMPALA-12630
> URL: https://issues.apache.org/jira/browse/IMPALA-12630
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: load-tpch-core-hive-generated-orc-def-block.sql,
> meta-lineitem.txt, meta-orders.txt, profile_1134.txt, profile_949.txt,
> ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log,
> ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log
>
>
> Saw the test failed several times recently:
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/949
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1134
> {noformat}
> query_test/test_orc_stats.py:41: in test_orc_stats
> self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:776: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:683: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E AssertionError: Aggregation of SUM over RowsRead did not match expected
> results.
> E EXPECTED VALUE:
> E 13501
> E
> E
> E ACTUAL VALUE:
> E 20000
> E
> E OP:
> E : {noformat}
> The query is
> {code:sql}
> select count(*) from tpch_orc_def.lineitem where l_orderkey = 1609411
> {code}