[
https://issues.apache.org/jira/browse/IMPALA-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qifan Chen resolved IMPALA-10334.
---------------------------------
Resolution: Fixed
> test_stats_extrapolation output doesn't match on erasure coding build
> ---------------------------------------------------------------------
>
> Key: IMPALA-10334
> URL: https://issues.apache.org/jira/browse/IMPALA-10334
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 4.0
> Reporter: Tim Armstrong
> Assignee: Qifan Chen
> Priority: Blocker
> Labels: broken-build, flaky
>
> {noformat}
> Regression
> metadata.test_stats_extrapolation.TestStatsExtrapolation.test_stats_extrapolation[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> text/none] (from pytest)
> Failing for the past 1 build (Since Failed#621 )
> Took 8.8 sec.
> Error Message
> metadata/test_stats_extrapolation.py:44: in test_stats_extrapolation
> self.run_test_case('QueryTest/stats-extrapolation', vector, unique_database)
> common/impala_test_suite.py:693: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:529: in __verify_results_and_errors
> replace_filenames_with_placeholder) common/test_result_verifier.py:456: in
> verify_raw_results VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results E assert Comparing
> QueryTestResults (expected vs actual): E row_regex:.*Max Per-Host
> Resource Reservation: Memory=.* == 'Max Per-Host Resource Reservation:
> Memory=8.00KB Threads=2' E row_regex:.*Per-Host Resource Estimates:
> Memory=.* == 'Per-Host Resource Estimates: Memory=16MB' E 'Codegen
> disabled by planner' == 'Codegen disabled by planner' E
> row_regex:.*Analyzed query: SELECT id FROM
> test_stats_extrapolation_.*.alltypes.* == 'Analyzed query: SELECT id FROM
> test_stats_extrapolation_5c6bdfd.alltypes' E '' == '' E 'F00:PLAN
> FRAGMENT [UNPARTITIONED] hosts=1 instances=1' == 'F00:PLAN FRAGMENT
> [UNPARTITIONED] hosts=1 instances=1' E row_regex:.*Per-Host Resources:
> mem-estimate=.* mem-reservation=.* == '| Per-Host Resources:
> mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=2' E
> 'PLAN-ROOT SINK' == 'PLAN-ROOT SINK' E '| output exprs: id' == '|
> output exprs: id' E row_regex:.*mem-estimate=.* mem-reservation=.* == '|
> mem-estimate=0B mem-reservation=0B thread-reservation=0' E '|' == '|' E
> '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]' == '00:SCAN HDFS
> [test_stats_extrapolation_5c6bdfd.alltypes]' E
> row_regex:.*partitions=12/12 files=12 size=.* == ' HDFS partitions=12/12
> files=12 size=93.81KB' E ' stored statistics:' != ' erasure coded:
> files=12 size=93.81KB' E row_regex:.*table: rows=3.65K size=.* != '
> stored statistics:' E ' partitions: 0/12 rows=unavailable' != '
> table: rows=3.65K size=93.81KB' E ' columns: all' != '
> partitions: 0/12 rows=unavailable' E row_regex:.* extrapolated-rows=3.65K
> .* != ' columns: all' E row_regex:.*mem-estimate=.*
> mem-reservation=.* != ' extrapolated-rows=3.65K max-scan-range-rows=307' E
> ' tuple-ids=0 row-size=4B cardinality=3.65K' != ' mem-estimate=16.00MB
> mem-reservation=8.00KB thread-reservation=1' E ' in pipelines:
> 00(GETNEXT)' != ' tuple-ids=0 row-size=4B cardinality=3.65K' E None !=
> ' in pipelines: 00(GETNEXT)' E Number of rows returned (expected vs
> actual): 21 != 22
> Stacktrace
> metadata/test_stats_extrapolation.py:44: in test_stats_extrapolation
> self.run_test_case('QueryTest/stats-extrapolation', vector,
> unique_database)
> common/impala_test_suite.py:693: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:529: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
> assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E row_regex:.*Max Per-Host Resource Reservation: Memory=.* == 'Max
> Per-Host Resource Reservation: Memory=8.00KB Threads=2'
> E row_regex:.*Per-Host Resource Estimates: Memory=.* == 'Per-Host
> Resource Estimates: Memory=16MB'
> E 'Codegen disabled by planner' == 'Codegen disabled by planner'
> E row_regex:.*Analyzed query: SELECT id FROM
> test_stats_extrapolation_.*.alltypes.* == 'Analyzed query: SELECT id FROM
> test_stats_extrapolation_5c6bdfd.alltypes'
> E '' == ''
> E 'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1' == 'F00:PLAN
> FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
> E row_regex:.*Per-Host Resources: mem-estimate=.* mem-reservation=.* ==
> '| Per-Host Resources: mem-estimate=16.00MB mem-reservation=8.00KB
> thread-reservation=2'
> E 'PLAN-ROOT SINK' == 'PLAN-ROOT SINK'
> E '| output exprs: id' == '| output exprs: id'
> E row_regex:.*mem-estimate=.* mem-reservation=.* == '| mem-estimate=0B
> mem-reservation=0B thread-reservation=0'
> E '|' == '|'
> E '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]' == '00:SCAN
> HDFS [test_stats_extrapolation_5c6bdfd.alltypes]'
> E row_regex:.*partitions=12/12 files=12 size=.* == ' HDFS
> partitions=12/12 files=12 size=93.81KB'
> E ' stored statistics:' != ' erasure coded: files=12 size=93.81KB'
> E row_regex:.*table: rows=3.65K size=.* != ' stored statistics:'
> E ' partitions: 0/12 rows=unavailable' != ' table: rows=3.65K
> size=93.81KB'
> E ' columns: all' != ' partitions: 0/12 rows=unavailable'
> E row_regex:.* extrapolated-rows=3.65K .* != ' columns: all'
> E row_regex:.*mem-estimate=.* mem-reservation=.* != '
> extrapolated-rows=3.65K max-scan-range-rows=307'
> E ' tuple-ids=0 row-size=4B cardinality=3.65K' != '
> mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1'
> E ' in pipelines: 00(GETNEXT)' != ' tuple-ids=0 row-size=4B
> cardinality=3.65K'
> E None != ' in pipelines: 00(GETNEXT)'
> E Number of rows returned (expected vs actual): 21 != 22
> Standard Error
> SET
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2020-10-31 18:50:27,206 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2020-10-31 18:50:27,226 INFO MainThread: Closing active operation
> SET
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_stats_extrapolation_5c6bdfd` CASCADE;
> -- 2020-10-31 18:50:30,980 INFO MainThread: Started query
> 384f0c72b59374cd:cf6e5f9e00000000
> -- 2020-10-31 18:50:30,983 INFO MainThread: Starting new HTTP connection
> (1): 0.0.0.0
> SET
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_stats_extrapolation_5c6bdfd`;
> -- 2020-10-31 18:50:30,996 INFO MainThread: Started query
> a9448b3bd95d84a1:6680ea7800000000
> -- 2020-10-31 18:50:30,998 INFO MainThread: Created database
> "test_stats_extrapolation_5c6bdfd" for test ID
> "metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0,
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False,
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format:
> text/none]"
> SET
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> -- executing against localhost:21000
> use test_stats_extrapolation_5c6bdfd;
> -- 2020-10-31 18:50:31,002 INFO MainThread: Started query
> d847216bd7fae3d5:9af8403900000000
> SET
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET explain_level=2;
> SET batch_size=0;
> SET num_nodes=1;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- 2020-10-31 18:50:31,003 INFO MainThread: Loading query test file:
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
> -- 2020-10-31 18:50:31,005 INFO MainThread: Starting new HTTP connection
> (1): localhost
> -- executing against localhost:21000
> create table alltypes sort by (id) like functional_parquet.alltypes;
> -- 2020-10-31 18:50:31,078 INFO MainThread: Started query
> c74e0815ada71327:cc823c3c00000000
> -- executing against localhost:21000
> alter table alltypes set
> tblproperties("impala.enable.stats.extrapolation"="true");
> -- 2020-10-31 18:50:35,014 INFO MainThread: Started query
> c54959345920c672:8db7865200000000
> -- executing against localhost:21000
> insert into alltypes partition(year, month)
> select * from functional_parquet.alltypes where year = 2009;
> -- 2020-10-31 18:50:35,024 INFO MainThread: Started query
> 6d4c89d58f2988bc:b34380fc00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,437 INFO MainThread: Started query
> 744653d34cf0878b:d54b970900000000
> -- executing against localhost:21000
> SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=1;
> -- 2020-10-31 18:50:35,443 INFO MainThread: Started query
> 5e45825e43253fd0:027e842e00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,450 INFO MainThread: Started query
> 4e4ca07ae38321a0:d82f0f4900000000
> -- executing against localhost:21000
> SET DISABLE_HDFS_NUM_ROWS_ESTIMATE="0";
> -- 2020-10-31 18:50:35,457 INFO MainThread: Started query
> 9e4b8ce22884bdd8:dc6eefaa00000000
> -- executing against localhost:21000
> compute stats alltypes;
> -- 2020-10-31 18:50:35,463 INFO MainThread: Started query
> 794fc0e9c9141aef:bc6b392c00000000
> -- executing against localhost:21000
> show table stats alltypes;
> -- 2020-10-31 18:50:35,971 INFO MainThread: Started query
> a0462aa89ace75d0:253c408b00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,980 INFO MainThread: Started query
> 404884ad85bf8458:8549f40a00000000
> -- 2020-10-31 18:50:35,994 ERROR MainThread: Comparing QueryTestResults
> (expected vs actual):
> row_regex:.*Max Per-Host Resource Reservation: Memory=.* == 'Max Per-Host
> Resource Reservation: Memory=8.00KB Threads=2'
> row_regex:.*Per-Host Resource Estimates: Memory=.* == 'Per-Host Resource
> Estimates: Memory=16MB'
> 'Codegen disabled by planner' == 'Codegen disabled by planner'
> row_regex:.*Analyzed query: SELECT id FROM
> test_stats_extrapolation_.*.alltypes.* == 'Analyzed query: SELECT id FROM
> test_stats_extrapolation_5c6bdfd.alltypes'
> '' == ''
> 'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1' == 'F00:PLAN FRAGMENT
> [UNPARTITIONED] hosts=1 instances=1'
> row_regex:.*Per-Host Resources: mem-estimate=.* mem-reservation=.* == '|
> Per-Host Resources: mem-estimate=16.00MB mem-reservation=8.00KB
> thread-reservation=2'
> 'PLAN-ROOT SINK' == 'PLAN-ROOT SINK'
> '| output exprs: id' == '| output exprs: id'
> row_regex:.*mem-estimate=.* mem-reservation=.* == '| mem-estimate=0B
> mem-reservation=0B thread-reservation=0'
> '|' == '|'
> '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]' == '00:SCAN HDFS
> [test_stats_extrapolation_5c6bdfd.alltypes]'
> row_regex:.*partitions=12/12 files=12 size=.* == ' HDFS partitions=12/12
> files=12 size=93.81KB'
> ' stored statistics:' != ' erasure coded: files=12 size=93.81KB'
> row_regex:.*table: rows=3.65K size=.* != ' stored statistics:'
> ' partitions: 0/12 rows=unavailable' != ' table: rows=3.65K
> size=93.81KB'
> ' columns: all' != ' partitions: 0/12 rows=unavailable'
> row_regex:.* extrapolated-rows=3.65K .* != ' columns: all'
> row_regex:.*mem-estimate=.* mem-reservation=.* != ' extrapolated-rows=3.65K
> max-scan-range-rows=307'
> ' tuple-ids=0 row-size=4B cardinality=3.65K' != ' mem-estimate=16.00MB
> mem-reservation=8.00KB thread-reservation=1'
> ' in pipelines: 00(GETNEXT)' != ' tuple-ids=0 row-size=4B
> cardinality=3.65K'
> None != ' in pipelines: 00(GETNEXT)'
> Number of rows returned (expected vs actual): 21 != 22
> {noformat}
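> IMPALA-7097 added the extra line that shows up in the actual output:
> ' erasure coded: files=12 size=93.81KB'
> Every row after it is shifted by one, so the remaining comparisons fail and
> the row count is 22 instead of 21.
> It might be OK to just skip this test on erasure coding builds, since it's
> not directly related to the erasure coding functionality.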
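
The cascade of mismatches in the log can be reproduced with a small sketch. This is a simplified model of a positional row comparison, not the code in common/test_result_verifier.py: the helper names `row_matches` and `compare_rows` are hypothetical, and the assumption that a `row_regex:` row must match the whole actual row is made only for illustration. The rows are copied from the log above.

```python
import re

ROW_REGEX_PREFIX = "row_regex:"


def row_matches(expected, actual):
    """Compare one expected row with one actual row.

    Assumption for this sketch: a row starting with 'row_regex:' is a
    regular expression that must match the whole actual row; any other
    row must be equal to the actual row.
    """
    if expected.startswith(ROW_REGEX_PREFIX):
        pattern = expected[len(ROW_REGEX_PREFIX):]
        return re.fullmatch(pattern, actual) is not None
    return expected == actual


def compare_rows(expected_rows, actual_rows):
    """Pair rows by position and return the indexes of mismatching pairs."""
    mismatches = []
    for i in range(max(len(expected_rows), len(actual_rows))):
        exp = expected_rows[i] if i < len(expected_rows) else None
        act = actual_rows[i] if i < len(actual_rows) else None
        if exp is None or act is None or not row_matches(exp, act):
            mismatches.append(i)
    return mismatches


# Three expected rows and the four actual rows they were compared against.
EXPECTED = [
    "row_regex:.*partitions=12/12 files=12 size=.*",
    " stored statistics:",
    "row_regex:.*table: rows=3.65K size=.*",
]
ACTUAL = [
    " HDFS partitions=12/12 files=12 size=93.81KB",
    " erasure coded: files=12 size=93.81KB",
    " stored statistics:",
    " table: rows=3.65K size=93.81KB",
]

# The single extra row makes every later position mismatch.
shifted = compare_rows(EXPECTED, ACTUAL)

# Dropping the extra row (illustration only) realigns the comparison.
without_ec_row = [row for row in ACTUAL if "erasure coded:" not in row]
realigned = compare_rows(EXPECTED, without_ec_row)

print(shifted, realigned)
```

In this model only the 'erasure coded:' row is unexpected. The other failures reported in the log follow from comparing rows by position, which is consistent with the 21 vs 22 row count.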