[ https://issues.apache.org/jira/browse/IMPALA-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qifan Chen resolved IMPALA-10334.
---------------------------------
    Resolution: Fixed

> test_stats_extrapolation output doesn't match on erasure coding build
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-10334
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10334
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Tim Armstrong
>            Assignee: Qifan Chen
>            Priority: Blocker
>              Labels: broken-build, flaky
>
> {noformat}
> Regression
> metadata.test_stats_extrapolation.TestStatsExtrapolation.test_stats_extrapolation[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] (from pytest)
> Failing for the past 1 build (Since Failed#621 )
> Took 8.8 sec.
> Stacktrace
> metadata/test_stats_extrapolation.py:44: in test_stats_extrapolation
>     self.run_test_case('QueryTest/stats-extrapolation', vector, 
> unique_database)
> common/impala_test_suite.py:693: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:529: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E     row_regex:.*Max Per-Host Resource Reservation: Memory=.* == 'Max 
> Per-Host Resource Reservation: Memory=8.00KB Threads=2'
> E     row_regex:.*Per-Host Resource Estimates: Memory=.* == 'Per-Host 
> Resource Estimates: Memory=16MB'
> E     'Codegen disabled by planner' == 'Codegen disabled by planner'
> E     row_regex:.*Analyzed query: SELECT id FROM 
> test_stats_extrapolation_.*.alltypes.* == 'Analyzed query: SELECT id FROM 
> test_stats_extrapolation_5c6bdfd.alltypes'
> E     '' == ''
> E     'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1' == 'F00:PLAN 
> FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
> E     row_regex:.*Per-Host Resources: mem-estimate=.* mem-reservation=.* == 
> '|  Per-Host Resources: mem-estimate=16.00MB mem-reservation=8.00KB 
> thread-reservation=2'
> E     'PLAN-ROOT SINK' == 'PLAN-ROOT SINK'
> E     '|  output exprs: id' == '|  output exprs: id'
> E     row_regex:.*mem-estimate=.* mem-reservation=.* == '|  mem-estimate=0B 
> mem-reservation=0B thread-reservation=0'
> E     '|' == '|'
> E     '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]' == '00:SCAN 
> HDFS [test_stats_extrapolation_5c6bdfd.alltypes]'
> E     row_regex:.*partitions=12/12 files=12 size=.* == '   HDFS 
> partitions=12/12 files=12 size=93.81KB'
> E     '   stored statistics:' != '   erasure coded: files=12 size=93.81KB'
> E     row_regex:.*table: rows=3.65K size=.* != '   stored statistics:'
> E     '     partitions: 0/12 rows=unavailable' != '     table: rows=3.65K 
> size=93.81KB'
> E     '     columns: all' != '     partitions: 0/12 rows=unavailable'
> E     row_regex:.* extrapolated-rows=3.65K .* != '     columns: all'
> E     row_regex:.*mem-estimate=.* mem-reservation=.* != '   
> extrapolated-rows=3.65K max-scan-range-rows=307'
> E     '   tuple-ids=0 row-size=4B cardinality=3.65K' != '   
> mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1'
> E     '   in pipelines: 00(GETNEXT)' != '   tuple-ids=0 row-size=4B 
> cardinality=3.65K'
> E     None != '   in pipelines: 00(GETNEXT)'
> E     Number of rows returned (expected vs actual): 21 != 22
> Standard Error
> SET 
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2020-10-31 18:50:27,206 INFO     MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2020-10-31 18:50:27,226 INFO     MainThread: Closing active operation
> SET 
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET sync_ddl=False;
> -- executing against localhost:21000
> DROP DATABASE IF EXISTS `test_stats_extrapolation_5c6bdfd` CASCADE;
> -- 2020-10-31 18:50:30,980 INFO     MainThread: Started query 
> 384f0c72b59374cd:cf6e5f9e00000000
> -- 2020-10-31 18:50:30,983 INFO     MainThread: Starting new HTTP connection 
> (1): 0.0.0.0
> SET 
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET sync_ddl=False;
> -- executing against localhost:21000
> CREATE DATABASE `test_stats_extrapolation_5c6bdfd`;
> -- 2020-10-31 18:50:30,996 INFO     MainThread: Started query 
> a9448b3bd95d84a1:6680ea7800000000
> -- 2020-10-31 18:50:30,998 INFO     MainThread: Created database 
> "test_stats_extrapolation_5c6bdfd" for test ID 
> "metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none]"
> SET 
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> -- executing against localhost:21000
> use test_stats_extrapolation_5c6bdfd;
> -- 2020-10-31 18:50:31,002 INFO     MainThread: Started query 
> d847216bd7fae3d5:9af8403900000000
> SET 
> client_identifier=metadata/test_stats_extrapolation.py::TestStatsExtrapolation::()::test_stats_extrapolation[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':5000;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_t;
> SET explain_level=2;
> SET batch_size=0;
> SET num_nodes=1;
> SET disable_codegen_rows_threshold=5000;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- 2020-10-31 18:50:31,003 INFO     MainThread: Loading query test file: 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
> -- 2020-10-31 18:50:31,005 INFO     MainThread: Starting new HTTP connection 
> (1): localhost
> -- executing against localhost:21000
> create table alltypes sort by (id) like functional_parquet.alltypes;
> -- 2020-10-31 18:50:31,078 INFO     MainThread: Started query 
> c74e0815ada71327:cc823c3c00000000
> -- executing against localhost:21000
> alter table alltypes set 
> tblproperties("impala.enable.stats.extrapolation"="true");
> -- 2020-10-31 18:50:35,014 INFO     MainThread: Started query 
> c54959345920c672:8db7865200000000
> -- executing against localhost:21000
> insert into alltypes partition(year, month)
> select * from functional_parquet.alltypes where year = 2009;
> -- 2020-10-31 18:50:35,024 INFO     MainThread: Started query 
> 6d4c89d58f2988bc:b34380fc00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,437 INFO     MainThread: Started query 
> 744653d34cf0878b:d54b970900000000
> -- executing against localhost:21000
> SET DISABLE_HDFS_NUM_ROWS_ESTIMATE=1;
> -- 2020-10-31 18:50:35,443 INFO     MainThread: Started query 
> 5e45825e43253fd0:027e842e00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,450 INFO     MainThread: Started query 
> 4e4ca07ae38321a0:d82f0f4900000000
> -- executing against localhost:21000
> SET DISABLE_HDFS_NUM_ROWS_ESTIMATE="0";
> -- 2020-10-31 18:50:35,457 INFO     MainThread: Started query 
> 9e4b8ce22884bdd8:dc6eefaa00000000
> -- executing against localhost:21000
> compute stats alltypes;
> -- 2020-10-31 18:50:35,463 INFO     MainThread: Started query 
> 794fc0e9c9141aef:bc6b392c00000000
> -- executing against localhost:21000
> show table stats alltypes;
> -- 2020-10-31 18:50:35,971 INFO     MainThread: Started query 
> a0462aa89ace75d0:253c408b00000000
> -- executing against localhost:21000
> explain select id from alltypes;
> -- 2020-10-31 18:50:35,980 INFO     MainThread: Started query 
> 404884ad85bf8458:8549f40a00000000
> -- 2020-10-31 18:50:35,994 ERROR    MainThread: Comparing QueryTestResults 
> (expected vs actual):
> row_regex:.*Max Per-Host Resource Reservation: Memory=.* == 'Max Per-Host 
> Resource Reservation: Memory=8.00KB Threads=2'
> row_regex:.*Per-Host Resource Estimates: Memory=.* == 'Per-Host Resource 
> Estimates: Memory=16MB'
> 'Codegen disabled by planner' == 'Codegen disabled by planner'
> row_regex:.*Analyzed query: SELECT id FROM 
> test_stats_extrapolation_.*.alltypes.* == 'Analyzed query: SELECT id FROM 
> test_stats_extrapolation_5c6bdfd.alltypes'
> '' == ''
> 'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1' == 'F00:PLAN FRAGMENT 
> [UNPARTITIONED] hosts=1 instances=1'
> row_regex:.*Per-Host Resources: mem-estimate=.* mem-reservation=.* == '|  
> Per-Host Resources: mem-estimate=16.00MB mem-reservation=8.00KB 
> thread-reservation=2'
> 'PLAN-ROOT SINK' == 'PLAN-ROOT SINK'
> '|  output exprs: id' == '|  output exprs: id'
> row_regex:.*mem-estimate=.* mem-reservation=.* == '|  mem-estimate=0B 
> mem-reservation=0B thread-reservation=0'
> '|' == '|'
> '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]' == '00:SCAN HDFS 
> [test_stats_extrapolation_5c6bdfd.alltypes]'
> row_regex:.*partitions=12/12 files=12 size=.* == '   HDFS partitions=12/12 
> files=12 size=93.81KB'
> '   stored statistics:' != '   erasure coded: files=12 size=93.81KB'
> row_regex:.*table: rows=3.65K size=.* != '   stored statistics:'
> '     partitions: 0/12 rows=unavailable' != '     table: rows=3.65K 
> size=93.81KB'
> '     columns: all' != '     partitions: 0/12 rows=unavailable'
> row_regex:.* extrapolated-rows=3.65K .* != '     columns: all'
> row_regex:.*mem-estimate=.* mem-reservation=.* != '   extrapolated-rows=3.65K 
> max-scan-range-rows=307'
> '   tuple-ids=0 row-size=4B cardinality=3.65K' != '   mem-estimate=16.00MB 
> mem-reservation=8.00KB thread-reservation=1'
> '   in pipelines: 00(GETNEXT)' != '   tuple-ids=0 row-size=4B 
> cardinality=3.65K'
> None != '   in pipelines: 00(GETNEXT)'
> Number of rows returned (expected vs actual): 21 != 22
> {noformat}
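> IMPALA-7097 added the extra line '   erasure coded: files=12 size=93.81KB' 
> to the scan node's explain output. The actual plan therefore has 22 rows 
> instead of the expected 21, and every row after the extra line is compared 
> against the wrong expected row.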
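The cascade of `!=` rows in the log can be reproduced with a small row-by-row comparison. The sketch below is a simplified stand-in, not the code in common/test_result_verifier.py: the helper names `row_matches` and `first_mismatch` are hypothetical, and it assumes that a `row_regex:` row is matched as a regular expression while any other row must be equal.

```python
import re

ROW_REGEX_PREFIX = "row_regex:"


def row_matches(expected, actual):
    """Compare one expected row with one actual row (either may be None)."""
    if expected is None or actual is None:
        return False
    if expected.startswith(ROW_REGEX_PREFIX):
        # Assumption: regex rows are matched from the start of the actual row.
        return re.match(expected[len(ROW_REGEX_PREFIX):], actual) is not None
    return expected == actual


def first_mismatch(expected_rows, actual_rows):
    """Return the index of the first row that differs, or None if all match."""
    for i in range(max(len(expected_rows), len(actual_rows))):
        exp = expected_rows[i] if i < len(expected_rows) else None
        act = actual_rows[i] if i < len(actual_rows) else None
        if not row_matches(exp, act):
            return i
    return None


# Rows copied from the failing comparison in the log.
expected = [
    "row_regex:.*partitions=12/12 files=12 size=.*",
    "   stored statistics:",
    "row_regex:.*table: rows=3.65K size=.*",
]
actual = [
    "   HDFS partitions=12/12 files=12 size=93.81KB",
    "   erasure coded: files=12 size=93.81KB",
    "   stored statistics:",
    "     table: rows=3.65K size=93.81KB",
]

# Dropping the extra line realigns the remaining rows.
without_ec_line = [r for r in actual if not r.lstrip().startswith("erasure coded:")]

print(first_mismatch(expected, actual))
print(first_mismatch(expected, without_ec_line))
```

The first call reports the row where the extra line sits; the second shows that the remaining rows line up once that line is left out.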
> It might be OK to just skip this test on erasure coding builds, since it's 
> not directly related to the erasure coding functionality.
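One way the skip suggested above could be expressed is sketched below. The ticket does not say how the issue was actually fixed, so this is only an illustration. It assumes the erasure coding build sets an environment variable named `ERASURE_CODING` to `true`; the helper `is_erasure_coding_build` and the class name are hypothetical. The real suite is pytest-based; the standard library's `unittest.skipIf` is used here only so the sketch runs on its own.

```python
import os
import unittest


def is_erasure_coding_build(environ=None):
    """Return True when the (assumed) ERASURE_CODING variable is 'true'."""
    environ = os.environ if environ is None else environ
    return environ.get("ERASURE_CODING", "").lower() == "true"


class StatsExtrapolationSketch(unittest.TestCase):
    # Skip the plan comparison when the explain output is expected to
    # contain the extra 'erasure coded:' line.
    @unittest.skipIf(is_erasure_coding_build(),
                     "explain output has an extra 'erasure coded' line")
    def test_stats_extrapolation(self):
        # Placeholder body; the real test runs QueryTest/stats-extrapolation.
        self.assertTrue(True)
```

Skipping keeps the expected results file unchanged for regular builds, at the cost of losing stats extrapolation coverage on the erasure coding build.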



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

