[ 
https://issues.apache.org/jira/browse/IMPALA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-13887.
------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

> TestParquet.test_resolution_by_name fails with tuple caching enabled
> --------------------------------------------------------------------
>
>                 Key: IMPALA-13887
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13887
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>
> When running TestParquet.test_resolution_by_name with tuple caching enabled, 
> it fails with a correctness issue:
> {noformat}
>  TestParquet.test_resolution_by_name[protocol: beeswax | table_format: 
> parquet/none | exec_option: {'test_replan': 1, 'batch_size': 0, 'num_nodes': 
> 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': 
> 'HDFS_SCANNER_THREAD_CHECK_SOFT_MEM_LIMIT:FAIL@0.5', 
> 'exec_single_node_rows_threshold': 0}] 
> [gw0] linux2 -- Python 2.7.16 
> /home/joemcdonnell/upstream/Impala/bin/../infra/python/env-gcc10.4.0/bin/python
> query_test/test_scanners.py:1052: in test_resolution_by_name
>     use_db=unique_database)
> common/impala_test_suite.py:904: in run_test_case
>     self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:737: in __verify_results_and_errors
>     replace_filenames_with_placeholder)
> common/test_result_verifier.py:523: in verify_raw_results
>     VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:305: in verify_query_result_is_equal
>     assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E     'NULL' == 'NULL'
> E     'NULL' == 'NULL'
> E     'NULL' == 'NULL'
> E     'NULL' == 'NULL'
> E     'NULL' == 'NULL'
> E     'NULL' != 'aaa'
> E     'NULL' != 'aaa'
> E     'NULL' != 'bbb'
> E     'NULL' != 'bbb'
> E     'NULL' != 'c'
> E     'NULL' != 'c'
> E     'NULL' != 'nonnullable'
> {noformat}
> The test alters a table to rename a column, which changes the meaning of 
> the statement when parquet_fallback_schema_resolution=name is in effect. 
> The cache key does not include the actual column names, so the rename is 
> not reflected in the key. The statements involved are:
> {noformat}
> select tmp.f from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> # Renames 'f' to 'renamed'
> alter table nested_resolution_by_name_test change nested_struct nested_struct
> struct<b: array<int>, a: int, c: struct<d: array<array<struct<renamed: string>>>>>;
> select tmp.renamed from nested_resolution_by_name_test.nested_struct.c.d.item tmp;
> {noformat}
> The cache key should incorporate the column/field names.
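
As a rough illustration of the point above (not Impala's actual frontend code, which is written in Java), the sketch below hashes a simplified scan description into a key. The function `cache_key`, the tuple layout, and the positional path `(0, 2, 0, 0, 0)` are all hypothetical and exist only for this example. It shows that a key built from positions and types alone cannot tell the pre-rename and post-rename queries apart, while a key that also covers field names can.

```python
import hashlib
import json


def cache_key(table, slots, include_names):
    """Hash a simplified scan description into a hex digest.

    slots is a list of (positions, names, col_type) tuples. This layout is
    an assumption made for the example, not Impala's real key format.
    """
    parts = [table]
    for positions, names, col_type in slots:
        entry = {"positions": list(positions), "type": col_type}
        if include_names:
            # Field names matter when Parquet columns are resolved by name.
            entry["names"] = list(names)
        parts.append(entry)
    payload = json.dumps(parts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


TABLE = "nested_resolution_by_name_test"
# Hypothetical positional path to the string field inside nested_struct.c.d.item.
PATH_POSITIONS = (0, 2, 0, 0, 0)

# The same slot before and after the ALTER TABLE: only the last name differs.
before = [(PATH_POSITIONS, ("nested_struct", "c", "d", "item", "f"), "string")]
after = [(PATH_POSITIONS, ("nested_struct", "c", "d", "item", "renamed"), "string")]

# Without names, both queries map to the same key.
same_without_names = (
    cache_key(TABLE, before, include_names=False)
    == cache_key(TABLE, after, include_names=False)
)

# With names, the rename produces a different key.
differ_with_names = (
    cache_key(TABLE, before, include_names=True)
    != cache_key(TABLE, after, include_names=True)
)
```

In this sketch the names are hashed unconditionally when `include_names` is set; whether a real key should include them always or only for by-name resolution is a design choice the issue text does not settle.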


