[ https://issues.apache.org/jira/browse/IMPALA-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-3773.
-----------------------------------
    Resolution: Won't Fix

> Invalid parquet files can fail query when abort_on_error=0
> ----------------------------------------------------------
>
>                 Key: IMPALA-3773
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3773
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>
> Typically, when abort_on_error=0 and Impala encounters an invalid file, it
> skips the file, logs an error, and continues executing the query. The
> Parquet scanner does not do this consistently. I was able to reproduce
> several such cases with randomly corrupted input files:
> https://gerrit.cloudera.org/#/c/3448/. See the snippet below for some
> examples.
> This may be the intended behaviour. If it is, we can close this as Won't Fix.
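The skip-or-abort behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Impala's scanner code. `scan_files`, `CorruptFileError`, `ScanResult` and the toy `toy_scan` callback are hypothetical names, and `abort_on_error` is modelled as a plain boolean.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class CorruptFileError(Exception):
    """Hypothetical error a per-file scanner raises for an invalid file."""


@dataclass
class ScanResult:
    rows: List[object] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def scan_files(files: Dict[str, bytes],
               scan_one: Callable[[str, bytes], List[object]],
               abort_on_error: bool) -> ScanResult:
    """Scan every file; abort or skip on corruption depending on abort_on_error."""
    result = ScanResult()
    for path, data in files.items():
        try:
            result.rows.extend(scan_one(path, data))
        except CorruptFileError as e:
            if abort_on_error:
                # abort_on_error=1: the first bad file fails the whole query.
                raise
            # abort_on_error=0: log the error, skip this file, keep scanning.
            msg = f"{path}: {e}"
            logging.warning(msg)
            result.errors.append(msg)
    return result


def toy_scan(path: str, data: bytes) -> List[object]:
    """Hypothetical toy scanner: accepts only data starting with b"PAR1"."""
    if not data.startswith(b"PAR1"):
        raise CorruptFileError("invalid version number")
    return [path]
```

Consider `files = {"good.parq": b"PAR1...", "bad.parq": b"\xbf\x04"}`. Calling `scan_files(files, toy_scan, abort_on_error=False)` returns one row and one recorded error. With `abort_on_error=True`, the same call re-raises `CorruptFileError`. The report concerns Parquet errors that bypass this kind of per-file recovery and abort the query even when abort_on_error=0.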
> {code}
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_nested_types[exec_option:
>  {'disable_codegen': True, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_a124ee37.db/alltypes/copy7_nullable.parq'
>  has an invalid version number: �
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_nested_types_a124ee37.alltypes".
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_a124ee37.db/alltypes/copy7_nullable.parq'
>  has an invalid version number: �
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_nested_types_a124ee37.alltypes".
> '
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[exec_option:
>  {'disable_codegen': True, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy1_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
>  has an invalid version number: bf04
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_decimal_tbl_dde29f4.alltypes".
> Parquet file 
> hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy9_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq
>  has an invalid file length: 1
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy1_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
>  has an invalid version number: bf04
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_decimal_tbl_dde29f4.alltypes".
> '
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[exec_option:
>  {'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/copy8_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
>  has an invalid version number:
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_decimal_tbl_670ea0a5.alltypes".
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
>  is corrupt: unexpected encoding:  for data page of column 'd1'.
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/copy8_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
>  has an invalid version number:
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_decimal_tbl_670ea0a5.alltypes".
> '
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_alltypes[exec_option:
>  {'disable_codegen': True, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_2e6ebdd4.db/alltypes/year=2009/month=4/4742d0e705c3f84b-f6d8b21d0c737a85_1049219845_data.0.parq'
>  has an invalid version number: �
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_alltypes_2e6ebdd4.alltypes".
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_2e6ebdd4.db/alltypes/year=2010/month=4/4742d0e705c3f84b-f6d8b21d0c737a85_680603080_data.0.parq'
>  has an invalid version number:
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_alltypes_2e6ebdd4.alltypes". (1 of 2 similar)
> '
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_nested_types[exec_option:
>  {'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]   
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_65cfc968.db/alltypes/copy2_nonnullable.parq'
>  has an invalid version number: ","i
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_nested_types_65cfc968.alltypes".
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_65cfc968.db/alltypes/copy2_nonnullable.parq'
>  has an invalid version number: ","i
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_nested_types_65cfc968.alltypes".
> '
> XFAIL 
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_alltypes[exec_option:
>  {'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | 
> table_format: parquet/none]
>   reason: Should not throw error when abort_on_error=0: 
> 'ImpalaBeeswaxException:
>  Query aborted:
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_5c4330f9.db/alltypes/year=2009/month=6/4742d0e705c3f84b-f6d8b21d0c737a85_1874799227_data.0.parq'
>  has an invalid version number: �@
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_alltypes_5c4330f9.alltypes".
> File 
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_5c4330f9.db/alltypes/year=2009/month=6/4742d0e705c3f84b-f6d8b21d0c737a85_1874799227_data.0.parq'
>  has an invalid version number: �@
> This could be due to stale metadata. Try running "refresh 
> test_fuzz_alltypes_5c4330f9.alltypes".
> {code}
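Several of the errors above appear to come from sanity checks on the tail of a Parquet file. The Parquet format specification defines the following layout:

- A file begins and ends with the 4-byte magic `PAR1`.
- The trailing magic is preceded by a 4-byte little-endian length of the footer metadata.

The sketch below shows checks of that kind together with a random byte-corruption helper in the spirit of the fuzzing tests. It assumes that Impala's "invalid version number" message refers to the trailing magic. The names `check_parquet_footer`, `corrupt` and `InvalidParquetFile`, the 12-byte minimum length, and the footer-length check are all hypothetical and are not taken from Impala.

```python
import random
import struct

PARQUET_MAGIC = b"PAR1"
# Assumed minimum: leading magic + 4-byte footer length + trailing magic.
# Impala's real threshold may differ.
MIN_FILE_LEN = len(PARQUET_MAGIC) * 2 + 4


class InvalidParquetFile(Exception):
    """Hypothetical error for files rejected by the tail checks."""


def check_parquet_footer(path: str, data: bytes) -> int:
    """Return the footer (FileMetaData) length after basic tail checks."""
    if len(data) < MIN_FILE_LEN:
        raise InvalidParquetFile(
            f"Parquet file {path} has an invalid file length: {len(data)}")
    magic = data[-4:]
    if magic != PARQUET_MAGIC:
        # Wording mimics the log above; hex-encodes the bad trailing bytes.
        raise InvalidParquetFile(
            f"File '{path}' has an invalid version number: {magic.hex()}")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the 8-byte tail.
    if footer_len > len(data) - MIN_FILE_LEN:
        raise InvalidParquetFile(
            f"File '{path}' has an invalid footer length: {footer_len}")
    return footer_len


def corrupt(data: bytes, n_bytes: int, rng: random.Random) -> bytes:
    """Overwrite n_bytes randomly chosen positions with random byte values."""
    if not data:
        return data
    buf = bytearray(data)
    for _ in range(n_bytes):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)
```

Whether a corrupted copy fails these checks depends on which bytes the fuzzer hits. Damage inside a data page passes the tail checks and surfaces later during decoding, as with the "unexpected encoding" error in the log.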



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
