[ https://issues.apache.org/jira/browse/IMPALA-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-3773.
-----------------------------------
Resolution: Won't Fix
> Invalid parquet files can fail query when abort_on_error=0
> ----------------------------------------------------------
>
> Key: IMPALA-3773
> URL: https://issues.apache.org/jira/browse/IMPALA-3773
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.7.0
> Reporter: Tim Armstrong
> Priority: Minor
>
> Typically, when Impala encounters an invalid file and abort_on_error=0, it
> skips the invalid file, logs an error, and continues executing the query.
> This is not done consistently for Parquet files. I was able to reproduce a
> few failing cases with randomly corrupted input files:
> https://gerrit.cloudera.org/#/c/3448/. See the snippet below for some
> examples.
> This may be the intended behaviour. If it is, we can close this as Won't Fix.
> {code}
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_nested_types[exec_option:
> {'disable_codegen': True, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_a124ee37.db/alltypes/copy7_nullable.parq'
> has an invalid version number: �
> This could be due to stale metadata. Try running "refresh
> test_fuzz_nested_types_a124ee37.alltypes".
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_a124ee37.db/alltypes/copy7_nullable.parq'
> has an invalid version number: �
> This could be due to stale metadata. Try running "refresh
> test_fuzz_nested_types_a124ee37.alltypes".
> '
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[exec_option:
> {'disable_codegen': True, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy1_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
> has an invalid version number: bf04
> This could be due to stale metadata. Try running "refresh
> test_fuzz_decimal_tbl_dde29f4.alltypes".
> Parquet file
> hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy9_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq
> has an invalid file length: 1
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_dde29f4.db/alltypes/d6=1/copy1_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
> has an invalid version number: bf04
> This could be due to stale metadata. Try running "refresh
> test_fuzz_decimal_tbl_dde29f4.alltypes".
> '
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_decimal_tbl[exec_option:
> {'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/copy8_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
> has an invalid version number:
> This could be due to stale metadata. Try running "refresh
> test_fuzz_decimal_tbl_670ea0a5.alltypes".
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
> is corrupt: unexpected encoding: for data page of column 'd1'.
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_decimal_tbl_670ea0a5.db/alltypes/d6=1/copy8_7b41c28162963cd7-15937dcd2ee1baaf_267443683_data.0.parq'
> has an invalid version number:
> This could be due to stale metadata. Try running "refresh
> test_fuzz_decimal_tbl_670ea0a5.alltypes".
> '
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_alltypes[exec_option:
> {'disable_codegen': True, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_2e6ebdd4.db/alltypes/year=2009/month=4/4742d0e705c3f84b-f6d8b21d0c737a85_1049219845_data.0.parq'
> has an invalid version number: �
> This could be due to stale metadata. Try running "refresh
> test_fuzz_alltypes_2e6ebdd4.alltypes".
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_2e6ebdd4.db/alltypes/year=2010/month=4/4742d0e705c3f84b-f6d8b21d0c737a85_680603080_data.0.parq'
> has an invalid version number:
> This could be due to stale metadata. Try running "refresh
> test_fuzz_alltypes_2e6ebdd4.alltypes". (1 of 2 similar)
> '
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_nested_types[exec_option:
> {'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_65cfc968.db/alltypes/copy2_nonnullable.parq'
> has an invalid version number: ","i
> This could be due to stale metadata. Try running "refresh
> test_fuzz_nested_types_65cfc968.alltypes".
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_nested_types_65cfc968.db/alltypes/copy2_nonnullable.parq'
> has an invalid version number: ","i
> This could be due to stale metadata. Try running "refresh
> test_fuzz_nested_types_65cfc968.alltypes".
> '
> XFAIL
> tests/query_test/test_scanners.py::TestScannersFuzzing::()::test_fuzz_alltypes[exec_option:
> {'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} |
> table_format: parquet/none]
> reason: Should not throw error when abort_on_error=0:
> 'ImpalaBeeswaxException:
> Query aborted:
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_5c4330f9.db/alltypes/year=2009/month=6/4742d0e705c3f84b-f6d8b21d0c737a85_1874799227_data.0.parq'
> has an invalid version number: �@
> This could be due to stale metadata. Try running "refresh
> test_fuzz_alltypes_5c4330f9.alltypes".
> File
> 'hdfs://localhost:20500/test-warehouse/test_fuzz_alltypes_5c4330f9.db/alltypes/year=2009/month=6/4742d0e705c3f84b-f6d8b21d0c737a85_1874799227_data.0.parq'
> has an invalid version number: �@
> This could be due to stale metadata. Try running "refresh
> test_fuzz_alltypes_5c4330f9.alltypes".
> {code}
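The two failure modes that dominate the log above, "invalid version number" and "invalid file length", appear to come from checking the byte framing of a Parquet file. The following is a minimal Python sketch of such a check and of the skip-or-abort behaviour described in the issue. It assumes the standard Parquet layout: a 4-byte "PAR1" magic at the start of the file, and at the end the Thrift-encoded footer, a 4-byte little-endian footer length, and a trailing "PAR1". This is not Impala's scanner code, which is written in C++. The names check_parquet_framing, scan_files, InvalidParquetFile and ScanResult are hypothetical. The error texts only imitate the messages in the log, and the header and footer-length messages are made up.

```python
import struct
from dataclasses import dataclass, field

PARQUET_MAGIC = b"PAR1"
# Smallest possible framing: header magic (4) + footer length (4) + footer magic (4).
MIN_FILE_LEN = 12


class InvalidParquetFile(Exception):
    """Raised when a file fails the basic Parquet framing checks (hypothetical)."""


def check_parquet_framing(path: str, data: bytes) -> int:
    """Return the declared footer length, or raise InvalidParquetFile.

    Only the byte framing is checked; the Thrift-encoded FileMetaData is
    not decoded.
    """
    if len(data) < MIN_FILE_LEN:
        raise InvalidParquetFile(
            f"Parquet file {path} has an invalid file length: {len(data)}")
    if data[-4:] != PARQUET_MAGIC:
        # Imitates the "invalid version number" message in the log.
        raise InvalidParquetFile(
            f"File '{path}' has an invalid version number: {data[-4:].hex()}")
    if data[:4] != PARQUET_MAGIC:
        # Made-up message: the log does not show this case.
        raise InvalidParquetFile(f"File '{path}' has an invalid header")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the header magic and the trailing 8 bytes.
    if footer_len > len(data) - MIN_FILE_LEN:
        # Made-up message: the log does not show this case.
        raise InvalidParquetFile(
            f"File '{path}' has an invalid footer length: {footer_len}")
    return footer_len


@dataclass
class ScanResult:
    scanned: list = field(default_factory=list)  # paths that passed the checks
    errors: list = field(default_factory=list)   # one message per skipped file


def scan_files(files: dict, abort_on_error: bool) -> ScanResult:
    """Check each file; skip and record invalid ones unless abort_on_error is set."""
    result = ScanResult()
    for path, data in files.items():
        try:
            check_parquet_framing(path, data)
        except InvalidParquetFile as e:
            if abort_on_error:
                raise  # the whole query fails on the first bad file
            result.errors.append(str(e))  # record the error and move on
            continue
        result.scanned.append(path)
    return result


if __name__ == "__main__":
    good = PARQUET_MAGIC + b"meta" + struct.pack("<I", 4) + PARQUET_MAGIC
    files = {
        "good.parq": good,
        "truncated.parq": b"P",
        "bad_magic.parq": good[:-4] + b"\xbf\x04\x00\x00",
    }
    res = scan_files(files, abort_on_error=False)
    print(res.scanned)  # ['good.parq']
    for err in res.errors:
        print(err)
```

With abort_on_error=False the sketch keeps the valid file in `scanned` and records one message per corrupt file in `errors`. With abort_on_error=True the first corrupt file raises and the whole scan fails. The issue reports that second outcome for Parquet even with abort_on_error=0.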
--
This message was sent by Atlassian JIRA (v7.6.3#76005)