[
https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583064#comment-16583064
]
Tim Armstrong commented on IMPALA-7335:
---------------------------------------
I would probably look at the pieces of state currently used for synchronisation
between threads and figure out what the possible states and state transitions
are currently and what needs to happen during the state transitions (e.g. a
thread blocked on batch_queue_ needs to be woken up on some transitions). Then
we can probably figure out a simpler set of states without the incorrect
transition that is currently possible. It looks like the relevant state is:
* HdfsScanNodeBase::progress_
* HdfsScanNode::done_
* HdfsScanNode::status_
* HdfsScanNode::thread_state_::batch_queue_::shutdown_
We should also look at KuduScanNode since it might have the same bug.
> Assertion Failure - test_corrupt_files
> --------------------------------------
>
> Key: IMPALA-7335
> URL: https://issues.apache.org/jira/browse/IMPALA-7335
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.1.0
> Reporter: nithya
> Assignee: Pooja Nilangekar
> Priority: Blocker
> Labels: broken-build
>
> test_corrupt_files fails
>
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option:
> \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from
> pytest)
>
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files
> self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case assert False, "Expected
> exception: %s" % expected_str E AssertionError: Expected exception: Column
> metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
> self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E AssertionError: Expected exception: Column metadata states there are 11
> values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]