[
https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582889#comment-16582889
]
Pooja Nilangekar commented on IMPALA-7335:
------------------------------------------
Here is the log corresponding to the query failure:
{code:java}
I0816 00:20:01.948418 6716 hdfs-scan-node.cc:336] Non-ok status returned by
ProcessSplit = Error converting column: 0 to TINYINT for Scan node (id=0,
status_ = ok, done_ = 1)
{code}
So the theory discussed earlier was partially correct. The done_ flag was
already set to true when ProcessSplit() returned, because RangeComplete() is
called from ProcessSplit() even when the scanner runs into an error. In the
cases where the test didn't fail, the done_ flag was still false. *However,
status_ is not set to cancelled.* (I still need to understand why.)
There is another potential issue with HdfsScanNode. When the scanners
encounter errors, they still enqueue their RowBatch into batch_queue_ (which
seems alright). However, HdfsScanNode::GetNextInternal() calls GetBatch(),
and if it acquires a non-null RowBatch, it returns Status::OK() without
checking the scan node's status_. A caveat here is that even if the function
did check status_, it could still see an OK status, because the scanner
thread updates the status only after ProcessSplit() returns. (It nevertheless
seems semantically incorrect to return an OK status without inspecting
status_.)
I think delaying the calls to RangeComplete() alone might not fix the
problem, because GetNextInternal() could still return an OK status for a scan
node that has already encountered an error.
[~tarmstrong] [~bikramjeet.vig] What would be a clean approach to handle this
issue?
> Assertion Failure - test_corrupt_files
> --------------------------------------
>
> Key: IMPALA-7335
> URL: https://issues.apache.org/jira/browse/IMPALA-7335
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.1.0
> Reporter: nithya
> Assignee: Pooja Nilangekar
> Priority: Blocker
> Labels: broken-build
>
> test_corrupt_files fails
>
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option:
> \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None,
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from
> pytest)
>
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files
> self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case assert False, "Expected
> exception: %s" % expected_str E AssertionError: Expected exception: Column
> metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
> self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
> assert False, "Expected exception: %s" % expected_str
> E AssertionError: Expected exception: Column metadata states there are 11
> values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)