[ 
https://issues.apache.org/jira/browse/IMPALA-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582889#comment-16582889
 ] 

Pooja Nilangekar commented on IMPALA-7335:
------------------------------------------

Here is the log corresponding to the query failure:

 
{code:java}
I0816 00:20:01.948418  6716 hdfs-scan-node.cc:336] Non-ok status returned by 
ProcessSplit = Error converting column: 0 to TINYINT for Scan node (id=0, 
status_ = ok, done_ = 1)
{code}
So the theory discussed earlier was partially correct. The done flag was 
already set to true when ProcessSplit() returned. (This is because the 
RangeComplete() function is called from ProcessSplit() even when the Scanner 
runs into an error). In other cases where the test didn't fail, the done_ flag 
was set to false. *However, the status_ is not set to canceled.* (I still need 
to understand why).

There is  another potential issue with the HdfsScanNode. In the case the 
scanners encounter errors, they still enqueue the RowBatch into the 
batch_queue_ (which seems alright). However, HdfsScanNode::GetNextInternal() 
calls the GetBatch() function and in case it acquires a non null RowBatch, it 
returns Status::OK() without checking the status_ of the ScanNode. A caveat 
here is that even if the function were to check the status_, it could still see 
an Ok status because the scanner thread updates the status only after 
ProcessSplit() returns. (Yet it seems semantically incorrect to return an OK 
status without inspecting the status_ variable.)

I think delaying the calls to RangeComplete() alone might not fix the problem 
because GetNextInternal() could still return an OK status for a scan node which 
has already encountered an error.

[~tarmstrong] [~bikramjeet.vig] What would be a clean approach to handle this 
issue? 

 

> Assertion Failure - test_corrupt_files
> --------------------------------------
>
>                 Key: IMPALA-7335
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7335
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.1.0
>            Reporter: nithya
>            Assignee: Pooja Nilangekar
>            Priority: Blocker
>              Labels: broken-build
>
> test_corrupt_files fails 
>  
> query_test.test_scanners.TestParquet.test_corrupt_files[exec_option: 
> \\{'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] (from 
> pytest)
>  
> {code:java}
> Error Message
> query_test/test_scanners.py:300: in test_corrupt_files     
> self.run_test_case('QueryTest/parquet-abort-on-error', vector) 
> common/impala_test_suite.py:420: in run_test_case     assert False, "Expected 
> exception: %s" % expected_str E   AssertionError: Expected exception: Column 
> metadata states there are 11 values, but read 10 values from column id.
> STACKTRACE
> query_test/test_scanners.py:300: in test_corrupt_files
>     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
> common/impala_test_suite.py:420: in run_test_case
>     assert False, "Expected exception: %s" % expected_str
> E   AssertionError: Expected exception: Column metadata states there are 11 
> values, but read 10 values from column id.
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=0;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_negative_len;
> -- executing against localhost:21000
> SELECT * from bad_parquet_strings_out_of_bounds;
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id, cnt from bad_column_metadata t, (select count(*) cnt from 
> t.int_array) v;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> -- executing against localhost:21000
> set num_nodes=1;
> -- executing against localhost:21000
> set num_scanner_threads=1;
> -- executing against localhost:21000
> select id from bad_column_metadata;
> -- executing against localhost:21000
> SET NUM_NODES="0";
> -- executing against localhost:21000
> SET NUM_SCANNER_THREADS="0";
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to