[ 
https://issues.apache.org/jira/browse/IMPALA-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420430#comment-17420430
 ] 

Quanlong Huang commented on IMPALA-10894:
-----------------------------------------

Note that {{test_full_acid_original_files}} in tests/query_test/test_acid.py 
guards this behavior. If we remove this if-branch in 
HdfsOrcScanner::PrepareSearchArguments(): 
[https://github.com/apache/impala/blob/20e07f8645b402f08af91e6b078d1fdd2a0d06f6/be/src/exec/hdfs-orc-scanner.cc#L1056-L1063]
the test fails with:
{code:java}
select row__id.*, id from alltypes_promoted_nopart
where id < 10;

-- 2021-09-27 09:15:39,556 INFO     MainThread: Started query 334ebd3b54dd92af:eafa89fd00000000
-- 2021-09-27 09:15:39,704 ERROR    MainThread: Comparing QueryTestResults (expected vs actual):
0,0,536870912,4030,0,0 != 0,0,536870912,0,0,0
0,0,536870912,4031,0,1 != 0,0,536870912,1,0,1
0,0,536870912,4032,0,2 != 0,0,536870912,2,0,2
0,0,536870912,4033,0,3 != 0,0,536870912,3,0,3
0,0,536870912,4034,0,4 != 0,0,536870912,4,0,4
0,0,536870912,4035,0,5 != 0,0,536870912,5,0,5
0,0,536870912,4036,0,6 != 0,0,536870912,6,0,6
0,0,536870912,4037,0,7 != 0,0,536870912,7,0,7
0,0,536870912,4038,0,8 != 0,0,536870912,8,0,8
0,0,536870912,4039,0,9 != 0,0,536870912,9,0,9{code}
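The row id (the fourth column) is wrong: the expected values are 4030-4039, but 
with the predicate pushed down we get 0-9, which appears to be each row's index 
in the returned batch rather than its index in the file.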
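
To illustrate the mechanism, here is a minimal sketch in plain Python. It is a toy model, not Impala or ORC code: the {{scan}} function and the file layout are hypothetical, chosen only so that ids 0-9 sit at file rows 4030-4039 as in the output above. It assumes the row id is derived by counting rows in the batch the reader returns.

```python
# Toy model only: a Python list stands in for an "original file" and its reader.
# Hypothetical layout: 4030 rows with id >= 1000, then ids 0..9 at rows 4030..4039.
file_ids = list(range(1000, 5030)) + list(range(10))


def scan(ids, predicate, push_down):
    """Return (row_id, id) pairs, deriving row_id from the index in the batch."""
    if push_down:
        # The reader drops non-matching rows before we see them, so the
        # position in the batch no longer equals the position in the file.
        batch = [v for v in ids if predicate(v)]
        return [(i, v) for i, v in enumerate(batch)]
    # Without pushdown the batch holds every row, so the batch index is the
    # file index; the predicate is evaluated afterwards.
    return [(i, v) for i, v in enumerate(ids) if predicate(v)]


def less_than_10(v):
    return v < 10


without_pushdown = scan(file_ids, less_than_10, push_down=False)
with_pushdown = scan(file_ids, less_than_10, push_down=True)
print(without_pushdown[0], with_pushdown[0])
```

In this model the scan without pushdown yields row ids starting at 4030, while the scan with pushdown yields row ids starting at 0, which matches the shape of the mismatch in the test output.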

> Pushing down predicates in reading "original files" of ACID tables
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10894
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10894
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> “Original files” don't store the special ACID columns, so we generate the row 
> id from the row's index in the file. The ORC reader doesn't provide an 
> interface for retrieving the index of a row in the file. When predicates are 
> pushed down into the ORC reader, the returned batch skips some rows, so we 
> can't derive a row's actual index in the file from its index in the batch.
> Currently we skip predicate pushdown when reading such files.


