[ 
https://issues.apache.org/jira/browse/IMPALA-10894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420433#comment-17420433
 ] 

Quanlong Huang commented on IMPALA-10894:
-----------------------------------------

The ORC lib actually provides an interface to retrieve the file-level row 
number of the first row in the previously returned batch:
https://github.com/apache/orc/blob/rel/release-1.7.0/c++/include/orc/Reader.hh#L560
{code:cpp}
    /**
     * Get the row number of the first row in the previously read batch.
     * @return the row number of the previous batch.
     */
    virtual uint64_t getRowNumber() const = 0;
{code}
We can call orc::RowReader::next() to read the batch and then use 
orc::RowReader::getRowNumber() to get the first row id of the batch.
The implementation of SearchArgument (predicate pushdown) ensures that rows in a 
batch are consecutive in the file, so row i of the batch has row id 
getRowNumber() + i:
https://github.com/apache/orc/blob/rel/release-1.7.0/c%2B%2B/src/Reader.cc#L1073-L1094
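
A minimal sketch of how the row ids could be derived from these two calls. 
MockRowReader is a hypothetical stand-in for orc::RowReader, not the real class: 
it only imitates next() and getRowNumber(), and its next() reports a row count 
instead of filling an orc::ColumnVectorBatch. CollectFileRowIds and the 
"selected ranges" that model the rows surviving predicate pushdown are also 
assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::RowReader. Only the two calls discussed
// above are modelled.
class MockRowReader {
 public:
  // selected_ranges: half-open [start, end) file row ranges that survive
  // predicate pushdown, sorted and non-overlapping (an assumption of this
  // sketch).
  MockRowReader(std::vector<std::pair<uint64_t, uint64_t>> selected_ranges,
                uint64_t batch_capacity)
      : ranges_(std::move(selected_ranges)), capacity_(batch_capacity) {
    assert(capacity_ > 0);
  }

  // Reads the next batch and stores its size in *num_rows. Returns false at
  // end of file. A batch never crosses a skipped region, so the rows in it
  // are consecutive in the file.
  bool next(uint64_t* num_rows) {
    while (range_idx_ < ranges_.size()) {
      const auto& range = ranges_[range_idx_];
      uint64_t start = std::max(next_row_, range.first);
      if (start < range.second) {
        batch_first_row_ = start;
        *num_rows = std::min(capacity_, range.second - start);
        next_row_ = start + *num_rows;
        return true;
      }
      ++range_idx_;  // Current range is exhausted; move to the next one.
    }
    return false;
  }

  // Row number (in the file) of the first row of the previously read batch.
  uint64_t getRowNumber() const { return batch_first_row_; }

 private:
  std::vector<std::pair<uint64_t, uint64_t>> ranges_;
  uint64_t capacity_;
  size_t range_idx_ = 0;
  uint64_t next_row_ = 0;
  uint64_t batch_first_row_ = 0;
};

// Returns the file-level row id of every row the reader returns, computed as
// getRowNumber() + index-in-batch.
std::vector<uint64_t> CollectFileRowIds(MockRowReader& reader) {
  std::vector<uint64_t> ids;
  uint64_t num_rows = 0;
  while (reader.next(&num_rows)) {
    uint64_t first_row = reader.getRowNumber();
    for (uint64_t i = 0; i < num_rows; ++i) ids.push_back(first_row + i);
  }
  return ids;
}
```

With selected ranges [0, 3) and [10, 15) and a batch capacity of 4, the mock 
returns three batches starting at rows 0, 10 and 14, and CollectFileRowIds 
yields 0, 1, 2, 10, 11, 12, 13, 14. The skipped rows 3 to 9 never shift the 
ids of later rows, which is what reading "original files" needs.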

> Pushing down predicates in reading "original files" of ACID tables
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10894
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10894
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> “Original files” don't store special ACID columns. We generate the row id by 
> using the row index of the file. The orc reader doesn't provide interfaces 
> for retrieving the row index of a row in the file. When predicates are pushed 
> down into the orc reader, the returned batch will skip some rows. So we can't 
> calculate the actual row index in file using its index in the batch.
> Currently we skip pushing down predicates in reading such files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
