[ https://issues.apache.org/jira/browse/IMPALA-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577017#comment-17577017 ]

ASF subversion and git services commented on IMPALA-11208:
----------------------------------------------------------

Commit 4879f49bb59f944d846650b0b06e1abb95c73646 in impala's branch 
refs/heads/branch-4.1.1 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4879f49bb ]

IMPALA-11208: Fix uninitialized counter of CollectionItemsRead in orc-scanner

CollectionItemsRead in the runtime profile counts the total number of
nested collection items read by the scan node. It is only created for
scans that support nested types, e.g. Parquet or ORC.

Each scanner thread maintains its own local counter and merges it into
the HdfsScanNode counter for each row batch. However, the local counter
in orc-scanner is uninitialized, which leads to garbage values. This
patch simply initializes it to 0 and adds test coverage.

Tests:
Add profile verification for this counter to some existing query tests.
Note that there are some implementation differences between the Parquet
and ORC scanners (e.g. in predicate pushdown), so some queries produce
different counter results. I only picked queries whose counters are
consistent across both formats.

Change-Id: Id7783d1460ac9b98e94d3a31028b43f5a9884f99
Reviewed-on: http://gerrit.cloudera.org:8080/18528
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> CollectionItemsRead profile counter might be wrong in ORC scanner
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11208
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11208
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.2.0, Impala 3.3.0, Impala 3.4.0, Impala 3.4.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> I ran some TPCH(30) queries locally using the orc/snap/block format. The
> CollectionItemsRead profile counter looks wrong to me:
> {code:java}
> - CollectionItemsRead: -1679471728382351781 (-1679471728382351781) {code}
> It could also be super large positive values, e.g.
> {code:java}
> - CollectionItemsRead: 1296851974.72B (1296851974721369461) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)