[
https://issues.apache.org/jira/browse/KYLIN-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762640#comment-17762640
]
ASF subversion and git services commented on KYLIN-5693:
--------------------------------------------------------
Commit a60e825af1f65296757f71d06a4c1c718ddf40c9 in kylin's branch
refs/heads/kylin5 from Chenliang Lu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a60e825af1 ]
KYLIN-5693 Avoid parquet footer reads twice in vectorized reader
> Reduce the number of times Spark reads Parquet Footer to improve query
> performance
> ----------------------------------------------------------------------------------
>
> Key: KYLIN-5693
> URL: https://issues.apache.org/jira/browse/KYLIN-5693
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: 5.0-beta
> Reporter: Yaguang Jia
> Assignee: Yaguang Jia
> Priority: Critical
> Fix For: 5.0.0
>
>
> h2. Dev Design
> Currently, the vectorized Parquet reader always reads the Parquet footer
> metadata twice. When the NameNode is under heavy load, reading it twice is
> costly. We can avoid the second read by reading the metadata for all row
> groups up front and then filtering the row groups against the filters that
> need to be pushed down, so the footer metadata does not have to be read again.
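
The idea in the Dev Design can be illustrated with a minimal, self-contained sketch. It is not the Kylin or Spark implementation: `RowGroup`, `FakeParquetFile`, `select_row_groups_twice`, and `select_row_groups_once` are hypothetical names, and the footer is simulated in memory so that the number of footer reads can be counted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for one row group's footer entry (min/max of one column)."""
    index: int
    min_value: int
    max_value: int


class FakeParquetFile:
    """Hypothetical file handle that counts footer reads (each would be a remote call)."""

    def __init__(self, row_groups: List[RowGroup]) -> None:
        self._row_groups = list(row_groups)
        self.footer_reads = 0

    def read_footer(self) -> List[RowGroup]:
        # Every call simulates one round trip to fetch the footer metadata.
        self.footer_reads += 1
        return list(self._row_groups)


def select_row_groups_twice(
    f: FakeParquetFile, may_match: Callable[[RowGroup], bool]
) -> List[RowGroup]:
    # Pattern to avoid: one footer read for file metadata,
    # then a second footer read to obtain the filtered row groups.
    f.read_footer()
    return [rg for rg in f.read_footer() if may_match(rg)]


def select_row_groups_once(
    f: FakeParquetFile, may_match: Callable[[RowGroup], bool]
) -> List[RowGroup]:
    # Proposed pattern: read all row groups in one footer read,
    # then apply the pushed-down filter to the metadata already in hand.
    footer = f.read_footer()
    return [rg for rg in footer if may_match(rg)]
```

Both functions return the same row groups for the same predicate (for example, `lambda rg: rg.max_value > 50` for a filter such as `col > 50`); only the value of `footer_reads` differs, which is the cost the issue aims to remove.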
--
This message was sent by Atlassian Jira
(v8.20.10#820010)