[jira] [Commented] (KYLIN-5693) Reduce the number of times Spark reads Parquet Footer to improve query performance

ASF subversion and git services (Jira) Thu, 07 Sep 2023 01:48:12 -0700


    [ https://issues.apache.org/jira/browse/KYLIN-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762640#comment-17762640 ]


ASF subversion and git services commented on KYLIN-5693:
--------------------------------------------------------

Commit a60e825af1f65296757f71d06a4c1c718ddf40c9 in kylin's branch 
refs/heads/kylin5 from Chenliang Lu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a60e825af1 ]

KYLIN-5693 Avoid parquet footer reads twice in vectorized reader


> Reduce the number of times Spark reads Parquet Footer to improve query 
> performance
> ----------------------------------------------------------------------------------
>
>                 Key: KYLIN-5693
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5693
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-beta
>            Reporter: Yaguang Jia
>            Assignee: Yaguang Jia
>            Priority: Critical
>             Fix For: 5.0.0
>
>
> h2. Dev Design
> Currently, the vectorized Parquet reader always reads the Parquet footer metadata twice.
> When the NameNode is under heavy load, the second read adds noticeable latency.
> We can avoid reading the footer twice by reading all row groups in advance and
> filtering them according to the pushed-down filters, so the footer metadata does
> not need to be read a second time.
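The design above can be illustrated with a small Python model. This is a sketch under stated assumptions, not Kylin's or Spark's actual code. `FileStore`, `Footer`, `RowGroupMeta`, `plan_read_twice` and `plan_read_once` are hypothetical names. Each row group carries min/max statistics for a single filter column, which simplifies real Parquet metadata. The model shows that reading the full footer once and then applying the split range and the pushed-down filter selects the same row groups as the two-read path, while touching the file system only once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical pushed-down predicate: given a row group's (min, max) stats,
# return True if the row group *might* contain matching rows.
StatsPredicate = Callable[[int, int], bool]


@dataclass(frozen=True)
class RowGroupMeta:
    """Hypothetical, simplified row-group metadata from a Parquet footer."""
    start_offset: int
    num_rows: int
    min_value: int  # min of the single filter column (assumption)
    max_value: int  # max of the single filter column (assumption)


@dataclass(frozen=True)
class Footer:
    row_groups: List[RowGroupMeta]


class FileStore:
    """Hypothetical file-system client that counts footer reads.

    Each real footer read costs NameNode/DataNode round trips.
    """

    def __init__(self, footers: Dict[str, Footer]):
        self._footers = footers
        self.footer_reads = 0

    def read_footer(self, path: str) -> Footer:
        self.footer_reads += 1
        return self._footers[path]


def gt(threshold: int) -> StatsPredicate:
    """Stats check for `col > threshold`: skip groups whose max is too small."""
    return lambda lo, hi: hi > threshold


def select_row_groups(footer: Footer, split: Tuple[int, int],
                      predicate: StatsPredicate) -> List[RowGroupMeta]:
    """Keep row groups that start inside the split and may match the filter."""
    start, end = split
    return [rg for rg in footer.row_groups
            if start <= rg.start_offset < end
            and predicate(rg.min_value, rg.max_value)]


def plan_read_twice(store: FileStore, path: str, split: Tuple[int, int],
                    predicate: StatsPredicate) -> List[RowGroupMeta]:
    store.read_footer(path)              # first read: file/schema metadata only
    footer = store.read_footer(path)     # second read: row groups for the split
    return select_row_groups(footer, split, predicate)


def plan_read_once(store: FileStore, path: str, split: Tuple[int, int],
                   predicate: StatsPredicate) -> List[RowGroupMeta]:
    footer = store.read_footer(path)     # single read, all row groups included
    return select_row_groups(footer, split, predicate)


if __name__ == "__main__":
    footers = {"part-0.parquet": Footer([
        RowGroupMeta(4, 1000, 0, 50),
        RowGroupMeta(1_000_000, 1000, 51, 150),
        RowGroupMeta(2_000_000, 1000, 151, 300),
    ])}
    twice_store, once_store = FileStore(footers), FileStore(footers)
    split = (0, 3_000_000)
    twice = plan_read_twice(twice_store, "part-0.parquet", split, gt(100))
    once = plan_read_once(once_store, "part-0.parquet", split, gt(100))
    print(twice == once)                                       # True
    print(twice_store.footer_reads, once_store.footer_reads)   # 2 1
```

Both paths return the same two row groups. The first row group is pruned because its max (50) cannot satisfy `col > 100`. Only the footer read count differs. In this model the saving is one metadata round trip per file split, which is what matters when the NameNode is the bottleneck.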



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

