[
https://issues.apache.org/jira/browse/HUDI-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341212#comment-17341212
]
pengzhiwei edited comment on HUDI-1879 at 5/8/21, 11:12 AM:
------------------------------------------------------------
Hi [~uditme] [~vinoth], I have submitted PR 2925 to solve issue 1 and PR
2925 to solve issue 2.
was (Author: pzw2018):
Hi [~uditme] [~vinoth], I have submitted PR 2925 to solve issue 1.
For issue 2, I will open another PR.
> Spark DataSource tables/HoodieFileIndex issues for Merge On Read
> ----------------------------------------------------------------
>
> Key: HUDI-1879
> URL: https://issues.apache.org/jira/browse/HUDI-1879
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: Udit Mehrotra
> Assignee: pengzhiwei
> Priority: Blocker
> Labels: pull-request-available, sev:critical
> Fix For: 0.9.0
>
>
> The *Read as DataSource Tables* and *HoodieFileIndex* implementations that
> went in with [https://github.com/apache/hudi/pull/2283] and
> [https://github.com/apache/hudi/pull/2651] have introduced a couple of major
> regressions for *Merge on Read* tables:
> * *_ro* *tables returning snapshot results*: Since the Hudi DataSource is now
> used directly to query both the *_ro* and *_rt* MOR tables, and it has no way
> to check the *table name*, it cannot tell a read optimized table from a real
> time table. In both cases *QUERY_TYPE_OPT_KEY* defaults to *snapshot*, so
> *MergeOnReadSnapshotRelation* is used for the query and snapshot results are
> always returned.
> * *Partition pruning does not work for realtime queries*: The
> *MergeOnReadSnapshotRelation* uses *allFiles* directly, so it always fetches
> all the files without doing any partition pruning. This is a regression for
> Spark SQL real time queries, which previously got partition pruning through
> the InputFormat, and it will hurt rt query performance.
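To make the two regressions above concrete, here is a small, hypothetical Python sketch. It is not Hudi code and not the fix in any PR. Only the option key `hoodie.datasource.query.type` and its values `snapshot` / `read_optimized` come from Hudi's DataSource read options. The helpers `resolve_query_type` and `prune_files`, the table name `trips_ro`, and the partition paths and file names are illustrative assumptions. The first helper shows why, without a table name, the query type falls back to snapshot. The second contrasts listing every file with keeping only the requested partitions.

```python
from typing import Dict, List, Optional

# Real Hudi read option key and values; everything else below is hypothetical.
QUERY_TYPE_KEY = "hoodie.datasource.query.type"
SNAPSHOT = "snapshot"
READ_OPTIMIZED = "read_optimized"


def resolve_query_type(options: Dict[str, str], table_name: Optional[str] = None) -> str:
    """Hypothetical: choose a query type for a MOR table.

    An explicit option always wins. Otherwise a known "_ro" table name maps
    to read_optimized. Without the table name (the situation described in
    the issue) the result is the snapshot default.
    """
    if QUERY_TYPE_KEY in options:
        return options[QUERY_TYPE_KEY]
    if table_name is not None and table_name.endswith("_ro"):
        return READ_OPTIMIZED
    return SNAPSHOT


def prune_files(all_files: Dict[str, List[str]], wanted_partitions: List[str]) -> List[str]:
    """Hypothetical: return only files under the requested partitions.

    all_files maps a partition path to the file names in it.
    """
    selected = []
    for partition in wanted_partitions:
        for name in all_files.get(partition, []):
            selected.append(f"{partition}/{name}")
    return selected


# Toy layout: two date partitions (file names are made up).
files = {
    "dt=2021-05-07": ["a.parquet"],
    "dt=2021-05-08": ["b.parquet", "b.log"],
}

# Listing every file, as the issue says happens via allFiles.
all_listed = [f"{p}/{n}" for p, names in files.items() for n in names]

print(resolve_query_type({}, "trips_ro"))  # read_optimized
print(resolve_query_type({}))              # snapshot
print(len(all_listed))                     # 3
print(prune_files(files, ["dt=2021-05-08"]))
```

With a filter on `dt = '2021-05-08'`, the pruned listing touches only two of the three files. The gap grows with the number of partitions, which is why skipping pruning matters for rt query performance.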