[
https://issues.apache.org/jira/browse/HUDI-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-5236:
----------------------------------
Priority: Critical (was: Blocker)
> Introduce materialization into HoodieBackedTableMetadata
> --------------------------------------------------------
>
> Key: HUDI-5236
> URL: https://issues.apache.org/jira/browse/HUDI-5236
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Alexey Kudinkin
> Assignee: sivabalan narayanan
> Priority: Critical
> Fix For: 0.13.0
>
>
> *Problem Statement*
> Currently, metadata table (MT) performance is hard to predict because it
> depends on a variety of factors, for example
> whether the MT is compacted: if the table is NOT compacted, then when loading
> the "files" partition, for example, we load all of the delta-log files and
> materialize them in memory, meaning that all subsequent requests are served
> from memory.
> However, when the table IS compacted, we only prematerialize the updated
> records but not the records sitting in the base file, which requires us
> to fetch from the base HFile every time (even though there is block-level
> caching implemented inside the HFile reader).
> More generally, `HoodieBackedTableMetadata`, being the primary facade and
> interface for the MT, currently doesn't have a well-thought-out architecture
> or API; instead, it serves simply as an aggregation layer for the
> lower-level components (LogRecordScanner, FileReader, etc.).
> This is problematic, since the MT is a core component whose performance has
> direct implications for query planning and beyond. As such, it has to have:
> # {*}Predictable performance{*}: how the state of the MT affects performance
> should be easy to comprehend and reason about (for example, {_}it's expected
> that performance could degrade as scale increases or if the table is
> not compacted for a long time; however, it's totally unexpected that
> performance could become worse after compaction than it was before{_})
> # {*}Clear configuration levers{*}: the behavior and performance of the MT
> should be controlled by crystal-clear configuration levers, for example
> whether records are materialized in memory or loaded dynamically.
>
> *Solution*
> To address the aforementioned problems, we propose implementing
> HoodieBackedTableMetadataV2, providing
> * {*}Materialization{*}: it should allow the MT to be read in either of 2 ways
> ** _Eagerly:_ the whole MT is loaded in memory before it is accessed
> ** _Lazily:_ the MT is queried on an ad-hoc basis, with the results of
> previous queries cached for subsequent use
> *
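
The two materialization modes proposed above could look roughly like the following. This is only a minimal sketch, not the actual HoodieBackedTableMetadataV2 design: the names RecordSource, MetadataView, EagerView, LazyView and CountingSource are hypothetical and do not exist in Hudi, and the in-memory map stands in for the base HFile plus delta-log files.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MaterializationSketch {

    /** Hypothetical stand-in for MT storage (base file plus delta logs). */
    interface RecordSource {
        Map<String, String> scanAll();        // full scan of all records
        Optional<String> lookup(String key);  // point lookup of one record
    }

    /** Hypothetical read-only view over the MT records. */
    interface MetadataView {
        Optional<String> get(String key);
    }

    /** Eager mode: the whole MT is loaded in memory before any access. */
    static final class EagerView implements MetadataView {
        private final Map<String, String> records;

        EagerView(RecordSource source) {
            this.records = new HashMap<>(source.scanAll());
        }

        @Override
        public Optional<String> get(String key) {
            return Optional.ofNullable(records.get(key));
        }
    }

    /** Lazy mode: records are fetched on demand and results (including misses) are cached. */
    static final class LazyView implements MetadataView {
        private final RecordSource source;
        private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();

        LazyView(RecordSource source) {
            this.source = source;
        }

        @Override
        public Optional<String> get(String key) {
            return cache.computeIfAbsent(key, source::lookup);
        }
    }

    /** In-memory source that counts how often the "storage" is touched. */
    static final class CountingSource implements RecordSource {
        private final Map<String, String> data;
        final AtomicInteger scans = new AtomicInteger();
        final AtomicInteger lookups = new AtomicInteger();

        CountingSource(Map<String, String> data) {
            this.data = data;
        }

        @Override
        public Map<String, String> scanAll() {
            scans.incrementAndGet();
            return data;
        }

        @Override
        public Optional<String> lookup(String key) {
            lookups.incrementAndGet();
            return Optional.ofNullable(data.get(key));
        }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(Map.of("2022/11/01", "file-a.parquet"));

        MetadataView lazy = new LazyView(source);
        lazy.get("2022/11/01");
        lazy.get("2022/11/01"); // second read is served from the cache
        System.out.println("storage lookups after two lazy reads: " + source.lookups.get()); // prints 1

        MetadataView eager = new EagerView(source);
        eager.get("2022/11/01");
        System.out.println("full scans for the eager view: " + source.scans.get()); // prints 1
    }
}
```

In this sketch the cost model is visible from the chosen view alone: the eager view pays for one full scan up front and never touches storage again, while the lazy view pays for one lookup per distinct key. Which mode is selected would presumably be driven by one of the configuration levers the issue asks for.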
--
This message was sent by Atlassian Jira
(v8.20.10#820010)