[
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-7585:
----------------------------
Description: The table schema resolver needs to read the schema from the data
files (base or log files) to check whether the _hoodie_operation field is
present for Flink CDC use cases. This can incur the overhead of reading data
file footers multiple times. We should see if we can store a table config to
indicate whether the field is present, or simplify the Flink CDC format in
Hudi 1.0 (thus removing the need for the _hoodie_operation field and the
schema resolver). (was: The table schema resolver needs to read
schema from the data files (base or log files) to see whether _hoodie_operation
field is present for Flink CDC use cases. This can cause overhead of reading
data file footers multiple times. We should see if we can store or simplify
the Flink CDC format in Hudi 1.0 (thus no need of ).)
> Avoid reading log files for resolving schema for _hoodie_operation field
> ------------------------------------------------------------------------
>
> Key: HUDI-7585
> URL: https://issues.apache.org/jira/browse/HUDI-7585
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Jing Zhang
> Priority: Major
> Fix For: 1.0.0
>
>
> The table schema resolver needs to read the schema from the data files (base
> or log files) to check whether the _hoodie_operation field is present for
> Flink CDC use cases. This can incur the overhead of reading data file footers
> multiple times. We should see if we can store a table config to indicate
> whether the field is present, or simplify the Flink CDC format in Hudi 1.0
> (thus removing the need for the _hoodie_operation field and the schema
> resolver).
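
The table-config option described above could look roughly like the sketch below. It is an illustration only, not Hudi code: the class name, the property key hoodie.table.has.operation.field, and the footerScan callback are all hypothetical stand-ins, and plain java.util.Properties is used in place of Hudi's own table config classes. The idea is to consult the config first and fall back to the expensive footer read only when no value has been recorded yet.

```java
import java.util.Properties;
import java.util.function.BooleanSupplier;

// Sketch only: not Hudi code. All names here are hypothetical.
class OperationFieldCheck {
    // Hypothetical table config key; Hudi does not necessarily define it.
    static final String HAS_OPERATION_FIELD_KEY = "hoodie.table.has.operation.field";

    /**
     * Returns whether the table carries the _hoodie_operation field.
     * The table config is consulted first. Only when it has no entry is the
     * (expensive) footer scan invoked, and its result is then recorded in the
     * config so that later calls skip the scan.
     */
    static boolean hasOperationField(Properties tableConfig, BooleanSupplier footerScan) {
        String configured = tableConfig.getProperty(HAS_OPERATION_FIELD_KEY);
        if (configured != null) {
            return Boolean.parseBoolean(configured);
        }
        boolean present = footerScan.getAsBoolean();
        tableConfig.setProperty(HAS_OPERATION_FIELD_KEY, Boolean.toString(present));
        return present;
    }

    public static void main(String[] args) {
        Properties tableConfig = new Properties();
        int[] scans = {0};
        // Stand-in for reading the schema from base or log file footers.
        BooleanSupplier footerScan = () -> {
            scans[0]++;
            return true;
        };
        boolean first = hasOperationField(tableConfig, footerScan);
        boolean second = hasOperationField(tableConfig, footerScan);
        System.out.println(first + " " + second + " scans=" + scans[0]);
    }
}
```

In this sketch the footer scan runs at most once per table config instance. A real change would also need to decide when the config value is written (for example at table creation) and how existing tables without the entry are handled.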