[ 
https://issues.apache.org/jira/browse/HUDI-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7585:
----------------------------
    Description: The table schema resolver needs to read the schema from the 
data files (base or log files) to see whether the _hoodie_operation field is 
present for Flink CDC use cases.  This can incur the overhead of reading data 
file footers multiple times.  We should see if we can store a table config to 
indicate whether the field is present, or simplify the Flink CDC format in 
Hudi 1.0 (thus no need for the _hoodie_operation field and the schema 
resolver).  (was: The table schema resolver needs to read 
schema from the data files (base or log files) to see whether _hoodie_operation 
field is present for Flink CDC use cases.  This can cause overhead of reading 
data file footers multiple times.  We should see if we can store or simplify 
the Flink CDC format in Hudi 1.0 (thus no need of ).)

> Avoid reading log files for resolving schema for _hoodie_operation field
> ------------------------------------------------------------------------
>
>                 Key: HUDI-7585
>                 URL: https://issues.apache.org/jira/browse/HUDI-7585
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Jing Zhang
>            Priority: Major
>             Fix For: 1.0.0
>
>
> The table schema resolver needs to read the schema from the data files (base 
> or log files) to see whether the _hoodie_operation field is present for Flink 
> CDC use cases.  This can incur the overhead of reading data file footers 
> multiple times.  We should see if we can store a table config to indicate 
> whether the field is present, or simplify the Flink CDC format in Hudi 1.0 
> (thus no need for the _hoodie_operation field and the schema resolver).
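
The first option in the description (consult a table config before touching 
data files) could look roughly like the sketch below.  It is illustrative only 
and does not use Hudi's actual classes: the config key, the class name and the 
footerScan callback are hypothetical stand-ins, and java.util.Properties plays 
the role of the table config.

```java
import java.util.Properties;
import java.util.function.Supplier;

class OperationFieldCheck {
    // Hypothetical key for illustration; not an existing Hudi table config.
    static final String OPERATION_FIELD_KEY = "example.table.operation.field.present";

    /**
     * Returns whether the operation field is present. The table config is
     * consulted first; the expensive footer scan runs only when the key is
     * absent, and its result is recorded so later calls skip the scan.
     */
    static boolean hasOperationField(Properties tableConfig, Supplier<Boolean> footerScan) {
        String configured = tableConfig.getProperty(OPERATION_FIELD_KEY);
        if (configured != null) {
            return Boolean.parseBoolean(configured);
        }
        boolean fromFiles = footerScan.get();
        // Recorded in memory only; persisting the config is out of scope here.
        tableConfig.setProperty(OPERATION_FIELD_KEY, String.valueOf(fromFiles));
        return fromFiles;
    }

    public static void main(String[] args) {
        Properties tableConfig = new Properties();
        int[] scans = {0};
        // Stand-in for reading base or log file footers.
        Supplier<Boolean> footerScan = () -> { scans[0]++; return true; };

        boolean first = hasOperationField(tableConfig, footerScan);
        boolean second = hasOperationField(tableConfig, footerScan);
        System.out.println(first + " " + second + " scans=" + scans[0]);
        // prints "true true scans=1"
    }
}
```

With a persisted flag of this kind, repeated schema resolution would reach the 
data files at most once per table rather than on every call.  Whether such a 
flag belongs in the table config, or whether the CDC format should instead be 
simplified, is the open question the issue raises.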



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
