[ https://issues.apache.org/jira/browse/HUDI-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-9318:
-----------------------------
Description:
The records cache in FileGroupRecordBuffer is a map: the key is the record
key and the value is a pair:
{code:java}
Pair<Option<T>, Map<String, Object>>{code}
The left element of the pair is an Option holding the actual HoodieRecord; an
empty Option represents a delete.
The right element is the record metadata, such as the record key, the
orderingValue, and the Avro schema.
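To make the current layout concrete, here is a minimal Java sketch of the cache as described above. It is illustrative only: `java.util.Optional` and `AbstractMap.SimpleImmutableEntry` stand in for Hudi's own `Option` and `Pair`, `String` stands in for the engine record type `T`, and the class name `CurrentRecordCacheSketch` and the metadata keys (`RECORD_KEY`, `ORDERING_VALUE`, `SCHEMA`) are hypothetical. It shows the two pain points: a delete loses its payload, and every metadata read is a string lookup plus a cast.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class CurrentRecordCacheSketch {
  // Hypothetical metadata keys; Hudi's actual key constants may differ.
  static final String RECORD_KEY = "RECORD_KEY";
  static final String ORDERING_VALUE = "ORDERING_VALUE";
  static final String SCHEMA = "SCHEMA";

  // Builds the untyped metadata map stored as the right element of the pair.
  static Map<String, Object> metadata(String recordKey, Comparable<?> orderingValue, String schema) {
    Map<String, Object> meta = new HashMap<>();
    meta.put(RECORD_KEY, recordKey);
    meta.put(ORDERING_VALUE, orderingValue);
    meta.put(SCHEMA, schema);
    return meta;
  }

  public static void main(String[] args) {
    // record key -> (optional record, metadata map); String stands in for T.
    Map<String, SimpleImmutableEntry<Optional<String>, Map<String, Object>>> records = new HashMap<>();

    // An update keeps its payload.
    records.put("k1", new SimpleImmutableEntry<>(Optional.of("row-1"), metadata("k1", 5L, "schemaA")));
    // A delete is stored as an empty Optional, so the deleted row's contents are lost.
    records.put("k2", new SimpleImmutableEntry<>(Optional.<String>empty(), metadata("k2", 7L, "schemaA")));

    SimpleImmutableEntry<Optional<String>, Map<String, Object>> entry = records.get("k2");
    boolean isDelete = !entry.getKey().isPresent();
    // Reading metadata requires a string lookup plus an explicit cast.
    long orderingValue = (Long) entry.getValue().get(ORDERING_VALUE);
    System.out.println("k2 isDelete=" + isDelete + ", orderingValue=" + orderingValue);
    // prints "k2 isDelete=true, orderingValue=7"
  }
}
```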
Let's revise the metadata part:
# Add a new serializable Java class, MergingItem, to hold the HoodieRecord;
# Fetch the record key and orderingValue through the HoodieRecord APIs. Check
all HoodieRecord types to make sure these lookups are efficient and fix any
that are not; if that is hard to do for every record type, keep a duplicate
in the MergingItem class;
# Do not turn a delete record into Option.empty; instead, set `isDelete`
correctly on the HoodieRecord itself or keep a duplicate flag in the
MergingItem class, so that the delete record can be used to generate
retraction messages in streaming scenarios;
# Add the local schema ID to the MergingItem class;
# Drop the metadata map, which is very confusing.
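A possible shape for the proposal, as a minimal sketch under stated assumptions rather than the eventual Hudi class: the field names, the `int` type of `schemaId`, `Comparable<?>` for the ordering value, and the `toChangelog` helper with Flink-style `-D`/`+U` prefixes are all hypothetical, and `String` again stands in for the record type. The comments map each field back to the numbered items above.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class MergingItemSketch {

  // Hypothetical typed holder replacing Pair<Option<T>, Map<String, Object>>.
  static final class MergingItem<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final T record;                    // kept even for deletes (item 3)
    private final String recordKey;            // duplicate kept only if HoodieRecord lookup is costly (item 2)
    private final Comparable<?> orderingValue; // same caveat as recordKey (item 2)
    private final boolean isDelete;            // explicit flag instead of an empty Option (item 3)
    private final int schemaId;                // local schema id (item 4)

    MergingItem(T record, String recordKey, Comparable<?> orderingValue, boolean isDelete, int schemaId) {
      this.record = record;
      this.recordKey = Objects.requireNonNull(recordKey, "recordKey");
      this.orderingValue = orderingValue;
      this.isDelete = isDelete;
      this.schemaId = schemaId;
    }

    T getRecord() { return record; }
    String getRecordKey() { return recordKey; }
    Comparable<?> getOrderingValue() { return orderingValue; }
    boolean isDelete() { return isDelete; }
    int getSchemaId() { return schemaId; }
  }

  // Hypothetical helper: because a delete keeps its payload, a streaming sink can
  // emit a retraction carrying the deleted row (Flink-style "-D" / "+U" prefixes).
  static String toChangelog(MergingItem<String> item) {
    return (item.isDelete() ? "-D " : "+U ") + item.getRecord();
  }

  public static void main(String[] args) {
    // Record key -> MergingItem; no metadata map is needed (item 5).
    Map<String, MergingItem<String>> records = new HashMap<>();
    records.put("k1", new MergingItem<>("row-1", "k1", 5L, false, 0));
    records.put("k2", new MergingItem<>("row-2", "k2", 7L, true, 0));

    for (String key : new String[] {"k1", "k2"}) {
      System.out.println(toChangelog(records.get(key)));
    }
    // prints "+U row-1" then "-D row-2"
  }
}
```

Keeping the payload on deletes costs some memory compared with `Option.empty`, but it is what makes the retraction in `toChangelog` possible; whether `recordKey` and `orderingValue` are duplicated should depend on how cheap the HoodieRecord accessors turn out to be (item 2).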
was:
The records cache in FileGroupRecordBuffer is a map: the key is the record
key and the value is a pair:
{code:java}
Pair<Option<T>, Map<String, Object>>{code}
The left element of the pair is an Option holding the actual HoodieRecord; an
empty Option represents a delete.
The right element is the record metadata, such as the record key, the
orderingValue, and the Avro schema.
Let's revise the metadata part:
1. Add a new serializable Java class, MergingItem, to hold the HoodieRecord;
2. Fetch the record key and orderingValue through the HoodieRecord APIs. Check
all HoodieRecord types to make sure these lookups are efficient and fix any
that are not; if that is hard to do for every record type, keep a duplicate
in the MergingItem class;
3. Do not turn a delete record into Option.empty; instead, set `isDelete`
correctly on the HoodieRecord itself or keep a duplicate in the MergingItem
class;
4. Add the local schema ID to the MergingItem class;
5. Drop the metadata map, which is very confusing.
> Refactor the log records presentation in FileGroupRecordBuffer
> --------------------------------------------------------------
>
> Key: HUDI-9318
> URL: https://issues.apache.org/jira/browse/HUDI-9318
> Project: Apache Hudi
> Issue Type: Improvement
> Components: core
> Reporter: Danny Chen
> Assignee: Timothy Brown
> Priority: Major
> Fix For: 1.1.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)