[ https://issues.apache.org/jira/browse/HUDI-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Brown updated HUDI-9641:
--------------------------------
Description:
As part of migrating to our new record merging classes, we will no longer
require HoodieRecords to be payload records. Existing places in the code that
create payload-based records should be migrated to use an Avro-based record
instead.
This new record should match the shuffle cost of the payload-based approach in
Spark by not serializing the schema with each record. It should also defer
parsing the bytes into a record until the record is actually needed, to avoid
unnecessary overhead.
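A minimal Java sketch of how these two requirements could fit together,
assuming the record body arrives already encoded (for example in Avro binary
form) and that the reader schema can be resolved once per task rather than
shipped per record. `LazyBinaryRecord`, `getData`, and `roundTrip` are
hypothetical names, not the actual HoodieAvroBinaryRecord API. Java
serialization stands in for Spark's shuffle serializer, and a plain
`Function<byte[], T>` stands in for an Avro decoder bound to the schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical sketch, not Hudi's HoodieAvroBinaryRecord. Only the key and the
// already-encoded bytes are serialized; the schema is not a field at all, so it
// never travels with each record during a shuffle.
final class LazyBinaryRecord<T> implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String recordKey;
  private final byte[] encoded;   // e.g. Avro binary encoding of the record body
  private transient T decoded;    // cached decoded record; excluded from serialization

  LazyBinaryRecord(String recordKey, byte[] encoded) {
    this.recordKey = recordKey;
    this.encoded = encoded;
  }

  String getRecordKey() {
    return recordKey;
  }

  // Decodes on first access only. The caller supplies the decoder, which is
  // where a schema resolved once per task (assumption) would be captured.
  T getData(Function<byte[], T> decoder) {
    if (decoded == null) {
      decoded = decoder.apply(encoded);
    }
    return decoded;
  }

  boolean isDecoded() {
    return decoded != null;
  }

  // Serialize and deserialize the record, standing in for a Spark shuffle.
  @SuppressWarnings("unchecked")
  static <T> LazyBinaryRecord<T> roundTrip(LazyBinaryRecord<T> record) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
        out.writeObject(record);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return (LazyBinaryRecord<T>) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the decoded value is `transient` and the schema lives only in the
caller-supplied decoder, a shuffle moves just the key and the encoded bytes,
and the first `getData` call after the shuffle pays the parsing cost once.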
After this flow is working, the special handling of payloads in the
RecordContext should also be removed.
IndexUtils should also be updated to avoid payload usage:
https://github.com/apache/hudi/pull/13600#discussion_r2252753647
was:
As part of migrating to our new record merging classes, we will no longer
require HoodieRecords to be payload records. Existing places in the code that
create payload-based records should be migrated to use an Avro-based record
instead.
This new record should match the shuffle cost of the payload-based approach in
Spark by not serializing the schema with each record. It should also defer
parsing the bytes into a record until the record is actually needed, to avoid
unnecessary overhead.
After this flow is working, the special handling of payloads in the
RecordContext should also be removed.
> Create HoodieAvroBinaryRecord and remove payload usage on writer path
> ---------------------------------------------------------------------
>
> Key: HUDI-9641
> URL: https://issues.apache.org/jira/browse/HUDI-9641
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Lin Liu
> Assignee: Timothy Brown
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.1.0