[ 
https://issues.apache.org/jira/browse/HUDI-9641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Brown updated HUDI-9641:
--------------------------------
    Description: 
As part of migrating to our new record merging classes, HoodieRecords will no 
longer need to be payload records. The existing places in the code that create 
payload-based records should be migrated to use an Avro-based record instead. 

This new record should match the performance of the payload-based approach in 
Spark shuffling costs by avoiding serialization of the schema. It should also 
defer parsing the bytes into a record until the record is actually needed, to 
avoid unnecessary overhead.

Once this flow is working, the special handling of payloads in the 
RecordContext should also be removed.
IndexUtils should also be updated to avoid payload usage: 
https://github.com/apache/hudi/pull/13600#discussion_r2252753647
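
A minimal sketch of the lazy-parsing idea described above, not the actual 
HoodieAvroBinaryRecord. All names here (LazyBinaryRecordSketch, 
LazyBinaryRecord, roundTrip) are hypothetical. Plain Java serialization stands 
in for a Spark shuffle, and a UTF-8 string decoder stands in for an Avro reader 
bound to a schema the caller already holds. The record ships only its key and 
encoded bytes; the decoder (and so the schema) is supplied at read time and the 
decoded value is never serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class LazyBinaryRecordSketch {

  /** Hypothetical record that carries encoded bytes and decodes them on demand. */
  static final class LazyBinaryRecord<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String key;
    private final byte[] bytes;
    // transient: the decoded value is dropped when the record is serialized.
    private transient T decoded;
    // transient: counts how often the decoder actually ran on this instance.
    private transient int decodeCount;

    LazyBinaryRecord(String key, byte[] bytes) {
      this.key = key;
      this.bytes = bytes;
    }

    String getKey() {
      return key;
    }

    /** Decodes on first access only; the decoder plays the role of reader + schema. */
    T getData(Function<byte[], T> decoder) {
      if (decoded == null) {
        decoded = decoder.apply(bytes);
        decodeCount++;
      }
      return decoded;
    }

    boolean isDecoded() {
      return decoded != null;
    }

    int getDecodeCount() {
      return decodeCount;
    }
  }

  /** Serializes and deserializes a value, as a stand-in for a shuffle boundary. */
  static <T extends Serializable> T roundTrip(T value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      @SuppressWarnings("unchecked")
      T copy = (T) in.readObject();
      return copy;
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] encoded = "id=42".getBytes(StandardCharsets.UTF_8);
    LazyBinaryRecord<String> record = new LazyBinaryRecord<>("key-1", encoded);

    // Crossing the simulated shuffle moves only the key and the raw bytes.
    LazyBinaryRecord<String> shuffled = roundTrip(record);
    System.out.println("decoded after shuffle: " + shuffled.isDecoded());

    // The reader supplies the decoder when the value is finally needed.
    Function<byte[], String> decoder = b -> new String(b, StandardCharsets.UTF_8);
    System.out.println("value: " + shuffled.getData(decoder));
  }
}
```

Passing the decoder into getData, rather than storing it on the record, is what 
keeps the schema out of the serialized form in this sketch; the real class may 
resolve the schema differently.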

  was:
As part of migrating towards our new record merging classes, we will no longer 
require the HoodieRecords to be payload records. The existing places in the 
code that create payload based records should be migrated to use an Avro based 
record instead. 

This new record should have the same performance as the payload based approach 
when it comes to shuffling costs in Spark by avoiding the serialization of the 
schema. We also want to avoid parsing the bytes to a record until it is finally 
needed to avoid unnecessary overhead.

After this flow is working, the special handling of payloads in the 
RecordContext should also be removed.


> Create HoodieAvroBinaryRecord and remove payload usage on writer path
> ---------------------------------------------------------------------
>
>                 Key: HUDI-9641
>                 URL: https://issues.apache.org/jira/browse/HUDI-9641
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Lin Liu
>            Assignee: Timothy Brown
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> As part of migrating to our new record merging classes, HoodieRecords will 
> no longer need to be payload records. The existing places in the code that 
> create payload-based records should be migrated to use an Avro-based record 
> instead. 
> This new record should match the performance of the payload-based approach 
> in Spark shuffling costs by avoiding serialization of the schema. It should 
> also defer parsing the bytes into a record until the record is actually 
> needed, to avoid unnecessary overhead.
> Once this flow is working, the special handling of payloads in the 
> RecordContext should also be removed.
> IndexUtils should also be updated to avoid payload usage: 
> https://github.com/apache/hudi/pull/13600#discussion_r2252753647



