[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-3217:
----------------------------------
      Epic Link: HUDI-3081
    Description: 
Currently Hudi is biased toward the assumption of a particular payload representation (Avro). Long-term we would like to steer away from this and keep the record payload completely opaque, so that
 # We can keep the record payload representation engine-specific
 # We avoid unnecessary serde loops (Engine-specific → Avro → Engine-specific → Binary)

h2. *Proposal*
 
*Phase 2: Revisiting Record Handling*
{_}T-shirt{_}: 2-2.5 weeks
{_}Goal{_}: Avoid tight coupling with a particular record representation on the Read Path (currently Avro), enabling the following:
  * Revisit RecordPayload APIs
 ** Deprecate the {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs, replacing them with new “opaque” APIs (not returning Avro payloads)
 ** Rebase the RecordPayload hierarchy to be engine-specific:
 *** A common engine-specific base abstracting shared functionality (Spark, Flink, Java)
 *** Each feature-specific semantic will have to be implemented for all engines
 ** Introduce new APIs
 *** To access keys (record key, partition path)
 *** To convert a record to Avro (for backward compatibility)
 * Revisit RecordPayload handling
 ** In WriteHandles
 *** APIs will accept an opaque RecordPayload (no Avro conversion)
 *** Can perform (opaque) record merging if necessary
 *** Pass the RecordPayload as-is to the FileWriter
 ** In FileWriters
 *** Will accept the RecordPayload interface
 *** Should be engine-specific (to handle the internal record representation)
 ** In RecordReaders
 *** APIs will provide an opaque RecordPayload (no Avro conversion)
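
A minimal Java sketch of the opaque-record direction outlined above, for illustration only. All names here ({{OpaqueRecord}}, {{RecordMerger}}, {{OpaqueFileWriter}}, {{MergingWriteHandle}}, {{ToyRow}}) are hypothetical and are not existing Hudi APIs; {{ToyRow}} merely stands in for an engine-specific type such as Spark's {{InternalRow}}. It shows key accessors, a merge hook playing the role {{combineAndGetUpdateValue}} plays today, and a write handle that passes the record as-is to the writer. The Avro conversion for backward compatibility is left out to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical: engine-agnostic record wrapper whose payload type T stays opaque
// to the write path (T could be an engine row type such as Spark's InternalRow).
interface OpaqueRecord<T> {
  String getRecordKey();      // key accessor: record key
  String getPartitionPath();  // key accessor: partition path
  T getData();                // engine-specific payload, never converted here
}

// Hypothetical merge hook: combines two records of the same engine-specific type
// without materializing Avro. An empty result means the record is dropped (e.g. a delete).
interface RecordMerger<T> {
  Optional<OpaqueRecord<T>> merge(OpaqueRecord<T> older, OpaqueRecord<T> newer);
}

// Hypothetical engine-specific file writer that accepts the opaque record as-is.
interface OpaqueFileWriter<T> {
  void write(OpaqueRecord<T> record);
}

// Hypothetical write handle: merges when an older version exists, then hands the
// result straight to the writer with no intermediate conversion.
final class MergingWriteHandle<T> {
  private final RecordMerger<T> merger;
  private final OpaqueFileWriter<T> writer;

  MergingWriteHandle(RecordMerger<T> merger, OpaqueFileWriter<T> writer) {
    this.merger = merger;
    this.writer = writer;
  }

  void write(Optional<OpaqueRecord<T>> older, OpaqueRecord<T> newer) {
    Optional<OpaqueRecord<T>> result =
        older.isPresent() ? merger.merge(older.get(), newer) : Optional.of(newer);
    result.ifPresent(writer::write);
  }
}

// Toy "engine" row used only for illustration: key, partition, an ordering value, data.
record ToyRow(String key, String partition, long ts, String value) {}

record ToyRecord(ToyRow row) implements OpaqueRecord<ToyRow> {
  public String getRecordKey() { return row.key(); }
  public String getPartitionPath() { return row.partition(); }
  public ToyRow getData() { return row; }
}

// Example merge semantic: keep whichever version has the larger ordering value.
final class LatestWinsMerger implements RecordMerger<ToyRow> {
  public Optional<OpaqueRecord<ToyRow>> merge(OpaqueRecord<ToyRow> older, OpaqueRecord<ToyRow> newer) {
    return Optional.of(newer.getData().ts() >= older.getData().ts() ? newer : older);
  }
}

// In-memory writer that simply collects the rows it is handed.
final class CollectingWriter implements OpaqueFileWriter<ToyRow> {
  final List<ToyRow> written = new ArrayList<>();

  public void write(OpaqueRecord<ToyRow> record) { written.add(record.getData()); }
}
```

Keeping {{T}} as a type parameter lets each engine plug in its own row type, so merging and writing never round-trip through Avro; feature-specific semantics (like the latest-wins rule above) would then be separate merger implementations per engine.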

 

REF

[https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680]

 

  was:
This is Phase 2 of what is outlined in HUDI-3081

The goals are
 * Avoid tight coupling with particular record representation on the Read Path (currently Avro) and enable
 ** Common record handling API for combining records (Merge API)
 ** Avoiding unnecessary serde by abstracting away standardized Record access routines (getting key, merging, etc)
 *** Behind the interface we'd rely on engine-specific representation to carry the payload (`InternalRow` for Spark, `ArrayWritable` for Hive, etc)

     Issue Type: Epic  (was: Improvement)
        Summary: Revisit Record Payload handling  (was: [Phase 2] Revisit Record Payload handling)

> Revisit Record Payload handling
> -------------------------------
>
>                 Key: HUDI-3217
>                 URL: https://issues.apache.org/jira/browse/HUDI-3217
>             Project: Apache Hudi
>          Issue Type: Epic
>            Reporter: Alexey Kudinkin
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
