[
https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Kudinkin updated HUDI-3217:
----------------------------------
Epic Link: HUDI-3081
Description:
Currently Hudi is biased toward the assumption of a particular payload
representation (Avro). Long-term we would like to steer away from this and keep
the record payload completely opaque, so that
# We can keep record payload representation engine-specific
# We avoid unnecessary serde loops (Engine-specific > Avro > Engine-specific >
Binary)
h2. *Proposal*
*Phase 2: Revisiting Record Handling*
{_}T-shirt{_}: 2-2.5 weeks
{_}Goal{_}: Avoid tight coupling with particular record representation on the
Read Path (currently Avro) and enable
* Revisit RecordPayload APIs
** Deprecate {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs,
replacing them with new “opaque” APIs (not returning Avro payloads)
** Rebase RecordPayload hierarchy to be engine-specific:
*** Common engine-specific base abstracting common functionality (Spark,
Flink, Java)
*** Each feature-specific semantic will have to be implemented for all engines
** Introduce new APIs
*** To access keys (record, partition)
*** To convert record to Avro (for backward compatibility)
* Revisit RecordPayload handling
** In WriteHandles
*** API will accept opaque RecordPayload (no Avro conversion)
*** Can do (opaque) record merging if necessary
*** Passes RecordPayload as is to FileWriter
** In FileWriters
*** Will accept RecordPayload interface
*** Should be engine-specific (to handle internal record representation)
** In RecordReaders
*** API will be providing opaque RecordPayload (no Avro conversion)
REF
[https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680]
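
To make the proposal above more concrete, here is a minimal sketch of what an opaque record flow could look like. Every name in it ({{OpaqueRecord}}, {{MapRecord}}, {{RecordFileWriter}}, {{InMemoryWriter}}, {{WriteHandle}}) is hypothetical and is not an existing or planned Hudi interface. The {{Map}}-backed record stands in for an engine-specific representation, the merge shown is a simple "newer fields win" policy chosen only for illustration, and the Avro conversion API is left out to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all types here are hypothetical, not Hudi APIs.
class OpaqueRecordSketch {
    public static void main(String[] args) {
        Map<String, Object> oldData = new LinkedHashMap<>();
        oldData.put("id", "k1");
        oldData.put("price", 10);
        Map<String, Object> newData = new LinkedHashMap<>();
        newData.put("price", 12);

        InMemoryWriter writer = new InMemoryWriter();
        WriteHandle<Map<String, Object>> handle = new WriteHandle<>(writer);
        handle.write(new MapRecord("k1", "2022/01", oldData),
                     new MapRecord("k1", "2022/01", newData));
        System.out.println(writer.lines);
    }
}

/** Opaque record: callers see keys and merge semantics, not the payload format. */
interface OpaqueRecord<T> {
    String getRecordKey();
    String getPartitionPath();
    T getData(); // engine-specific representation, meant for engine-specific code only
    OpaqueRecord<T> merge(OpaqueRecord<T> newer);
}

/** Toy record backed by a Map, standing in for an engine-specific representation. */
final class MapRecord implements OpaqueRecord<Map<String, Object>> {
    private final String recordKey;
    private final String partitionPath;
    private final Map<String, Object> data;

    MapRecord(String recordKey, String partitionPath, Map<String, Object> data) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
        this.data = new LinkedHashMap<>(data); // defensive copy
    }

    public String getRecordKey() { return recordKey; }
    public String getPartitionPath() { return partitionPath; }
    public Map<String, Object> getData() { return data; }

    // Assumed merge policy: fields of the newer record overwrite older ones.
    public OpaqueRecord<Map<String, Object>> merge(OpaqueRecord<Map<String, Object>> newer) {
        Map<String, Object> merged = new LinkedHashMap<>(data);
        merged.putAll(newer.getData());
        return new MapRecord(recordKey, partitionPath, merged);
    }
}

/** Engine-specific writer that accepts the opaque record as is. */
interface RecordFileWriter<T> {
    void write(OpaqueRecord<T> record);
}

/** Collects written records in memory instead of producing a file. */
final class InMemoryWriter implements RecordFileWriter<Map<String, Object>> {
    final List<String> lines = new ArrayList<>();

    public void write(OpaqueRecord<Map<String, Object>> record) {
        lines.add(record.getPartitionPath() + "/" + record.getRecordKey()
                + " " + record.getData());
    }
}

/** Write handle: merges when an older version exists, then passes the record through. */
final class WriteHandle<T> {
    private final RecordFileWriter<T> writer;

    WriteHandle(RecordFileWriter<T> writer) { this.writer = writer; }

    void write(OpaqueRecord<T> existing, OpaqueRecord<T> incoming) {
        writer.write(existing == null ? incoming : existing.merge(incoming));
    }
}
```

In this sketch the write handle never inspects the payload: it only asks the record to merge itself and hands the result to the writer, which is the one place that knows the concrete representation.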
was:
This is Phase 2 of what is outlined in HUDI-3081
The goals are
* Avoid tight coupling with particular record representation on the Read Path
(currently Avro) and enable
** Common record handling API for combining records (Merge API)
** Avoiding unnecessary serde by abstracting away standardized Record access
routines (getting key, merging, etc)
*** Behind the interface we'd rely on engine-specific representation to carry
the payload (`InternalRow` for Spark, `ArrayWritable` for Hive, etc)
Issue Type: Epic (was: Improvement)
Summary: Revisit Record Payload handling (was: [Phase 2] Revisit
Record Payload handling)
> Revisit Record Payload handling
> -------------------------------
>
> Key: HUDI-3217
> URL: https://issues.apache.org/jira/browse/HUDI-3217
> Project: Apache Hudi
> Issue Type: Epic
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.11.0
>