[ 
https://issues.apache.org/jira/browse/HUDI-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-152:
---------------------------------
    Description: 
There are 2 issues with the realtime input format:

 
 # Delta records (updates) might not contain the entire row change log. For 
such an update, we need to be able to call preCombine of the 
HoodieRecordPayload implementation so that we merge the existing data from 
parquet (full row change log) with the new column being updated.
 # Any custom computation of columns in a custom implementation of the 
HoodieRecordPayload is currently missed in the realtime input format. We need 
to honor that by calling preCombine.

 

Both of the above are use cases for power users who implement their own custom 
record payload. Since this is not common, this is lower priority. 

  was:Delta records (updates) might not have the entire row change log, in such 
an update, we need to be able to call preCombine of the HoodieRecordPayload 
implementation so that we merge existing data from parquet (full row change 
log) with the new column being updated.


> Invoke preCombine in real time view by converting arrayWritable to Avro
> -----------------------------------------------------------------------
>
>                 Key: HUDI-152
>                 URL: https://issues.apache.org/jira/browse/HUDI-152
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Major
>              Labels: sev:critical, triaged, user-support-issues
>
> There are 2 issues with the realtime input format:
>  
>  # Delta records (updates) might not contain the entire row change log. For 
> such an update, we need to be able to call preCombine of the 
> HoodieRecordPayload implementation so that we merge the existing data from 
> parquet (full row change log) with the new column being updated.
>  # Any custom computation of columns in a custom implementation of the 
> HoodieRecordPayload is currently missed in the realtime input format. We 
> need to honor that by calling preCombine.
>  
> Both of the above are use cases for power users who implement their own 
> custom record payload. Since this is not common, this is lower priority. 
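
As a rough illustration of the first item in the description, the sketch below 
merges a partial delta record with the full row read from parquet. 
PartialRowPayload, its map-based columns and its orderingValue field are 
hypothetical stand-ins chosen to keep the example self-contained. This is not 
Hudi's actual HoodieRecordPayload interface, which works on Avro records, and 
it does not cover the ArrayWritable to Avro conversion named in the summary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a custom record payload. NOT Hudi's actual
// HoodieRecordPayload interface; it only illustrates the merge semantics.
final class PartialRowPayload {
    private final Map<String, Object> columns; // column name -> value
    private final long orderingValue;          // assumed ordering field, e.g. a commit time

    PartialRowPayload(Map<String, Object> columns, long orderingValue) {
        this.columns = new LinkedHashMap<>(columns);
        this.orderingValue = orderingValue;
    }

    Map<String, Object> columns() { return columns; }

    long orderingValue() { return orderingValue; }

    // Merge two payloads for the same record key: start from the older
    // payload's columns and overlay the columns present in the newer one.
    // Columns missing from the newer (partial) payload keep their old value.
    PartialRowPayload preCombine(PartialRowPayload other) {
        PartialRowPayload older = this.orderingValue <= other.orderingValue ? this : other;
        PartialRowPayload newer = (older == this) ? other : this;
        Map<String, Object> merged = new LinkedHashMap<>(older.columns);
        merged.putAll(newer.columns);
        return new PartialRowPayload(merged, newer.orderingValue);
    }
}
```

With a full row {id=1, name=a, city=SF} at ordering value 1 and a delta 
{id=1, city=NYC} at ordering value 2, calling preCombine in either direction 
yields {id=1, name=a, city=NYC}: the untouched name column survives from the 
parquet row and only city is replaced. A reader that returned the delta 
record as-is would lose the name column.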



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
