hudi-bot opened a new issue, #15445:
URL: https://github.com/apache/hudi/issues/15445

   This feature aims to improve PartialUpdatePayload so that it handles multiple sources properly.
   Here is some background on why we need multiple ordering fields.
   For example, we have two sources and one target table:
   * source1's fields: *id, ts, name*
   * source2's fields: *id, ts, price*
   * target table's fields: *id, ts, name, price*
   
   ts is the precombine field.
   
   
   In the 1st batch, we get one record from each source.
   
   Source1:
   ||id||ts||name||
   |1|1|name_1|
   
   Source2:
   ||id||ts||price||
   |1|3|price_3|
   
   So the record in the target table should be:
   ||id||ts||name||price||
   |1|3|name_1|price_3|
    
   Let's say that in the 2nd batch we get one event from source1:
   
   Source1:
   ||id||ts||name||
   |1|2|name_2|
   
   But name_2 won't be written to the target table, because its ts value (2) is smaller than the ts value already in the target table (3).
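
The two batches above can be replayed with a small sketch. This is a simplified model of a partial-update merge driven by a single ordering field, not Hudi's actual PartialUpdatePayload code; the function name `merge_single_ordering` and the dict-based record layout are assumptions made for illustration.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Optional[Any]]


def merge_single_ordering(stored: Optional[Record], incoming: Record,
                          ordering_field: str = "ts") -> Record:
    """Simplified model (an assumption, not Hudi code): the record with the
    larger ordering value wins, and its null fields are filled from the other."""
    if stored is None:
        return dict(incoming)
    if incoming[ordering_field] >= stored[ordering_field]:
        winner, loser = incoming, stored
    else:
        winner, loser = stored, incoming
    merged = dict(loser)
    # Non-null fields of the winner overwrite whatever the loser had.
    merged.update({k: v for k, v in winner.items() if v is not None})
    return merged


# 1st batch: one record from each source, projected onto the target schema.
target = merge_single_ordering(None, {"id": 1, "ts": 1, "name": "name_1", "price": None})
target = merge_single_ordering(target, {"id": 1, "ts": 3, "name": None, "price": "price_3"})
after_batch1 = dict(target)

# 2nd batch: source1 sends name_2 with ts=2, which is older than the stored ts=3.
target = merge_single_ordering(target, {"id": 1, "ts": 2, "name": "name_2", "price": None})
after_batch2 = dict(target)
```

In this model `after_batch2` still holds `name_1`: the stored row's ts of 3 came from source2, yet it blocks a legitimate update from source1.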
   
   This feature will allow users to perform partial updates across sub-tables/sources by determining the state of a set of columns in a row based on an ordering/precombine column for that set.
   
   As such, a table can have MULTIPLE ordering fields.
   
   This use case suits wide Hudi tables that are built from smaller sub-tables, where each sub-table has its own precombine column and where records can be upserted out of order.
    !image-2022-09-20-22-46-52-907.png! 
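
A minimal sketch of the proposed behaviour, under stated assumptions: the issue does not say how ordering fields would be mapped to columns, so the per-source ordering columns `name_ts` and `price_ts`, the `ORDERING_GROUPS` mapping, and the function `merge_multi_ordering` are all hypothetical. Each ordering column governs only its own group of value columns.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]

# Hypothetical layout: ordering column -> value columns it governs.
ORDERING_GROUPS: Dict[str, List[str]] = {
    "name_ts": ["name"],    # columns fed by source1
    "price_ts": ["price"],  # columns fed by source2
}


def merge_multi_ordering(stored: Optional[Record], incoming: Record,
                         groups: Dict[str, List[str]] = ORDERING_GROUPS) -> Record:
    """Illustrative only: compare ordering values per column group, so an
    update from one source is never blocked by another source's timestamp."""
    if stored is None:
        return dict(incoming)
    merged = dict(stored)
    for ordering_col, value_cols in groups.items():
        new_ts = incoming.get(ordering_col)
        if new_ts is None:
            continue  # incoming record carries nothing for this group
        old_ts = stored.get(ordering_col)
        if old_ts is None or new_ts >= old_ts:
            merged[ordering_col] = new_ts
            for col in value_cols:
                merged[col] = incoming.get(col)
    return merged


# Replay the example: batch 1 from both sources, then batch 2 from source1.
row = merge_multi_ordering(None, {"id": 1, "name_ts": 1, "name": "name_1"})
row = merge_multi_ordering(row, {"id": 1, "price_ts": 3, "price": "price_3"})
row = merge_multi_ordering(row, {"id": 1, "name_ts": 2, "name": "name_2"})
```

With per-group ordering, the second batch's `name_2` is applied because it is compared only against the previous source1 timestamp (1), while `price_3` and its timestamp (3) are left untouched.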
   
   
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-4882
   - Type: New Feature
   - Attachment(s):
     - 20/Sep/22 
14:42;fengjian_428;image-2022-09-20-22-42-19-445.png;https://issues.apache.org/jira/secure/attachment/13049524/image-2022-09-20-22-42-19-445.png
     - 20/Sep/22 
14:46;fengjian_428;image-2022-09-20-22-46-52-907.png;https://issues.apache.org/jira/secure/attachment/13049523/image-2022-09-20-22-46-52-907.png


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
