voonhous commented on PR #8579:
URL: https://github.com/apache/hudi/pull/8579#issuecomment-1523138078

   > > Results will be different if combineAndUpdateValue is invoked in order without invoking preCombine.
   > 
   > What is exactly lost here?
   
   
   # preCombine + combineAndGetUpdateValue
   
   ```
   Table schema: {id: int , name: string, price: double, _ts: int}
   recordKey: id
   precombineField: _ts
   
   Table initial state:
   [1    a1_0    10.0    1001]
   
   Table performs an update with an incoming batch containing the following records (2):
   (preCombine + combineAndGetUpdateValue)
   [
     [1    a1_0    11.0     999],
     [1    a1_0    null     1001]
   ]
   
   End state of the table:
   [1    a1_0    11.0    1001]
   
   ```
   
   This is because the records in the incoming batch at (2) are first merged by `preCombine`, and only the merged record is then passed to `combineAndGetUpdateValue`.
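
To make that sequence concrete, here is a minimal Python sketch of the flow. It is a toy model, not Hudi's API: `overlay`, `pre_combine` and `combine_and_get_update_value` are hypothetical helpers, and the merge rules (non-null fields of the newer record win, an incoming record with a lower `_ts` than the stored row is dropped, ties go to the incoming record) are assumptions chosen to reproduce the trace above.

```python
def overlay(base, top):
    """Return base with every non-null field of top written over it."""
    return {k: (base[k] if top[k] is None else top[k]) for k in base}


def pre_combine(a, b):
    """Toy preCombine: merge two incoming records that share a record key.

    Assumption: the record with the higher _ts wins field by field, and its
    null fields are backfilled from the other record.
    """
    older, newer = sorted((a, b), key=lambda r: r["_ts"])
    return overlay(older, newer)


def combine_and_get_update_value(stored, incoming):
    """Toy combineAndGetUpdateValue: apply one incoming record to the stored row.

    Assumption: an incoming record with a lower _ts than the stored row is ignored.
    """
    if incoming["_ts"] < stored["_ts"]:
        return stored
    return overlay(stored, incoming)


stored = {"id": 1, "name": "a1_0", "price": 10.0, "_ts": 1001}
batch = [
    {"id": 1, "name": "a1_0", "price": 11.0, "_ts": 999},
    {"id": 1, "name": "a1_0", "price": None, "_ts": 1001},
]

merged = pre_combine(*batch)  # the batch collapses to a single record first
end_state = combine_and_get_update_value(stored, merged)
print(end_state)
```

Under these assumptions the price 11.0 survives because it is carried into the merged record before that record is compared against the stored row.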
   
   # combineAndGetUpdateValue ONLY
   
   Results will be different if `combineAndGetUpdateValue` is invoked on each record in order, without invoking `preCombine` first.
   
   Example:
   ```
   Table initial state:
   [1    a1_0    10.0    1001]
   
   Table performs an update: 
   (combineAndGetUpdateValue)
   [1    a1_0    11.0     999]
   
   Table performs an update again: 
   (combineAndGetUpdateValue)
   [1    a1_0   null     1001]
   
   End state of the table:
   [1    a1_0   10.0    1001]
   ```
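
The same trace can be sketched in Python as a fold of the updates over the stored row, one record at a time. Again this is only a toy model under assumed rules, not Hudi's implementation: `combine_and_get_update_value` is a hypothetical stand-in that drops an incoming record whose `_ts` is lower than the stored row's and otherwise overwrites only the non-null fields.

```python
from functools import reduce


def combine_and_get_update_value(stored, incoming):
    """Toy combineAndGetUpdateValue (assumed semantics, not Hudi's code)."""
    if incoming["_ts"] < stored["_ts"]:
        # Assumption: an older incoming record is discarded entirely.
        return stored
    # Assumption: null fields in the incoming record keep the stored value.
    return {k: (stored[k] if incoming[k] is None else incoming[k]) for k in stored}


stored = {"id": 1, "name": "a1_0", "price": 10.0, "_ts": 1001}
updates = [
    {"id": 1, "name": "a1_0", "price": 11.0, "_ts": 999},
    {"id": 1, "name": "a1_0", "price": None, "_ts": 1001},
]

# Apply each update in arrival order, with no preCombine step in between.
end_state = reduce(combine_and_get_update_value, updates, stored)
print(end_state)
```

In this model the first update is dropped for having an older `_ts`, so its price 11.0 is gone by the time the second update arrives with a null price.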
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
