deepakpanda93 commented on issue #17506:
URL: https://github.com/apache/hudi/issues/17506#issuecomment-3627300782
Hello @bithw1
For Spark SQL tables like yours, Hudi will use the **default payload /
record-merger configuration** if you don’t explicitly set one in
`TBLPROPERTIES` or writer options.
- The default Spark payload class is
`org.apache.hudi.common.model.OverwriteWithLatestAvroPayload` when
`hoodie.datasource.write.payload.class` is not set.
- This payload “picks the record with the greatest value (determined by
calling `.compareTo()` on the value of precombine key) to break ties and simply
picks the latest record while merging”. In other words it is **latest-write-wins**,
and your `preCombineField` (`c` in your table) breaks ties among incoming records
that share the same key.
So for your table:
```sql
CREATE TABLE IF NOT EXISTS hudi_cow_15 (
a INT,
b INT,
c INT
) USING hudi
TBLPROPERTIES(
type='cow',
primaryKey='a',
preCombineField='c'
)
```
- **Record key**: `a`
- **Ordering / preCombine field**: `c`
- **Combine logic** (by default): among multiple incoming records with the same `a`,
the one with the **largest `c`** wins. That full record then overwrites the version
already stored in the table.
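With `OverwriteWithLatestAvroPayload`, Hudi compares `c` when it precombines (deduplicates) records that share a key within one write. When the surviving record is merged with the version already stored in the table, the incoming record simply replaces it:
```python
# Rough sketch of OverwriteWithLatestAvroPayload semantics (precombine field: c)
def pre_combine(older, newer):
    # Two incoming records with the same key: the greatest c wins
    return older if older["c"] > newer["c"] else newer

def merge_with_stored(stored, incoming):
    # Merge with the stored record: c is not compared, the incoming record wins
    return incoming
```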
For your schema, this means that within a single write, the record with the highest value of `c` overwrites the others that share its `a`.
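Here is a small, self-contained simulation of that deduplication step for a batch written to `hudi_cow_15`. It is a sketch, not Hudi code. The function name `precombine` and the sample rows are hypothetical. Tie handling, where the later row in the batch wins, is an assumption of this sketch, because Hudi's own tie behavior depends on the order in which it reduces records.

```python
from typing import Dict, List

Row = Dict[str, int]

def precombine(batch: List[Row], key: str = "a", ordering: str = "c") -> List[Row]:
    """Keep one full row per key: the one with the greatest ordering value.

    Hypothetical helper; on a tie the row seen later in the batch wins
    (an assumption of this sketch, not a documented Hudi guarantee).
    """
    winners: Dict[int, Row] = {}
    for row in batch:
        current = winners.get(row[key])
        if current is None or row[ordering] >= current[ordering]:
            winners[row[key]] = row  # the whole row, including b, replaces the previous one
    return list(winners.values())

# Hypothetical incoming batch: three rows share a = 1
batch = [
    {"a": 1, "b": 10, "c": 5},
    {"a": 1, "b": 20, "c": 9},
    {"a": 1, "b": 30, "c": 7},
    {"a": 2, "b": 40, "c": 1},
]
print(precombine(batch))  # [{'a': 1, 'b': 20, 'c': 9}, {'a': 2, 'b': 40, 'c': 1}]
```

Note that `b` travels with the winning `c`. The payload keeps the entire record with the largest ordering value. It does not merge columns from different rows.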
If you do not set a payload class, Hudi does not write one into `.hoodie.properties` and automatically falls back to its default payload class.
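The fallback can be pictured as a simple lookup with a default. This is a hypothetical sketch. `resolve_payload_class` is not a Hudi API. Hudi's real resolution lives in its Java configuration code, can differ between versions, and may consult other keys. The sketch only mirrors the behavior described above, using the option name and default class quoted earlier.

```python
from typing import Mapping

PAYLOAD_KEY = "hoodie.datasource.write.payload.class"
DEFAULT_PAYLOAD = "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload"

def resolve_payload_class(options: Mapping[str, str]) -> str:
    """Return the configured payload class, or the default when none is set.

    Hypothetical helper: `options` stands in for the table properties and
    writer options combined; it is not how Hudi itself stores them.
    """
    return options.get(PAYLOAD_KEY, DEFAULT_PAYLOAD)

# hudi_cow_15 sets no payload class, so the default is used
print(resolve_payload_class({"primaryKey": "a", "preCombineField": "c"}))
```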