voonhous commented on issue #17978:
URL: https://github.com/apache/hudi/issues/17978#issuecomment-3782486575
Upon further investigation, the error below:
```shell
class java.lang.Integer cannot be cast to class org.apache.avro.util.Utf8
(java.lang.Integer is in module java.base of loader 'bootstrap';
org.apache.avro.util.Utf8 is in unnamed module of loader 'app')
at org.apache.avro.util.Utf8.compareTo(Utf8.java:36)
```
is caused by comparing an `Integer` with a `Utf8` string, as seen in the
debugger screenshot below:
<img width="1966" height="1195" alt="Image"
src="https://github.com/user-attachments/assets/f4fe61b9-400e-4778-b537-800f61c6b8e4"
/>
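The same failure mode can be reproduced without Avro. The sketch below is only an illustration, not Hudi code: `String` stands in for `org.apache.avro.util.Utf8`, and `OrderingCompareDemo`, `compareRaw` and `failsWithClassCast` are hypothetical names. It assumes the two values are compared through the raw `Comparable` interface, so the type mismatch only surfaces at runtime inside `compareTo`.

```java
@SuppressWarnings({"rawtypes", "unchecked"})
final class OrderingCompareDemo {

    // Compares two values through the raw Comparable interface, so the
    // compiler cannot check that both sides have the same type.
    static int compareRaw(Comparable left, Object right) {
        return left.compareTo(right);
    }

    // Returns true if comparing the two values throws a ClassCastException.
    static boolean failsWithClassCast(Comparable left, Object right) {
        try {
            compareRaw(left, right);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // String stands in for Utf8; 0 is an Integer, like the default ordering value.
        System.out.println(failsWithClassCast("2024-01-01", 0));            // true
        // Two values of the same type compare without error.
        System.out.println(failsWithClassCast("2024-01-01", "2024-01-02")); // false
    }
}
```

The cast to the receiver's own type happens inside `compareTo`, which is consistent with the stack trace pointing at `Utf8.compareTo`.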
# Root Cause
The `orderingValue` has a value of 0 because `sourceOrderingFields` is empty,
so the `orderingValue` falls back to its default of 0.
`sourceOrderingFields` is empty because `hoodie.table.ordering.fields` is
empty in the `tableConfig`.
The reason this config does not propagate to `tableConfig` is explained
below:
The `--source-ordering-fields`/`--source-ordering-field` CLI parameter is
not propagated to `HoodieTableConfig` during bootstrap mode.
## 1. HoodieStreamer.java `combineProperties()` method:
- This method builds the `TypedProperties` that get passed to
`BootstrapExecutor`
- It sets several configs explicitly (table type, record merge impl classes,
etc.), but `cfg.sourceOrderingFields` is not among them
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java#L181-L209
## 2. BootstrapExecutor.java
`#setOrderingFields(ConfigUtils.getOrderingFieldsStrDuringWrite(props))`
- Tries to get ordering fields from properties
- Since `cfg.sourceOrderingFields` was never added to props, this returns
null
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/BootstrapExecutor.java#L204-L211
## 3. ConfigUtils.java `#getOrderingFieldsStrDuringWrite()`
- Only checks for `HoodieTableConfig.ORDERING_FIELDS` or
`hoodie.datasource.write.precombine.field` in properties
- Neither is set when using CLI `--source-ordering-fields`
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-common/src/main/java/org/apache/hudi/common/util/ConfigUtils.java#L111-L123
## 4. StreamSync.java (non-bootstrap flow)
On non-bootstrap flows:
- `#setOrderingFields(cfg.sourceOrderingFields)`
- Directly uses `cfg.sourceOrderingFields`, which works correctly
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java#L457-L486
## TLDR:
The normal flow works and the bootstrap flow does not because the two read
the ordering fields from different places:
1. Non-bootstrap (`StreamSync`)
- Uses `cfg.sourceOrderingFields` directly, and it works
2. Bootstrap (`BootstrapExecutor`)
- Uses `ConfigUtils.getOrderingFieldsStrDuringWrite(props)`, which returns
null because the CLI value was never added to `props`
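
A simplified model of the two lookup paths is sketched below. This is not the Hudi implementation: `OrderingFieldLookupDemo`, `Config`, and both `orderingFields...` methods are hypothetical stand-ins, and `java.util.Properties` is used in place of `TypedProperties`. Only the two property keys are taken from the analysis above.

```java
import java.util.Properties;

final class OrderingFieldLookupDemo {
    static final String ORDERING_FIELDS_KEY = "hoodie.table.ordering.fields";
    static final String PRECOMBINE_FIELD_KEY = "hoodie.datasource.write.precombine.field";

    // Hypothetical stand-in for the streamer CLI config object.
    static final class Config {
        String sourceOrderingFields;
    }

    // Models the non-bootstrap path: the CLI value is read directly.
    static String orderingFieldsNonBootstrap(Config cfg) {
        return cfg.sourceOrderingFields;
    }

    // Models the bootstrap path: only the two property keys are consulted.
    static String orderingFieldsBootstrap(Properties props) {
        String value = props.getProperty(ORDERING_FIELDS_KEY);
        return value != null ? value : props.getProperty(PRECOMBINE_FIELD_KEY);
    }

    public static void main(String[] args) {
        Config cfg = new Config();
        cfg.sourceOrderingFields = "ts"; // as if passed via --source-ordering-fields
        Properties props = new Properties(); // the CLI value was never copied here

        System.out.println(orderingFieldsNonBootstrap(cfg)); // ts
        System.out.println(orderingFieldsBootstrap(props));  // null

        // In this model, copying the CLI value into props makes the lookup succeed.
        props.setProperty(ORDERING_FIELDS_KEY, cfg.sourceOrderingFields);
        System.out.println(orderingFieldsBootstrap(props));  // ts
    }
}
```

In this model the bootstrap lookup only succeeds once the CLI value is copied into the properties under one of the two keys. Whether that is the right place to fix it in Hudi is a separate question.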