voonhous commented on issue #17978:
URL: https://github.com/apache/hudi/issues/17978#issuecomment-3782486575

   Upon further investigation, the error below:
   
   ```shell
   class java.lang.Integer cannot be cast to class org.apache.avro.util.Utf8 
(java.lang.Integer is in module java.base of loader 'bootstrap'; 
org.apache.avro.util.Utf8 is in unnamed module of loader 'app')
        at org.apache.avro.util.Utf8.compareTo(Utf8.java:36)
   ```
   
   is caused by comparing an `Integer` with a `Utf8` string, as seen in the 
debugger screenshot below:
   
   <img width="1966" height="1195" alt="Image" src="https://github.com/user-attachments/assets/f4fe61b9-400e-4778-b537-800f61c6b8e4" />
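   
   The failure mode can be reproduced in isolation. The sketch below is not 
Hudi code: it uses a plain `String` as a stand-in for `Utf8` and a 
hypothetical helper named `compareOrdering`, on the assumption that the 
ordering values are compared through a raw `Comparable` call.
   
   ```java
   public class OrderingCompareSketch {
       // Hypothetical helper: compares two ordering values via a raw Comparable,
       // so the element types are only checked at runtime.
       @SuppressWarnings({"rawtypes", "unchecked"})
       public static int compareOrdering(Comparable left, Comparable right) {
           return left.compareTo(right);
       }

       public static void main(String[] args) {
           // Stand-in for the Utf8 ordering value read from the record.
           Comparable fromRecord = "2024-01-01";
           // Stand-in for the default ordering value used when no field is configured.
           Comparable defaulted = 0;
           try {
               compareOrdering(fromRecord, defaulted);
           } catch (ClassCastException e) {
               // String.compareTo casts its argument to String, which fails for an Integer.
               System.out.println("Caught: " + e.getClass().getName());
           }
       }
   }
   ```
   
   The cast happens inside the `compareTo` of the string-like value, which 
matches the `Utf8.compareTo` frame at the top of the stack trace.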
   
   
   # Root Cause
   `orderingValue` is 0 because `sourceOrderingFields` is empty, so 
`orderingValue` falls back to its default of 0. 
   
   `sourceOrderingFields` is empty because `hoodie.table.ordering.fields` is 
empty in the `tableConfig`. 
   
   The reason this config is not propagated to `tableConfig` is explained 
below:
   
   
   The `--source-ordering-fields`/`--source-ordering-field` CLI parameter is 
not propagated to `HoodieTableConfig` during bootstrap mode.
   
   
   ## 1. HoodieStreamer.java `combineProperties()` method:
   - This method builds the `TypedProperties` that are passed to 
`BootstrapExecutor`
   - It sets several configs explicitly (table type, record merge impl classes, 
etc.), but `cfg.sourceOrderingFields` is not among them
   
   
https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java#L181-L209
   
   
   ## 2. BootstrapExecutor.java
   `#setOrderingFields(ConfigUtils.getOrderingFieldsStrDuringWrite(props))`
   - Tries to get ordering fields from properties
   - Since `cfg.sourceOrderingFields` was never added to props, this returns 
null
   
   
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/BootstrapExecutor.java#L204-L211
   
   ## 3. ConfigUtils.java `#getOrderingFieldsStrDuringWrite()`
   - Only checks for `HoodieTableConfig.ORDERING_FIELDS` or 
`hoodie.datasource.write.precombine.field` in properties
   - Neither is set when using CLI `--source-ordering-fields`
   
   
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-common/src/main/java/org/apache/hudi/common/util/ConfigUtils.java#L111-L123
   
   ## 4. StreamSync.java (non-bootstrap flow)
   
   On non-bootstrap flows:
   - Calls `#setOrderingFields(cfg.sourceOrderingFields)`
   - Directly uses `cfg.sourceOrderingFields`, which works correctly
   
   
https://github.com/apache/hudi/blob/c88df86a012037a0e822740e2e4d4b7bb470b7cb/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java#L457-L486
   
   
   ## TLDR:
   Why the normal flow works but the bootstrap flow does not:
   1. Non-bootstrap (`StreamSync`) 
       - Uses `cfg.sourceOrderingFields` directly, and it works
   2. Bootstrap (`BootstrapExecutor`)
       - Uses `ConfigUtils.getOrderingFieldsStrDuringWrite(props)`, which 
returns null because the ordering fields were never added to `props`
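   
   The difference between the two paths can be sketched with plain 
`java.util.Properties`. This is an illustration only, not the Hudi 
implementation: the class and method names are hypothetical, the 
`hoodie.table.type` entry is just an example of a config that does get set, 
and it is an assumption that `HoodieTableConfig.ORDERING_FIELDS` maps to the 
`hoodie.table.ordering.fields` key.
   
   ```java
   import java.util.Properties;

   public class OrderingFieldLookupSketch {
       // Assumed key behind HoodieTableConfig.ORDERING_FIELDS.
       static final String TABLE_ORDERING_KEY = "hoodie.table.ordering.fields";
       // Fallback key named in the ConfigUtils section above.
       static final String PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field";

       // Mimics the bootstrap path: the value is looked up in props only.
       public static String orderingFieldsFromProps(Properties props) {
           String value = props.getProperty(TABLE_ORDERING_KEY);
           return value != null ? value : props.getProperty(PRECOMBINE_KEY);
       }

       // Mimics the non-bootstrap path: the CLI value is used as-is.
       public static String orderingFieldsFromCli(String cliSourceOrderingFields) {
           return cliSourceOrderingFields;
       }

       public static void main(String[] args) {
           String cliValue = "ts"; // hypothetical --source-ordering-fields value

           // Props built without copying the CLI ordering fields into them.
           Properties props = new Properties();
           props.setProperty("hoodie.table.type", "COPY_ON_WRITE");

           System.out.println("non-bootstrap: " + orderingFieldsFromCli(cliValue));
           System.out.println("bootstrap:     " + orderingFieldsFromProps(props));
       }
   }
   ```
   
   In this sketch the bootstrap lookup yields null even though the CLI value 
is present, because neither key was ever written into `props`.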
   
   

