baunz opened a new issue, #10319:
URL: https://github.com/apache/hudi/issues/10319

   **Describe the problem you faced**
   
   We bootstrap a MOR table with a Spark job using bulk insert, and periodically upsert data afterwards with HoodieStreamer.
   
   It is not clear to me which properties are picked up from a shared properties file and which have to be specified explicitly. It seems that all HoodieStreamer CLI options need to be set; otherwise the target table properties are overridden by the streamer's default values. An example follows.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   1. Write to table with a spark job with the following config
   
   ```
   hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload
   hoodie.payload.ordering.field=LAST_UPDATE
   ```
   => hoodie.properties contains this value
   
   2. Run Deltastreamer with the same config values passed as a props file
   
   => hoodie.properties contains
   
   ```
   hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
   ```
   
   as it is the default value for the [CLI argument](https://github.com/apache/hudi/blob/17b62a2c0f47f86b436330f2b0ea109b8c8f743c/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java#L259)
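
A mismatch like the one above can be spotted before a run by diffing the writer's props file against the table's `hoodie.properties`. The following is only a minimal sketch, not part of Hudi: `parse_properties` and `find_conflicts` are hypothetical helper names, and the parser assumes plain `key=value` lines without the escaping that Java properties files may contain.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Assumption: no line continuations or escaped characters are handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_conflicts(writer_props: dict[str, str], table_props: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {key: (writer_value, table_value)} for keys present in both with different values."""
    return {
        key: (value, table_props[key])
        for key, value in writer_props.items()
        if key in table_props and table_props[key] != value
    }


# Values taken from the two steps above.
writer = parse_properties(
    "hoodie.compaction.payload.class=org.apache.hudi.common.model.DefaultHoodieRecordPayload\n"
    "hoodie.payload.ordering.field=LAST_UPDATE\n"
)
table = parse_properties(
    "hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload\n"
)

conflicts = find_conflicts(writer, table)
for key, (writer_value, table_value) in conflicts.items():
    print(f"{key}: writer={writer_value} table={table_value}")
```

Keys that appear in only one of the two files are ignored in this sketch, so it reports only values that disagree.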
 
   
   **Expected behavior**
   
   Ideally there would be a way to avoid specifying properties twice (essentially all Streamer args), to reduce the chance of errors, if I have understood the main cause of the problem correctly.
   
   **Environment Description**
   
   * Hudi version :
   0.14.0
   EMR Serverless 6.15.0
   S3
   
   * Running on Docker? (yes/no) :
   no
   
   **Stacktrace**
   
   ```
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Config conflict(key    current value   existing value):
   hoodie.compaction.payload.class:     org.apache.hudi.common.model.DefaultHoodieRecordPayload org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
        at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:211)
        at org.apache.hudi.HoodieWriterUtils$.validateTableConfig(HoodieWriterUtils.scala:158)
        at org.apache.hudi.HoodieWriterUtils.validateTableConfig(HoodieWriterUtils.scala)
        at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.<init>(HoodieStreamer.java:683)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.<init>(HoodieStreamer.java:159)
   ```
   
   

