ad1happy2go commented on issue #10609:
URL: https://github.com/apache/hudi/issues/10609#issuecomment-1942095222

   @maheshguptags I tried to reproduce the issue but couldn't. Here are the
artefacts I used.
   
   kafka-source.props
   ```
   hoodie.datasource.write.recordkey.field=volume
   hoodie.datasource.write.partitionpath.field=year
   hoodie.datasource.write.precombine.field=ts
   hoodie.clean.max.commits=6
   hoodie.clean.trigger.strategy=NUM_COMMITS
   hoodie.cleaner.commits.retained=4
   hoodie.cleaner.parallelism=50
   hoodie.clean.automatic=true
   hoodie.clean.async=true
   hoodie.parquet.compression.codec=snappy
   hoodie.index.type=RECORD_INDEX
   hoodie.metadata.record.index.enable=true
   hoodie.metadata.record.index.min.filegroup.count=20
   hoodie.metadata.record.index.max.filegroup.count=5000
   hoodie.datasource.write.new.columns.nullable=true
   hoodie.datasource.write.reconcile.schema=true
   bootstrap.servers=localhost:9092
   auto.offset.reset=latest
   ```
   
   Command - 
   ```
   ${SPARK_HOME}/bin/spark-submit --name customer-event-hudideltaStream \
   --jars ${HOME_DIR}/jars/0.14.1/spark32/hudi-spark3.4-bundle_2.12-0.14.1.jar \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
   ${HOME_DIR}/jars/0.14.1/spark32/hudi-utilities-slim-bundle_2.12-0.14.1.jar \
   --checkpoint file:///tmp/hudistreamer/test/checkpoint1 \
   --target-base-path file:///tmp/hudistreamer/test/output1 \
   --target-table customer_profile --table-type COPY_ON_WRITE \
   --base-file-format PARQUET \
   --props kafka-source.props \
   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
   --source-ordering-field ts \
   --payload-class org.apache.hudi.common.model.DefaultHoodieRecordPayload \
   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
   --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=${HOME_DIR}/docker_demo/conf/schema.avsc \
   --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=${HOME_DIR}/docker_demo/conf/schema.avsc \
   --op UPSERT --hoodie-conf hoodie.streamer.source.kafka.topic=stock_ticks \
   --hoodie-conf hoodie.datasource.write.partitionpath.field=year \
   --continuous
   ```
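
Since the properties above enable automatic, async cleaning, one rough way to compare runs is to count completed commit and clean instants on the table's timeline. The snippet below is only a sketch: `count_instants` and `TABLE_PATH` are hypothetical names introduced here, and it assumes the 0.14.x layout in which completed instants sit directly under `.hoodie` as `<instant>.commit` and `<instant>.clean`.

```shell
#!/usr/bin/env bash
# Sketch: count completed instants of one action type on a Hudi timeline.
# Assumption: completed instants are files named "<instant>.<action>" directly
# under <table>/.hoodie; requested/inflight files carry extra suffixes and
# therefore do not match the pattern below.
count_instants() {
  local base_path="$1" action="$2"
  find "${base_path}/.hoodie" -maxdepth 1 -type f -name "*.${action}" 2>/dev/null \
    | wc -l | tr -d ' '
}

# Hypothetical default: the --target-base-path from the command above.
TABLE_PATH="${TABLE_PATH:-/tmp/hudistreamer/test/output1}"

echo "completed commits: $(count_instants "$TABLE_PATH" commit)"
echo "completed cleans:  $(count_instants "$TABLE_PATH" clean)"
```

If the commit count keeps growing while the clean count stays at zero, that would be worth sharing alongside the `.hoodie` listing.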


