ad1happy2go commented on issue #10138:
URL: https://github.com/apache/hudi/issues/10138#issuecomment-1820968721
@abhisheksahani91 I tried several ways to reproduce the issue in my local
setup with Hudi version 0.12.1 but was unable to. Can you try to reproduce
it once with the steps below -
```
# Step 1. Start Delta Streamer. Use the schema file provided with the Hudi Docker demo - "schema.avsc"
${SPARK_DIR}/bin/spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --master local[*] \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  ${WORK_DIR}/artifacts/jars/0.12.1/3.2/hudi-utilities-bundle_2.12-0.12.1.jar \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path file:///tmp/issue_10138 \
  --target-table issue_10138 \
  --table-type MERGE_ON_READ \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.source.schema.file=${WORK_DIR}/conf/schema.avsc" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.target.schema.file=${WORK_DIR}/conf/schema.avsc" \
  --hoodie-conf "hoodie.deltastreamer.source.kafka.topic=stock_ticks" \
  --hoodie-conf "hoodie.datasource.write.recordkey.field=key" \
  --hoodie-conf "hoodie.datasource.write.precombine.field=ts" \
  --hoodie-conf "hoodie.datasource.write.operation=UPSERT" \
  --hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator" \
  --hoodie-conf "auto.offset.reset=latest" \
  --hoodie-conf "bootstrap.servers=localhost:9092" \
  --hoodie-conf "group.id=native-hudi-job" \
  --hoodie-conf "hoodie.kafka.allow.commit.on.errors=true" \
  --hoodie-conf "hoodie.write.allow_null_updates" \
  --hoodie-conf "hoodie.index.type=SIMPLE" \
  --hoodie-conf "hoodie.upsert.shuffle.parallelism=200" \
  --hoodie-conf "hoodie.finalize.write.parallelism=400" \
  --hoodie-conf "hoodie.markers.delete.parallelism=200" \
  --hoodie-conf "hoodie.file.listing.parallelism=400" \
  --hoodie-conf "hoodie.cleaner.parallelism=400" \
  --hoodie-conf "hoodie.archive.delete.parallelism=200" \
  --hoodie-conf "compaction.trigger.strategy=NUM_OR_TIME" \
  --hoodie-conf "hoodie.compact.inline.trigger.strategy=NUM_OR_TIME" \
  --hoodie-conf "compaction.schedule.enabled=true" \
  --hoodie-conf "compaction.async.enabled=true" \
  --hoodie-conf "compaction.delta_commits=5" \
  --hoodie-conf "hoodie.compact.inline.max.delta.commits=5" \
  --hoodie-conf "compaction.delta_seconds=600" \
  --hoodie-conf "hoodie.compact.inline.max.delta.seconds=600" \
  --hoodie-conf "hoodie.deltastreamer.kafka.commit_on_errors=true" \
  --continuous
# Step 2 - Generated data multiple times, using the data from the docker demo -
# cat docker/demo/data/batch_1.json | head -10 | kcat -b kafkabroker -t stock_ticks -P
# Step 3 - Added an optional field to the schema file, produced records without
# the new column, and monitored the loads until compaction happened.
# Step 4 - Produced some records with that newCol and monitored the loads.
# Runs fine after compaction as well, and the data also looks correct.
```
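For Step 3, the usual way to add an optional field to an Avro schema like schema.avsc is to append a nullable entry with a null default to the `fields` array - a sketch of that change (the field name `newCol` just mirrors the placeholder above):

```json
{"name": "newCol", "type": ["null", "string"], "default": null}
```

Making the new field nullable with a default is what keeps the evolved schema backwards-compatible with the records already written before the change.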