ad1happy2go commented on issue #10138:
URL: https://github.com/apache/hudi/issues/10138#issuecomment-1820968721
@abhisheksahani91 I tried several ways to reproduce the issue in my local
setup with Hudi version 0.12.1 but was unable to. Can you try to reproduce
it once with the steps below -
```
# Step 1. Start Delta Streamer. Use the schema file provided with the Hudi Docker demo - "schema.avsc"
${SPARK_DIR}/bin/spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --master local[*] \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  ${WORK_DIR}/artifacts/jars/0.12.1/3.2/hudi-utilities-bundle_2.12-0.12.1.jar \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path file:///tmp/issue_10138 \
  --target-table issue_10138 \
  --table-type MERGE_ON_READ \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.source.schema.file=${WORK_DIR}/conf/schema.avsc" \
  --hoodie-conf "hoodie.deltastreamer.schemaprovider.target.schema.file=${WORK_DIR}/conf/schema.avsc" \
  --hoodie-conf "hoodie.deltastreamer.source.kafka.topic=stock_ticks" \
  --hoodie-conf "hoodie.datasource.write.recordkey.field=key" \
  --hoodie-conf "hoodie.datasource.write.precombine.field=ts" \
  --hoodie-conf "hoodie.datasource.write.operation=UPSERT" \
  --hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator" \
  --hoodie-conf "auto.offset.reset=latest" \
  --hoodie-conf "bootstrap.servers=localhost:9092" \
  --hoodie-conf "group.id=native-hudi-job" \
  --hoodie-conf "hoodie.kafka.allow.commit.on.errors=true" \
  --hoodie-conf "hoodie.write.allow_null_updates" \
  --hoodie-conf "hoodie.index.type=SIMPLE" \
  --hoodie-conf "hoodie.upsert.shuffle.parallelism=200" \
  --hoodie-conf "hoodie.finalize.write.parallelism=400" \
  --hoodie-conf "hoodie.markers.delete.parallelism=200" \
  --hoodie-conf "hoodie.file.listing.parallelism=400" \
  --hoodie-conf "hoodie.cleaner.parallelism=400" \
  --hoodie-conf "hoodie.archive.delete.parallelism=200" \
  --hoodie-conf "compaction.trigger.strategy=NUM_OR_TIME" \
  --hoodie-conf "hoodie.compact.inline.trigger.strategy=NUM_OR_TIME" \
  --hoodie-conf "compaction.schedule.enabled=true" \
  --hoodie-conf "compaction.async.enabled=true" \
  --hoodie-conf "compaction.delta_commits=5" \
  --hoodie-conf "hoodie.compact.inline.max.delta.commits=5" \
  --hoodie-conf "compaction.delta_seconds=600" \
  --hoodie-conf "hoodie.compact.inline.max.delta.seconds=600" \
  --hoodie-conf "hoodie.deltastreamer.kafka.commit_on_errors=true" \
  --continuous
# Step 2 - Generated data multiple times, using the data from the docker demo -
# cat docker/demo/data/batch_1.json | head -10 | kcat -b kafkabroker -t stock_ticks -P
# Step 3 - Added an optional field to the schema file, produced records without
# the new column, and monitored the loads until compaction happened.
# Step 4 - Produced some records with that newCol and monitored the loads.
# Runs fine after compaction as well, and the data also looks correct.
```
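For Step 3, the usual way to add an optional field to an Avro schema like schema.avsc is to append a nullable entry with a null default to the `fields` array - a sketch of that change (the field name `newCol` just mirrors the placeholder above):

```json
{"name": "newCol", "type": ["null", "string"], "default": null}
```

Making the new field nullable with a default is what keeps the evolved schema backwards-compatible with the records already written before the change.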