nsivabalan commented on issue #4796: URL: https://github.com/apache/hudi/issues/4796#issuecomment-1037426901
I could not reproduce this. I also tried with ComplexKeyGenerator, an empty partition path, and no schema provider configs, but still could not reproduce it. Sorry. We might need reproducible steps with some dataset, if feasible.

Command used:

```
/bin/spark-submit \
  --packages org.apache.spark:spark-avro_2.11:2.4.4,org \
  --driver-memory 8g \
  --executor-memory 8g \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  path_to_/hudi-utilities-bundle_2.11-0.10.1.jar \
  --props /tmp/parquet-dfs-cluster.props \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --source-ordering-field created_at \
  --table-type COPY_ON_WRITE \
  --target-base-path file:///tmp/hudi-deltastreamer-gh1/ \
  --target-table gh_hudi_tbl31 \
  --op UPSERT \
  --hoodie-conf hoodie.clustering.async.enabled=true \
  --continuous \
  --source-limit 4000000 \
  --min-sync-interval-seconds 30
```

Properties file contents:

```
hoodie.datasource.write.recordkey.field=other,org.id
hoodie.datasource.write.partitionpath.field=
hoodie.datasource.write.precombine.field=created_at
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.metadata.enable=false
hoodie.upsert.shuffle.parallelism=8
hoodie.insert.shuffle.parallelism=8
hoodie.delete.shuffle.parallelism=8
hoodie.bulkinsert.shuffle.parallelism=8
hoodie.deltastreamer.source.dfs.root=/dataset_path/
hoodie.clustering.plan.strategy.sort.columns=created_at
hoodie.clustering.plan.strategy.daybased.lookback.partitions=0
hoodie.clustering.async.max.commits=2
```
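When comparing against a failing dataset, it can help to check which record keys the `other,org.id` configuration yields. The sketch below is a rough, hypothetical Python approximation of the `field:value,field:value` key format that ComplexKeyGenerator is documented to produce, with `org.id` resolved as a nested field. It is not Hudi code. The `__null__`/`__empty__` placeholders and the "all key fields null or empty is an error" rule are assumptions modeled on Hudi's key-generation behavior and should be checked against the 0.10.1 source.

```python
# Hypothetical approximation of ComplexKeyGenerator's record-key format.
# NOT Hudi's implementation; placeholder strings and error rule are assumptions.
from typing import Any, Mapping, Optional

NULL_PLACEHOLDER = "__null__"    # assumed marker for a null key field
EMPTY_PLACEHOLDER = "__empty__"  # assumed marker for an empty-string key field


def resolve_nested(record: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as 'org.id' through nested dicts."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, Mapping) or part not in value:
            return None
        value = value[part]
    return value


def complex_record_key(record: Mapping[str, Any], fields_csv: str) -> str:
    """Build a 'field:value,field:value' key from a comma-separated field list."""
    parts = []
    all_null_or_empty = True
    for field in (f.strip() for f in fields_csv.split(",")):
        value = resolve_nested(record, field)
        if value is None:
            text = NULL_PLACEHOLDER
        elif value == "":
            text = EMPTY_PLACEHOLDER
        else:
            text = str(value)
            all_null_or_empty = False
        parts.append(f"{field}:{text}")
    if all_null_or_empty:
        # Assumed to mirror Hudi rejecting keys whose fields are all null/empty.
        raise ValueError(f"record key fields {fields_csv!r} are all null or empty")
    return ",".join(parts)


if __name__ == "__main__":
    row = {"other": "abc", "org": {"id": 42}, "created_at": "2022-02-14"}
    print(complex_record_key(row, "other,org.id"))
```

Running it prints `other:abc,org.id:42`. Records whose `other` and `org.id` are both missing would be flagged, which is one kind of row worth looking for in a dataset that does reproduce the issue.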
