maheshguptags commented on issue #10609: URL: https://github.com/apache/hudi/issues/10609#issuecomment-2167275346
Hi @michael1991, thank you for solving this. I can now run DeltaStreamer with RLI. Out of curiosity, how did you figure out that we need to pass the jar in extraPath?

```
spark/bin/spark-submit \
  --name customer-event-hudideltaStream \
  --num-executors 10 \
  --executor-memory 2g \
  --driver-memory 3g \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  --jars /home/mahesh.gupta/aws-msk-iam-auth-1.1.9-all.jar \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /home/mahesh.gupta/hudi-utilities-bundle_2.12-0.14.1.jar \
  --checkpoint s3a://cdp-offline-store-perf2/checkpointing/eks/sparkhudipoc/hudistream_rli_4 \
  --target-base-path s3a://cdp-offline-store-perf2/customer_event_temp_hudi_delta/ \
  --target-table customer_event_temp \
  --table-type COPY_ON_WRITE \
  --base-file-format PARQUET \
  --props /home/mahesh.gupta/deltaHoodie.properties \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field updated_date \
  --payload-class org.apache.hudi.common.model.DefaultHoodieRecordPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
  --hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=/home/mahesh.gupta/source.avsc \
  --hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=/home/mahesh.gupta/source.avsc \
  --op UPSERT \
  --hoodie-conf hoodie.streamer.source.kafka.topic=cdp_track_temp_perf \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=client_id \
  --continuous
```

@ad1happy2go I will need some help with memory tuning for DeltaStreamer. Please let me know if there is any documentation for it.
