rohit-m-99 opened a new issue #4796: URL: https://github.com/apache/hudi/issues/4796
**_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at [email protected]. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** When running the following script I am able to ingest sims and cluster. However when I kill the spark job and rerun I am met with a NPE. `#!/bin/bash spark-submit \ --jars /opt/spark/jars/hadoop-aws.jar,/opt/spark/jars/aws-java-sdk.jar,/opt/spark/jars/spark-avro.jar \ --master spark://spark-master:7077 \ --total-executor-cores 10 \ --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \ --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer opt/spark/jars/hudi-utilities-bundle.jar \ --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \ --target-table per_tick_stats \ --table-type COPY_ON_WRITE \ --continuous \ --source-ordering-field $3 \ --target-base-path $2 \ --hoodie-conf hoodie.deltastreamer.source.dfs.root=$1 \ --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \ --hoodie-conf hoodie.datasource.write.recordkey.field=$4 \ --hoodie-conf hoodie.datasource.write.precombine.field=$3 \ --hoodie-conf hoodie.clustering.async.enabled=true \ --hoodie-conf hoodie.clustering.plan.strategy.sort.columns=$5 \ --hoodie-conf hoodie.datasource.write.partitionpath.field=''` **To Reproduce** Steps to reproduce the behavior: 1. Run HoodieDeltaStreamer with async clustering and no partition path 2. Kill the job 3. Rerun the job **Expected behavior** The rerun of the job should continue w/o NPE. **Environment Description** * Hudi version : 0.10.1 (also saw this on 0.9.0) * Spark version : 3.0.3 * Hadoop version : 3.2.0 * Storage (HDFS/S3/GCS..) : AWS S3 * Running on Docker? (yes/no) : No **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.``` `22/02/11 23:26:04 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception java.lang.NullPointerException at org.apache.hudi.utilities.deltastreamer.DeltaSync.getClusteringInstantOpt(DeltaSync.java:864) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:648) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 22/02/11 23:26:04 INFO HoodieDeltaStreamer: Delta Sync shutdown. Error ?true 22/02/11 23:26:04 INFO HoodieDeltaStreamer: DeltaSync shutdown. Closing write client. Error?true 22/02/11 23:26:04 ERROR HoodieAsyncService: Service shutdown with error java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:89) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:182) at org.apache.hudi.common.util.Option.ifPresent(Option.java:96) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:179) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:514) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: org.apache.hudi.exception.HoodieException at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:666) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.hudi.utilities.deltastreamer.DeltaSync.getClusteringInstantOpt(DeltaSync.java:864) at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:648) ... 4 more` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
