rohit-m-99 opened a new issue #4796:
URL: https://github.com/apache/hudi/issues/4796


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
[email protected].
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   When running the following script I am able to ingest sims and cluster. 
However when I kill the spark job and rerun I am met with a NPE.
   
   `#!/bin/bash
   spark-submit \
   --jars 
/opt/spark/jars/hadoop-aws.jar,/opt/spark/jars/aws-java-sdk.jar,/opt/spark/jars/spark-avro.jar
 \
   --master spark://spark-master:7077 \
   --total-executor-cores 10 \
   --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
opt/spark/jars/hudi-utilities-bundle.jar \
   --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
   --target-table per_tick_stats \
   --table-type COPY_ON_WRITE \
   --continuous \
   --source-ordering-field $3 \
   --target-base-path $2 \
   --hoodie-conf hoodie.deltastreamer.source.dfs.root=$1 \
   --hoodie-conf 
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
 \
   --hoodie-conf hoodie.datasource.write.recordkey.field=$4 \
   --hoodie-conf hoodie.datasource.write.precombine.field=$3 \
   --hoodie-conf hoodie.clustering.async.enabled=true \
   --hoodie-conf hoodie.clustering.plan.strategy.sort.columns=$5 \
   --hoodie-conf hoodie.datasource.write.partitionpath.field=''`
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Run HoodieDeltaStreamer with async clustering and no partition path
   2. Kill the job
   3. Rerun the job
   
   **Expected behavior**
   
   The rerun of the job should continue w/o NPE.
   
   **Environment Description**
   
   * Hudi version : 0.10.1 (also saw this on 0.9.0)
   
   * Spark version : 3.0.3
   
   * Hadoop version : 3.2.0
   
   * Storage (HDFS/S3/GCS..) : AWS S3
   
   * Running on Docker? (yes/no) : No
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   `22/02/11 23:26:04 ERROR HoodieDeltaStreamer: Shutting down delta-sync due 
to exception
   java.lang.NullPointerException
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.getClusteringInstantOpt(DeltaSync.java:864)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:648)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   22/02/11 23:26:04 INFO HoodieDeltaStreamer: Delta Sync shutdown. Error ?true
   22/02/11 23:26:04 INFO HoodieDeltaStreamer: DeltaSync shutdown. Closing 
write client. Error?true
   22/02/11 23:26:04 ERROR HoodieAsyncService: Service shutdown with error
   java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException
        at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at 
org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:89)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:182)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:179)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:514)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieException
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:666)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NullPointerException
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.getClusteringInstantOpt(DeltaSync.java:864)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:648)
        ... 4 more`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to