sivabalan narayanan created HUDI-2772:
-----------------------------------------

             Summary: Deltastreamer fails to read checkpoint from previous 
commit metadata by spark writer
                 Key: HUDI-2772
                 URL: https://issues.apache.org/jira/browse/HUDI-2772
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: sivabalan narayanan


Even after setting the right config to copy over deltastreamer checkpoint, 
deltastreamer fails to read the checkpoint from previous commit metadata. 

I inspected the commit.completed instant and it looks good. 

 
{code:java}
grep "checkpoint" 
/tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074129737.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
But after the below exception, if I restart deltastreamer, it just runs fine. 
Are there chances of reading partially written files so that checkpoint is not 
available but the previous commit.completed file is present and so 
deltastreamer reads and could not find the checkpoint entry ? 

 

here is the checkpoint from last delta commit by deltastreamer (which matches 
the entry found by delta commit by spark writer above)
{code:java}
grep "checkpoint" 
/tmp/hudi-deltastreamer-gh-mw/.hoodie/20211116074123384.deltacommit
    "deltastreamer.checkpoint.key" : "1637066483000" {code}
stacktrace: 
{code:java}
21/11/16 07:41:31 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to 
exception
org.apache.hudi.utilities.exception.HoodieDeltaStreamerException: Unable to 
find previous checkpoint. Please double check if this table was indeed built 
via delta streamer. Last Commit 
:Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants 
:[[20211116072710523__deltacommit__COMPLETED], 
[20211116072748906__deltacommit__COMPLETED], 
[20211116072910768__deltacommit__COMPLETED], 
[20211116074114874__deltacommit__COMPLETED], 
[20211116074123384__deltacommit__COMPLETED], 
[20211116074129737__deltacommit__COMPLETED]], CommitMetadata={
  "partitionToWriteStats" : { },
  "compacted" : false,
  "extraMetadata" : { },
  "operationType" : "UNKNOWN",
  "fileIdAndRelativePaths" : { },
  "totalRecordsDeleted" : 0,
  "totalLogRecordsCompacted" : 0,
  "totalLogFilesCompacted" : 0,
  "totalCompactedRecordsUpdated" : 0,
  "totalLogFilesSize" : 0,
  "totalScanTime" : 0,
  "totalCreateTime" : 0,
  "totalUpsertTime" : 0,
  "minAndMaxEventTime" : {
    "Optional.empty" : {
      "val" : null,
      "present" : false
    }
  },
  "writePartitionPaths" : [ ]
}
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:346)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:281)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:634)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
21/11/16 07:41:31 WARN HoodieDeltaStreamer: Gracefully shutting down compactor
21/11/16 07:41:39 WARN SparkRDDWriteClient: Slept for 20 secs, proceeding 
21/11/16 07:41:40 ERROR HoodieAsyncService: Monitor noticed one or more threads 
failed. Requesting graceful shutdown of other threads
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: Unable to find previous checkpoint. 
Please double check if this table was indeed built via delta streamer. Last 
Commit :Option{val=[20211116074129737__deltacommit__COMPLETED]}, Instants 
:[[20211116072710523__deltacommit__COMPLETED], 
[20211116072748906__deltacommit__COMPLETED], 
[20211116072910768__deltacommit__COMPLETED], 
[20211116074114874__deltacommit__COMPLETED], 
[20211116074123384__deltacommit__COMPLETED], 
[20211116074129737__deltacommit__COMPLETED]], CommitMetadata={{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to