eisig opened a new issue #789: HoodieMergeOnReadTable rollback hangs
URL: https://github.com/apache/incubator-hudi/issues/789
 
 
   There seem to be two bugs on the master branch (commit: ae3c02fb3).
   My steps:
   1. Use HDFSParquetImporter to import from Hive to Hudi.
   2. Use HoodieDeltaStreamer to import new data from Kafka. (I added an option to allow a missing checkpointStr.)
      The config is the same as in #779, with --disable-compaction.
      After that, `select distinct _hoodie_commit_time from rt_table/ro_table` only returns the first commit time (I used max() to confirm that no newer commits are returned), but there are newer .deltacommit files in the .hoodie folder.
   
   3. Restart the Spark job. In the Spark UI, the job hangs at `collect at HoodieMergeOnReadTable.java:318` (it hangs every time).
   
   ```
   org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
   com.uber.hoodie.table.HoodieMergeOnReadTable.rollback(HoodieMergeOnReadTable.java:318)
   com.uber.hoodie.HoodieWriteClient.doRollbackAndGetStats(HoodieWriteClient.java:884)
   com.uber.hoodie.HoodieWriteClient.rollbackInternal(HoodieWriteClient.java:962)
   com.uber.hoodie.HoodieWriteClient.rollback(HoodieWriteClient.java:773)
   com.uber.hoodie.HoodieWriteClient.rollbackInflightCommits(HoodieWriteClient.java:1182)
   com.uber.hoodie.HoodieWriteClient.startCommitWithTime(HoodieWriteClient.java:1050)
   com.uber.hoodie.HoodieWriteClient.startCommit(HoodieWriteClient.java:1043)
   com.uber.hoodie.utilities.deltastreamer.DeltaSync.startCommit(DeltaSync.java:406)
   com.uber.hoodie.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:332)
   com.uber.hoodie.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:227)
   com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:382)
   java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   java.lang.Thread.run(Thread.java:748)
   ```

