zuyanton opened a new issue #1780:
URL: https://github.com/apache/hudi/issues/1780


   
   We are hitting an issue when running a simple count query on our Hudi table via Hive. The error is "Hudi File Id has more than 1 pending compactions". The table is MoR, compaction runs inline, the table is persisted to S3, and the consistency check is turned on.
   
   The error does not make sense to me, as it suggests there are two pending compactions - 20200701015658 and 20200630235744 (see stack trace below). However, all compactions run inline, so there should never be a case of two compactions pending at once. The logs also note that both compactions finished successfully:
   ```20/07/01 00:13:14 INFO HoodieWriteClient: Compacted successfully on commit 20200630235744```
   and ```hudi-cli compactions show all``` suggests the same:
   ```
   ╔═════════════════════════╤═══════════╤═══════════════════════════════╗
   ║ Compaction Instant Time │ State     │ Total FileIds to be Compacted ║
   ╠═════════════════════════╪═══════════╪═══════════════════════════════╣
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╚═════════════════════════╧═══════════╧═══════════════════════════════╝
   
   ```
   When checking the contents of the .hoodie folder, I can see that all three files for each compaction (*.compaction.requested, *.compaction.inflight, *.commit) are present. It seems like CompactionUtils.getAllPendingCompactionOperations possibly identifies "pending" compactions incorrectly.
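   As a sanity check of that hypothesis, here is a small sketch that counts the timeline files for a given compaction instant in a local copy of the .hoodie folder. `check_instant` is a hypothetical helper (not part of Hudi); the file-name pattern follows the timeline convention described above:

   ```shell
   # Hypothetical helper (not part of Hudi): given a local copy of the .hoodie
   # folder, count the timeline files for one compaction instant. A fully
   # completed compaction should leave exactly three files:
   #   <instant>.compaction.requested, <instant>.compaction.inflight, <instant>.commit
   check_instant() {
     local dir="$1" instant="$2"
     local n
     n=$(ls "$dir" | grep -cE "^${instant}\.(compaction\.(requested|inflight)|commit)$")
     if [ "$n" -eq 3 ]; then
       echo "$instant: complete"
     else
       echo "$instant: incomplete ($n/3 files)"
     fi
   }
   ```

   For example, `check_instant /tmp/hoodie-copy 20200630235744` should print "complete" for both instants here, which would confirm the timeline files are all present and point the finger at the pending-compaction detection rather than at missing commits.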
   **Environment Description**
   
   * Hudi version : 0.5.3 
   
   * Spark version : 2.4.4 
   
   * Hive version : 2.3.6
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no 
   
   
   **Stacktrace**
   
   ```Status: Failed
   Vertex failed, vertexName=Map 1, vertexId=vertex_1592430479775_0691_2_00, 
diagnostics=[Vertex vertex_1592430479775_0691_2_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ofa_gl_je_lines_100_ro initializer 
failed, vertex=vertex_1592430479775_0691_2_00 [Map 1], 
java.lang.IllegalStateException: Hudi File Id 
(HoodieFileGroupId{partitionPath='61', 
fileId='f071cf58-8601-4ecd-b2da-80e5b0a92d47-3'}) has more than 1 pending 
compactions. Instants: (20200701015658,{"baseInstantTime": "20200630235744", 
"deltaFilePaths": 
[".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630235744.log.1_1-22-9631"], 
"dataFilePath": 
"f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_14-30-9419_20200630235744.parquet", 
"fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61", 
"metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 111.0, 
"TOTAL_LOG_FILES_SIZE": 7.3045072E7, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB": 
153.0, "TOTAL_LOG_FILE_SIZE": 7.3045072E7}}), 
(20200630235744,{"baseInstantTime": "20200630115655", "deltaFilePaths": 
[".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630115655.log.1_1-22-9569"], 
"dataFilePath": 
"f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_26-30-9435_20200630115655.parquet", 
"fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61", 
"metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 44.0, 
"TOTAL_LOG_FILES_SIZE": 2116823.0, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB": 
86.0, "TOTAL_LOG_FILE_SIZE": 2116823.0}})
        at 
org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at 
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
        at 
org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:149)
        at 
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:95)
        at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:87)
        at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:81)
        at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:72)
        at 
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:110)
        at 
org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:89)
        at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
        at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)```
   
   

