zuyanton opened a new issue #1780:
URL: https://github.com/apache/hudi/issues/1780
We are having an issue when running a simple count query on our Hudi table via
Hive. The error is `Hudi File Id has more than 1 pending compactions`. The
table is MoR, compaction runs inline, the table is persisted to S3, and the
consistency check is turned on.
The error does not make sense to me, as it suggests that there are two pending
compactions, 20200701015658 and 20200630235744 (see stack trace below).
However, all compactions run inline, so there should never be two pending
compactions at once. The logs also note that both compactions finished
successfully
```20/07/01 00:13:14 INFO HoodieWriteClient: Compacted successfully on
commit 20200630235744```
and `compactions show all` in hudi-cli suggests the same:
```
╔═════════════════════════╤═══════════╤═══════════════════════════════╗
║ Compaction Instant Time │ State │ Total FileIds to be Compacted ║
╠═════════════════════════╪═══════════╪═══════════════════════════════╣
║ 20200701015658 │ COMPLETED │ 38 ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200701015658 │ COMPLETED │ 38 ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200701015658 │ COMPLETED │ 38 ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200630235744 │ COMPLETED │ 30 ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200630235744 │ COMPLETED │ 30 ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20200630235744 │ COMPLETED │ 30 ║
╚═════════════════════════╧═══════════╧═══════════════════════════════╝
```
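For illustration only, the invariant behind the error can be sketched as a quick check outside of Hudi: a file id must not appear in more than one pending compaction plan. This is not the actual Hudi code, just a loose approximation; the instant times and file id below are copied from the stack trace.

```shell
# Loose sketch of the invariant the error enforces: each file id may appear
# in at most one pending compaction plan. Pairs of (instant, fileId) below
# are taken from the stack trace in this issue.
printf '%s %s\n' \
  20200701015658 f071cf58-8601-4ecd-b2da-80e5b0a92d47-3 \
  20200630235744 f071cf58-8601-4ecd-b2da-80e5b0a92d47-3 |
awk '{seen[$2]++}
     END {for (f in seen) if (seen[f] > 1)
            print "Hudi File Id " f " has " seen[f] " pending compactions"}'
```

Since the same file id is listed under both instants, this flags it, which matches the exception text even though both compactions show as COMPLETED above.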
When checking the content of the .hoodie folder, I can see that all three
files for each compaction (*.compaction.requested, *.compaction.inflight,
*.commit) are present. It seems like
CompactionUtils.getAllPendingCompactionOperations may be wrongly identifying
compactions as "pending".
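To double-check this by hand, one can compare the requested and completed timeline files directly. A minimal sketch, run against a throwaway local directory that recreates the two instants from this issue (a real check would point at a local copy of the table's .hoodie path instead):

```shell
# Recreate the timeline files described above in a temp dir (illustration
# only), then list any instant that still looks pending: it has a
# *.compaction.requested file but no matching *.commit file.
TL="$(mktemp -d)/.hoodie"
mkdir -p "$TL"
for t in 20200630235744 20200701015658; do
  touch "$TL/$t.compaction.requested" \
        "$TL/$t.compaction.inflight" \
        "$TL/$t.commit"
done
for req in "$TL"/*.compaction.requested; do
  t="$(basename "$req" .compaction.requested)"
  [ -f "$TL/$t.commit" ] || echo "still pending: $t"
done
```

Here nothing is printed, since both instants have a .commit file, which is consistent with the files observed in .hoodie and with hudi-cli reporting both compactions as COMPLETED.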
**Environment Description**
* Hudi version : 0.5.3
* Spark version : 2.4.4
* Hive version : 2.3.6
* Hadoop version : 2.8.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
**Stacktrace**
```Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1592430479775_0691_2_00,
diagnostics=[Vertex vertex_1592430479775_0691_2_00 [Map 1] killed/failed due
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ofa_gl_je_lines_100_ro initializer
failed, vertex=vertex_1592430479775_0691_2_00 [Map 1],
java.lang.IllegalStateException: Hudi File Id
(HoodieFileGroupId{partitionPath='61',
fileId='f071cf58-8601-4ecd-b2da-80e5b0a92d47-3'}) has more than 1 pending
compactions. Instants: (20200701015658,{"baseInstantTime": "20200630235744",
"deltaFilePaths":
[".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630235744.log.1_1-22-9631"],
"dataFilePath":
"f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_14-30-9419_20200630235744.parquet",
"fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61",
"metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 111.0,
"TOTAL_LOG_FILES_SIZE": 7.3045072E7, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB":
153.0, "TOTAL_LOG_FILE_SIZE": 7.3045072E7}}),
(20200630235744,{"baseInstantTime": "20200630115655", "deltaFilePaths":
[".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630115655.log.1_1-22-9569"],
"dataFilePath":
"f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_26-30-9435_20200630115655.parquet",
"fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61",
"metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 44.0,
"TOTAL_LOG_FILES_SIZE": 2116823.0, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB":
86.0, "TOTAL_LOG_FILE_SIZE": 2116823.0}})
at
org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at
org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:149)
at
org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:95)
at
org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:87)
at
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:81)
at
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:72)
at
org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:110)
at
org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:89)
at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
at
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
at
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
at
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)```