xccui opened a new issue, #8516:
URL: https://github.com/apache/hudi/issues/8516
We use a Flink streaming job to write MoR tables. Compaction of a series of
tables was blocked by the following exception. The parquet file name recorded
in the compaction plan seems to differ from the actual file name in the
write-token part: the plan references write token `1-5-23`, while the actual
file on S3 is
`55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet`.
```
2023-04-20 13:35:10 [pool-31-thread-1] ERROR org.apache.hudi.sink.compact.CompactOperator [] - Executor executes action [Execute compaction for instant 20230420041145422 from task 1] error
org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
    at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:95) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:208) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:230) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.io.storage.HoodieAvroParquetReader.getSchema(HoodieAvroParquetReader.java:104) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:91) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:374) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:365) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:144) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:133) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:116) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3866) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$24(S3AFileSystem.java:3556) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3554) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:469) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:454) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:93) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    ... 15 more
```
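To make the mismatch concrete, here is a minimal diagnostic sketch (not part of Hudi; the regex and the `parse_base_file` helper are my own illustration) that splits a Hudi base-file name into its file ID, write token, and instant time. It assumes the usual `<fileId>_<writeToken>_<instantTime>.parquet` layout; the two names from this issue share the same file group and instant and differ only in the write token.

```python
import re

# Hudi base-file names follow <fileId>_<writeToken>_<instantTime>.parquet,
# where the write token is three dash-separated numeric parts.
BASE_FILE_RE = re.compile(
    r"(?P<file_id>[0-9a-f-]{36})_(?P<write_token>\d+-\d+-\d+)_(?P<instant>\d+)\.parquet"
)

def parse_base_file(name: str) -> dict:
    """Split a Hudi base-file name into file_id, write_token, and instant."""
    m = BASE_FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Hudi base file name: {name}")
    return m.groupdict()

# File name referenced by the compaction plan (from the stack trace above):
planned = parse_base_file(
    "55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet")
# File name actually present on S3:
actual = parse_base_file(
    "55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet")

# Same file group and same instant, but different write tokens,
# which is exactly why the planned path cannot be found.
assert planned["file_id"] == actual["file_id"]
assert planned["instant"] == actual["instant"]
assert planned["write_token"] != actual["write_token"]
```

A scan like this over the partition directory confirms that no file with the planned token `1-5-23` exists, only the `1-5-24` variant.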
**Environment Description**
* Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8
* Flink version : 1.16.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) :
**Additional context**
The job originally ran with the metadata table enabled. I disabled the
metadata table when restarting the job from a checkpoint.