xccui opened a new issue, #8516:
URL: https://github.com/apache/hudi/issues/8516
We use a Flink streaming job to write MoR tables. Compaction of a series of
tables was blocked by the following exception. The parquet file name recorded
in the compaction plan seems to differ from the actual file name in the
write-token part: the plan references write token `1-5-23`, while the actual
file on S3 is
`55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet`.
```
2023-04-20 13:35:10 [pool-31-thread-1] ERROR org.apache.hudi.sink.compact.CompactOperator [] - Executor executes action [Execute compaction for instant 20230420041145422 from task 1] error
org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
    at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:95) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:208) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:230) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.io.storage.HoodieAvroParquetReader.getSchema(HoodieAvroParquetReader.java:104) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:91) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:374) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:365) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:144) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:133) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:116) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3866) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$24(S3AFileSystem.java:3556) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3554) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:469) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:454) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:93) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
    ... 15 more
```
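To make the mismatch concrete, here is a minimal diagnostic sketch (not part of Hudi; the regex and the `parse_base_file` helper are my own illustration) that splits a Hudi base-file name into its file ID, write token, and instant time. It assumes the usual `<fileId>_<writeToken>_<instantTime>.parquet` layout; the two names from this issue share the same file group and instant and differ only in the write token.

```python
import re

# Hudi base-file names follow <fileId>_<writeToken>_<instantTime>.parquet,
# where the write token is three dash-separated numeric parts.
BASE_FILE_RE = re.compile(
    r"(?P<file_id>[0-9a-f-]{36})_(?P<write_token>\d+-\d+-\d+)_(?P<instant>\d+)\.parquet"
)

def parse_base_file(name: str) -> dict:
    """Split a Hudi base-file name into file_id, write_token, and instant."""
    m = BASE_FILE_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Hudi base file name: {name}")
    return m.groupdict()

# File name referenced by the compaction plan (from the stack trace above):
planned = parse_base_file(
    "55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet")
# File name actually present on S3:
actual = parse_base_file(
    "55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet")

# Same file group and same instant, but different write tokens,
# which is exactly why the planned path cannot be found.
assert planned["file_id"] == actual["file_id"]
assert planned["instant"] == actual["instant"]
assert planned["write_token"] != actual["write_token"]
```

A scan like this over the partition directory confirms that no file with the planned token `1-5-23` exists, only the `1-5-24` variant.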
**Environment Description**
* Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8
* Flink version : 1.16.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) :
**Additional context**
The job originally ran with the metadata table enabled. I disabled the
metadata table when restarting the job from a checkpoint.