[ https://issues.apache.org/jira/browse/HUDI-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571747#comment-17571747 ]
Bowen Zhu commented on HUDI-2118:
---------------------------------
There are multiple ways to upload an object to GCS:
[https://cloud.google.com/storage/docs/uploads-downloads]
Resumable uploads, multipart uploads, parallel composite uploads, and streaming
uploads are not strictly atomic. The final object does not show up in a normal
bucket listing until the upload completes, but a partial upload can exist, and
its state can be queried and listed.
Streaming uploads can also leave a corrupted file accessible once the transfer
completes: the checksum can only be validated after the transfer has finished,
so a corrupted file remains accessible until it is detected and deleted.
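A minimal sketch of that validate-after-transfer window, assuming a plain dict stands in for the bucket (the function names and the `corrupt` flag are hypothetical). It uses MD5 from `hashlib` for simplicity; GCS also supports CRC32C, which the Python standard library does not provide:

```python
import hashlib


def streaming_upload(store, name, chunks, corrupt=False):
    """Write chunks to `store` with no checksum known up front (hypothetical model).

    If `corrupt` is True, one byte is flipped in transit to simulate corruption.
    Returns the MD5 hex digest the client computed over the bytes it sent.
    """
    client_md5 = hashlib.md5()
    received = bytearray()
    for chunk in chunks:
        client_md5.update(chunk)
        received.extend(chunk)
    if corrupt and received:
        received[0] ^= 0xFF  # simulate corruption on the wire
    store[name] = bytes(received)  # readable as soon as the transfer completes
    return client_md5.hexdigest()


def validate_after_transfer(store, name, expected_md5):
    """Compare the stored object's MD5 with the client's digest; delete on mismatch."""
    actual = hashlib.md5(store[name]).hexdigest()
    if actual != expected_md5:
        del store[name]
        return False
    return True


store = {}
digest = streaming_upload(store, "log.2", [b"abc", b"def"], corrupt=True)
print("log.2" in store)                                 # True: corrupted object is readable
print(validate_after_transfer(store, "log.2", digest))  # False
print("log.2" in store)                                 # False: removed only after validation
```

Any reader that opens `log.2` between the two calls sees corrupted bytes, which is exactly the case the corrupt log block check exists for.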
We need to make sure Hudi never uses these non-atomic upload methods for GCS,
or else mark GCS uploads as non-atomic.
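If we go with the second option, the decision could be driven by a per-scheme flag. The sketch below is hypothetical: the table, its values, and `should_check_corrupt_blocks` are illustrative names and assumptions, not Hudi's actual code or configuration. Unknown schemes default to non-atomic so the corruption check stays on:

```python
from urllib.parse import urlparse

# Hypothetical scheme -> "uploads are atomic" table. "gs" is marked non-atomic,
# following the conservative option above; the "s3" entry is an illustrative
# assumption, not a statement about Hudi's configuration.
ATOMIC_UPLOAD_SCHEMES = {
    "s3": True,
    "gs": False,
}


def should_check_corrupt_blocks(path, atomic_schemes=ATOMIC_UPLOAD_SCHEMES):
    """Return True unless the path's scheme is known to write objects atomically."""
    scheme = urlparse(path).scheme
    # Unknown schemes default to False (non-atomic), so the check still runs.
    return not atomic_schemes.get(scheme, False)


print(should_check_corrupt_blocks("gs://bucket/table/.f1.log.1"))  # True
print(should_check_corrupt_blocks("s3://bucket/table/.f1.log.1"))  # False
print(should_check_corrupt_blocks("hdfs://nn/table/.f1.log.1"))    # True
```

Defaulting to "check" keeps the optimization opt-in per scheme, so a storage system is only exempted once its upload path is known to be atomic.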
> Avoid checking corrupt log blocks for cloud storage
> ---------------------------------------------------
>
> Key: HUDI-2118
> URL: https://issues.apache.org/jira/browse/HUDI-2118
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Rajesh Mahindra
> Assignee: Bowen Zhu
> Priority: Minor
>