tzembo opened a new issue, #6914:
URL: https://github.com/apache/arrow-rs/issues/6914
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
I'd like to verify the integrity of uploaded objects (using some kind of
checksum) across all three cloud providers.
Currently, the S3 implementation allows setting an AmazonS3 configuration
value that attaches a `x-amz-checksum-sha256` header to PUT requests against
the store. However:
- This is only available for AWS (and SHA256).
  - AWS supports other checksum algorithms (MD5, SHA1, CRC32, CRC32C).
  - Azure supports MD5 and CRC64.
  - GCP supports MD5 and CRC32C.
- This requires another pass to calculate the checksum value, which the user
  of this library may already have computed in another context.
**Describe the solution you'd like**
I'm proposing that we add a `Checksum` attribute which specifies a
`ChecksumAlgorithm` enum. The value for this attribute would be the
base64-encoded checksum value. For now, MD5 can be the only supported checksum
algorithm (which all three cloud providers support via the `Content-MD5`
header). The value for this algorithm is a base64-encoded 128-bit digest.
```rust
pub enum Attribute {
    // ... existing variants
    /// Provides a checksum used to verify object data integrity
    Checksum(ChecksumAlgorithm),
}

pub enum ChecksumAlgorithm {
    MD5,
}
```
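
To make the proposal concrete, here is a rough sketch of how a store might turn the attribute into a request header. It is only an illustration: `Attribute` here is a trimmed-down stand-in for the real enum, attributes are modelled as a plain `HashMap`, and `checksum_header`, `is_plausible_md5_value`, and `checksum_headers` are hypothetical helper names, not existing APIs. The shape check relies on the fact that a 128-bit digest is 16 bytes, which base64-encodes to 24 characters ending in `==`.

```rust
use std::collections::HashMap;

/// Checksum algorithms; MD5 is the only variant proposed so far.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ChecksumAlgorithm {
    MD5,
}

/// Trimmed-down stand-in for the attribute enum (other variants omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Attribute {
    Checksum(ChecksumAlgorithm),
}

/// Hypothetical helper: the header a store would send for an algorithm.
pub fn checksum_header(algorithm: ChecksumAlgorithm) -> &'static str {
    match algorithm {
        ChecksumAlgorithm::MD5 => "Content-MD5",
    }
}

/// Hypothetical shape check for a base64-encoded MD5 digest:
/// 16 bytes encode to 22 alphabet characters plus "==" padding.
pub fn is_plausible_md5_value(value: &str) -> bool {
    let bytes = value.as_bytes();
    bytes.len() == 24
        && value.ends_with("==")
        && bytes[..22]
            .iter()
            .all(|b| b.is_ascii_alphanumeric() || *b == b'+' || *b == b'/')
}

/// Turn checksum attributes into (header name, header value) pairs,
/// rejecting values that cannot be a base64-encoded MD5 digest.
pub fn checksum_headers(
    attributes: &HashMap<Attribute, String>,
) -> Result<Vec<(&'static str, String)>, String> {
    let mut headers = Vec::new();
    for (attribute, value) in attributes {
        match attribute {
            Attribute::Checksum(algorithm) => {
                // MD5 is the only algorithm, so the MD5 shape check applies.
                if !is_plausible_md5_value(value) {
                    return Err(format!(
                        "invalid {:?} checksum value: {}",
                        algorithm, value
                    ));
                }
                headers.push((checksum_header(*algorithm), value.clone()));
            }
        }
    }
    Ok(headers)
}
```

With this shape, a caller that already has the digest passes it as the attribute value and no second pass over the data is needed. For example, the MD5 of an empty body is `1B2M2Y8AsgTpgAmY7PhCfg==`, which would be sent as `Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==`.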
**Describe alternatives you've considered**
I considered implementing this for more checksum algorithms, but I'm
starting with MD5 because it's the only one supported by all three cloud
providers. In the future, we could extend this to support additional checksum
algorithms (e.g. CRC32C). However:
- stores that do not support a particular checksum algorithm would need to return an error
- stores would ideally be able to "report" their supported checksum
algorithms (for usability reasons)
- (AWS) we'd need to figure out how a SHA-256 checksum provided via
attributes interacts with the one provided via config
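
As a rough illustration of the first two points, a store could expose its supported algorithms and reject anything else up front. This is a sketch under assumptions: `Provider`, `supported_checksums`, `check`, and `UnsupportedChecksum` are hypothetical names, and the per-provider lists simply restate the support described earlier in this issue.

```rust
use std::fmt;

/// Possible future extension of the algorithm enum (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChecksumAlgorithm {
    MD5,
    SHA1,
    SHA256,
    CRC32,
    CRC32C,
    CRC64,
}

/// Stand-in for the three store implementations (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Aws,
    Azure,
    Gcp,
}

/// Error returned when a store cannot verify the requested algorithm.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedChecksum {
    pub provider: Provider,
    pub algorithm: ChecksumAlgorithm,
}

impl fmt::Display for UnsupportedChecksum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{:?} does not support {:?} checksums",
            self.provider, self.algorithm
        )
    }
}

impl std::error::Error for UnsupportedChecksum {}

impl Provider {
    /// Lets callers discover what a store can verify.
    pub fn supported_checksums(self) -> &'static [ChecksumAlgorithm] {
        match self {
            Provider::Aws => &[
                ChecksumAlgorithm::MD5,
                ChecksumAlgorithm::SHA1,
                ChecksumAlgorithm::SHA256,
                ChecksumAlgorithm::CRC32,
                ChecksumAlgorithm::CRC32C,
            ],
            Provider::Azure => &[ChecksumAlgorithm::MD5, ChecksumAlgorithm::CRC64],
            Provider::Gcp => &[ChecksumAlgorithm::MD5, ChecksumAlgorithm::CRC32C],
        }
    }

    /// Fails early instead of silently ignoring an unsupported checksum.
    pub fn check(self, algorithm: ChecksumAlgorithm) -> Result<(), UnsupportedChecksum> {
        if self.supported_checksums().contains(&algorithm) {
            Ok(())
        } else {
            Err(UnsupportedChecksum { provider: self, algorithm })
        }
    }
}
```

MD5 passes `check` for every provider in this sketch, which is why it is the starting point; something like `Provider::Azure.check(ChecksumAlgorithm::CRC32C)` would return an error.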
I considered calling the attribute `ContentMD5` but decided to make it a bit
more generic to support additional checksums in the future.
**Additional context**
I can put up a PR for this issue.