cmackenzie1 opened a new issue, #4190: URL: https://github.com/apache/arrow-rs/issues/4190
## Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Add a new `object_store` provider for [Cloudflare R2](https://developers.cloudflare.com/r2/).

Currently, while the AWS S3 implementation can support S3-compatible object stores, that support is not guaranteed because of provider-specific "drifts" that cannot always be captured in the AWS S3 provider. By adding a Cloudflare R2 specific provider, we can reuse much of the common functionality (such as AWS SigV4 signing) that exists in the `object_store` crate, while also accounting for some of the differences between the APIs.

For example, Cloudflare R2 supports the following extensions to the S3 API:

- [Create bucket on first upload](https://developers.cloudflare.com/r2/api/s3/extensions/#auto-creating-buckets-on-upload)
- [Conditional operations in `PutObject`](https://developers.cloudflare.com/r2/api/s3/extensions/#conditional-operations-in-putobject)

There are also some minor differences in how some operations are performed, such as:

- [`UploadPart` requires uniform sizes (except the last part)](https://developers.cloudflare.com/r2/api/workers/workers-api-reference/#r2multipartupload-definition)

## Describe the solution you'd like

A Cloudflare R2 `object_store` provider that accounts for the drifts and extensions mentioned in the previous section.

An `ObjectStore` trait implementation for R2 should have a URL scheme of the form `https://<account_id>.r2.cloudflarestorage.com/<bucket>`. Virtual-hosted-style URLs are also supported: `https://<bucket>.<account_id>.r2.cloudflarestorage.com/`.

## Describe alternatives you've considered

One alternative I've considered is increasing the scope / functionality of the existing AWS S3 provider. However, some of the changes are not compatible with AWS S3. One such example is the support for atomic copies.
Adding this functionality behind feature flags and conditional compilation would prevent using multiple flavors of S3 provider in the same project when they do not all support the same feature set (e.g., using AWS S3 and Cloudflare R2 together if there were a flag such as "aws_atomic_copy").

Another alternative is maintaining our own `ObjectStore` trait implementation alongside our code. While this is possible, we would like to use the existing AWS SigV4 signing and other non-public functionality in the `object_store` crate. Additionally, an upstream provider allows users of arrow-rs (and delta-rs) to use Cloudflare R2 without needing to implement support for it themselves.

### Atomic `CopyObject` API

Atomic copies will be enabled through the addition of the following header on a `CopyObject` request:

- `cf-copy-destination-if-none-match: *`. Setting this to `*` will make the request fail if the destination file already exists at the time of the request. _This feature has not yet been released and is currently a work in progress from the Cloudflare R2 team._

This header behaves similarly to [`If-None-Match`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match).

## Additional context

While this solves the use case for Cloudflare R2, it makes me wonder whether another trait should be introduced (e.g., `S3Provider`) where all AWS SigV4 signing happens automatically for any provider (AWS S3, Cloudflare R2, Ceph, Minio, Wasabi, etc.) and each method has a default implementation, with the possibility of overriding each request per provider.
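
To make the `S3Provider` idea a little more concrete, here is a minimal sketch of how such a trait might look. Everything in it is hypothetical: the trait, the `AwsS3` and `CloudflareR2` types, and the method names are made up for illustration and are not part of the `object_store` crate. The only detail taken from this issue is the `cf-copy-destination-if-none-match: *` header, which is itself not yet released.

```rust
/// Hypothetical trait (not part of `object_store`): each method has a
/// default implementation that a provider may override.
trait S3Provider {
    /// Illustrative provider name.
    fn name(&self) -> &'static str;

    /// Extra headers to add to a `CopyObject` request so that it fails when
    /// the destination already exists. The default is `None`, meaning the
    /// provider offers no atomic copy.
    fn copy_if_not_exists_headers(&self) -> Option<Vec<(&'static str, &'static str)>> {
        None
    }
}

/// Hypothetical AWS S3 flavor: relies on the defaults only.
struct AwsS3;

/// Hypothetical Cloudflare R2 flavor: overrides the copy behavior.
struct CloudflareR2;

impl S3Provider for AwsS3 {
    fn name(&self) -> &'static str {
        "aws-s3"
    }
}

impl S3Provider for CloudflareR2 {
    fn name(&self) -> &'static str {
        "cloudflare-r2"
    }

    fn copy_if_not_exists_headers(&self) -> Option<Vec<(&'static str, &'static str)>> {
        // Header described in this issue; a work in progress on the R2 side.
        Some(vec![("cf-copy-destination-if-none-match", "*")])
    }
}

/// Summarize what a provider would do for an atomic copy.
fn describe(provider: &dyn S3Provider) -> String {
    match provider.copy_if_not_exists_headers() {
        Some(headers) => format!("{}: atomic copy via {:?}", provider.name(), headers),
        None => format!("{}: atomic copy not supported", provider.name()),
    }
}

fn main() {
    // Both flavors live side by side behind trait objects, chosen at runtime.
    let providers: Vec<Box<dyn S3Provider>> = vec![Box::new(AwsS3), Box::new(CloudflareR2)];
    for provider in &providers {
        println!("{}", describe(provider.as_ref()));
    }
}
```

Because the provider is selected through a trait object rather than a compile-time flag, a single binary could talk to AWS S3 and Cloudflare R2 at the same time.
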
It is important to keep in mind the possibility of using two different providers simultaneously within a project (no `cfg` flags for switching logic at compile time).

## Related issues:

- https://github.com/delta-io/delta-rs/issues/1104
- https://github.com/delta-io/delta-rs/issues/974
- https://github.com/apache/arrow/issues/34363
- https://community.cloudflare.com/t/cannot-upload-to-r2-from-pyarrow/426996/5
- The S3 `ObjectStore` has [hard-coded hostnames](https://github.com/aplunk/arrow-rs/commit/cb747de6ad761afa60db326cc85876a3736b279a#diff-463799e28d6cd175bcf4dd2b296385646ebed96271da65b05a02ab133d31c82c), rendering it unusable for R2.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
