cmackenzie1 opened a new issue, #4190:
URL: https://github.com/apache/arrow-rs/issues/4190

   ## Is your feature request related to a problem or challenge? Please 
describe what you are trying to do.
   
   Add a new `object_store` provider for [Cloudflare R2](https://developers.cloudflare.com/r2/). While the existing AWS S3 implementation can talk to S3-compatible object stores, support for them is not guaranteed because of provider-specific "drift" that cannot always be captured in the AWS S3 provider. By adding a Cloudflare R2-specific provider, we can reuse much of the common functionality (e.g., AWS SigV4 signing) that already exists in the `object_store` crate, while also accounting for the differences in their APIs.
   
   For example, Cloudflare R2 supports the following extensions on the S3 API:
   
   - [Create bucket on first 
upload](https://developers.cloudflare.com/r2/api/s3/extensions/#auto-creating-buckets-on-upload)
   - [Conditional operations in 
`PutObject`](https://developers.cloudflare.com/r2/api/s3/extensions/#conditional-operations-in-putobject)
   
   There are also some minor differences in how certain operations are performed, such as:
   
   - [`UploadPart` requires uniform part sizes (except the last part)](https://developers.cloudflare.com/r2/api/workers/workers-api-reference/#r2multipartupload-definition)
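   
   To make the conditional `PutObject` extension above concrete, below is a minimal request sketch, assuming R2 accepts the standard HTTP precondition headers on `PutObject` as the linked docs describe. The endpoint, bucket, key, and body are placeholders, and the SigV4 signing that R2 requires is omitted; the point is only the extra precondition header.
   
   ```rust
   use reqwest::Client;
   
   // Sketch only: shows the header shape for R2's conditional PutObject.
   // The endpoint, bucket and key are placeholders, and the request would
   // additionally need to be SigV4-signed (omitted here) to be accepted.
   #[tokio::main]
   async fn main() -> Result<(), reqwest::Error> {
       let url = "https://ACCOUNT_ID.r2.cloudflarestorage.com/my-bucket/data.parquet";
       let resp = Client::new()
           .put(url)
           // Only upload if no object already exists at this key.
           .header("If-None-Match", "*")
           .body(b"...file bytes...".to_vec())
           .send()
           .await?;
       // A failed precondition is expected to surface as 412 Precondition Failed.
       println!("status: {}", resp.status());
       Ok(())
   }
   ```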
   
   ## Describe the solution you'd like
   
   A Cloudflare R2 `object_store` provider that accounts for the drifts and 
extensions mentioned in the previous section.
   
   An `ObjectStore` trait implementation for R2 should accept endpoint URLs of the form `https://<account_id>.r2.cloudflarestorage.com/<bucket>`. Virtual-hosted-style addressing is also supported: `https://<bucket>.<account_id>.r2.cloudflarestorage.com/`.
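   
   For reference, this is roughly what the current workaround looks like: pointing the existing AWS S3 provider at an R2 endpoint via `AmazonS3Builder`. This is only a sketch; the account id, bucket, region, and credentials are placeholders. A dedicated R2 provider could instead accept the URL forms above directly.
   
   ```rust
   use object_store::aws::{AmazonS3, AmazonS3Builder};
   
   // Sketch of the current workaround: treat R2 as a generic S3-compatible
   // endpoint. All values below are placeholders.
   fn build_r2_store() -> object_store::Result<AmazonS3> {
       AmazonS3Builder::new()
           // Path-style endpoint: https://<account_id>.r2.cloudflarestorage.com
           .with_endpoint("https://ACCOUNT_ID.r2.cloudflarestorage.com")
           .with_bucket_name("my-bucket")
           // R2 signs requests against the "auto" region.
           .with_region("auto")
           .with_access_key_id("ACCESS_KEY_ID")
           .with_secret_access_key("SECRET_ACCESS_KEY")
           .build()
   }
   ```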
   
   ## Describe alternatives you've considered
   
   One alternative I've considered is expanding the scope and functionality of the existing AWS S3 provider. However, some of the changes are not compatible with AWS S3; one such example is support for atomic copies. Gating that functionality behind feature flags and conditional compilation would prevent using multiple flavors of S3 providers together if they do not all support the same feature set (e.g., using both AWS S3 and Cloudflare R2 if there were some flag like "aws_atomic_copy").
   
   Another alternative is maintaining our own `ObjectStore` trait implementation alongside our code. While this is possible, we would like to reuse the existing AWS SigV4 signing and other non-public functionality in the `object_store` crate. Additionally, an upstream provider lets users of arrow-rs (and delta-rs) use Cloudflare R2 without needing to implement support for it themselves.
   
   ### Atomic `CopyObject` API
   
   Atomic copies will be enabled through the addition of the following header on a `CopyObject` request:
   
   - `cf-copy-destination-if-none-match: *`. Setting this to `*` causes the request to fail if the destination object already exists at the time of the request.  
   _This feature has not yet been released and is currently a work in progress by the Cloudflare R2 team._
   
   This header behaves similarly to 
[`If-None-Match`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match).
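   
   For context, the `ObjectStore` trait already exposes a `copy_if_not_exists` method; the header above is what would let an R2 provider back it natively. A usage sketch (the object paths are placeholders):
   
   ```rust
   use object_store::{path::Path, ObjectStore};
   
   // Sketch only: once an R2 provider maps `copy_if_not_exists` onto a
   // `CopyObject` request carrying `cf-copy-destination-if-none-match: *`,
   // callers get atomic "copy unless the destination exists" semantics
   // through the existing trait method.
   async fn promote(store: &dyn ObjectStore) -> object_store::Result<()> {
       store
           .copy_if_not_exists(
               &Path::from("staging/data.parquet"),
               &Path::from("final/data.parquet"),
           )
           .await
   }
   ```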
   
   ## Additional context
   
   While this solves the use case for Cloudflare R2, it makes me wonder whether another trait should be introduced (e.g., `S3Provider`) in which AWS SigV4 signing happens automatically for any provider (AWS S3, Cloudflare R2, Ceph, MinIO, Wasabi, etc.) and each method has a default implementation that can be overridden per provider. It is important to keep in mind the possibility of using two different providers simultaneously within a project (i.e., no `cfg` flags for switching logic at compile time). A rough sketch of such a trait is below.
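   
   To make that concrete, here is a purely hypothetical sketch; none of these names or signatures exist in the crate today, and the request/signing plumbing is elided.
   
   ```rust
   use async_trait::async_trait;
   
   /// Hypothetical sketch only: a shared S3-compatible request flow with
   /// per-provider override points. Every name and signature here is
   /// illustrative; nothing like this exists in `object_store` today.
   #[async_trait]
   pub trait S3Provider: Send + Sync {
       /// Base endpoint for the provider (AWS S3, R2, Ceph, MinIO, ...).
       fn endpoint(&self) -> &str;
   
       /// Hook for provider-specific headers, e.g. R2's copy extensions.
       fn extra_copy_headers(&self) -> Vec<(String, String)> {
           Vec::new()
       }
   
       /// Default `CopyObject` built on the shared SigV4 signing path;
       /// a provider only overrides this when its API drifts from AWS S3.
       async fn copy_object(
           &self,
           from: &str,
           to: &str,
       ) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
           // ...build, SigV4-sign and send the request against `self.endpoint()`,
           // attaching `self.extra_copy_headers()`...
           let _ = (from, to);
           Ok(())
       }
   }
   ```
   
   Because each provider would be a value implementing the trait rather than a `cfg`-selected module, AWS S3 and Cloudflare R2 stores could coexist in the same binary.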
   
   ## Related issues:
   
   - https://github.com/delta-io/delta-rs/issues/1104
   - https://github.com/delta-io/delta-rs/issues/974
   - https://github.com/apache/arrow/issues/34363
   - https://community.cloudflare.com/t/cannot-upload-to-r2-from-pyarrow/426996/5
   - The S3 `ObjectStore` [hard-codes hostnames](https://github.com/aplunk/arrow-rs/commit/cb747de6ad761afa60db326cc85876a3736b279a#diff-463799e28d6cd175bcf4dd2b296385646ebed96271da65b05a02ab133d31c82c), rendering it unusable for R2.

