tustvold commented on issue #5458:
URL: https://github.com/apache/arrow-rs/issues/5458#issuecomment-1977914822
So the major challenge with providing a multipart API for `LocalFileSystem` is
that there is no obvious way to transport the part size in use. This presents a
problem for determining the offset of the final part, which will likely be
smaller than the part size used by the rest of the upload (a concrete sketch follows).
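To make this concrete, here is a minimal sketch of the offset arithmetic; the `write_part` helper is hypothetical and the positional write is Unix-only:
```
use std::fs::File;
use std::os::unix::fs::FileExt; // Unix-only positional writes

/// With a fixed part size, the destination offset of every part is
/// computable from its index alone
fn write_part(file: &File, part_size: u64, part_index: u64, data: &[u8]) -> std::io::Result<()> {
    file.write_all_at(data, part_index * part_size)
}

// The final part is usually shorter than `part_size`, so without
// persisting the part size somewhere, a different process (or the
// completion step) cannot recover which offsets the parts occupy.
```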
There are a few options here, none of them particularly great:

* Encode the part size in a separate file, but this adds a file read/write to every part write
* Use a mechanism like xattr, but this would limit platform support
* Encode the part size in the MultipartId (see the sketch after this list), but this would require specifying the part size when creating the upload
* Encode the part size in the file itself, but this would be fragile and hard to coordinate with parallel uploads
* Keep multipart uploads as separate files, but this would complicate listing and retrieval logic and break compatibility with non-ObjectStore-based systems
* Concatenate the parts once the upload finishes; this would be simple but slow
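For illustration, the MultipartId option could look something like this; the `<uuid>:<part_size>` format is purely hypothetical:
```
/// Hypothetical MultipartId format "<uuid>:<part_size>", letting the part
/// size survive across processes at the cost of fixing it at creation time
fn encode_multipart_id(uuid: &str, part_size: u64) -> String {
    format!("{uuid}:{part_size}")
}

fn decode_multipart_id(id: &str) -> Option<(&str, u64)> {
    let (uuid, size) = id.rsplit_once(':')?;
    Some((uuid, size.parse().ok()?))
}
```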
Taking a step back, I think there are two categories of multipart upload users:
1. Users who just want to stream data to durable storage
2. Users doing a chunked upload of an existing data set
Users in the second category are extremely unlikely to care about
`LocalFileSystem` support, as they could just use the filesystem directly. As
such, I suspect they are adequately served by MultipartStore (see the sketch
below). I therefore think we can focus our efforts on the first category of
user: providing an efficient way to stream data, in order, to durable storage.
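For reference, a category-2 user can already drive `MultipartStore` along these lines (method signatures as of object_store 0.9; the sequential loop is illustrative, the calls could equally be issued in parallel):
```
use bytes::Bytes;
use object_store::multipart::MultipartStore;
use object_store::path::Path;
use object_store::Result;

/// Chunked upload of an existing data set via MultipartStore
async fn upload_chunks<S: MultipartStore>(store: &S, path: &Path, chunks: Vec<Bytes>) -> Result<()> {
    let id = store.create_multipart(path).await?;
    let mut parts = Vec::with_capacity(chunks.len());
    for (idx, chunk) in chunks.into_iter().enumerate() {
        // Parts are addressed by index, so ordering is explicit
        parts.push(store.put_part(path, &id, idx, chunk).await?);
    }
    store.complete_multipart(path, &id, parts).await?;
    Ok(())
}
```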
I'm therefore leaning towards replacing `put_multipart` with
```
trait ObjectStore {
    fn upload(&self, path: &Path) -> Result<Box<dyn Upload>>;
    ...
}

pub struct UploadOptions {
    /// A hint as to the size of the object to be uploaded, implementations
    /// may use this to select an appropriate IO size
    pub size_hint: Option<usize>,
    /// Implementations may perform chunked uploads in parallel, use this to
    /// restrict the concurrency
    pub max_concurrency: Option<usize>,
}

#[async_trait]
pub trait Upload {
    /// Enqueue data to be uploaded
    fn write(&mut self, data: &[u8]) -> Result<()>;

    /// Enqueue `data` to be uploaded
    fn put(&mut self, data: Bytes) -> Result<()> {
        self.write(&data)
    }

    /// Flush as much data as possible to durable storage
    ///
    /// Returns the offset up to which data has been made durable
    ///
    /// Some implementations may have IO size restrictions making this
    /// best effort
    async fn flush(&mut self) -> Result<usize>;

    /// Flush all written data, complete this upload, and return the
    /// [`PutResult`]
    async fn shutdown(&mut self) -> Result<PutResult>;

    /// Abort this upload
    async fn abort(&mut self) -> Result<()>;
}
```
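To sketch how a caller would drive this (building on the hypothetical types above; `chunks` stands in for any synchronous data source such as an encoder producing buffers):
```
/// Stream data to durable storage using the proposed API
async fn stream_to_store(
    store: &dyn ObjectStore,
    path: &Path,
    chunks: impl Iterator<Item = Vec<u8>>,
) -> Result<PutResult> {
    let mut upload = store.upload(path)?;
    for chunk in chunks {
        // Synchronous enqueue: no .await, so this composes with sync writers
        upload.write(&chunk)?;
    }
    // Flush everything and finalize the object
    upload.shutdown().await
}
```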
There are a few things worth highlighting here:
* The synchronous `write` will integrate well with synchronous writers, e.g.
#5471
* Implementations can choose to use the cheaper `Put` instead of
`PutMultipart` if data sizes are small
* Part sizing is abstracted away from the user
* Implementations are not constrained on IO granularity, e.g.
`LocalFileSystem` can stream writes directly (see the sketch below)
* The `Upload` interface should be significantly easier to implement than
`AsyncWrite`
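And a rough sketch of the `LocalFileSystem` point, with error plumbing simplified to `std::io::Result` and the async trait wiring omitted:
```
use std::fs::File;
use std::io::Write;

struct LocalUpload {
    file: File,
    written: usize,
}

impl LocalUpload {
    /// Any write granularity works: bytes go straight into the destination
    /// file, with no part bookkeeping required
    fn write(&mut self, data: &[u8]) -> std::io::Result<()> {
        self.file.write_all(data)?;
        self.written += data.len();
        Ok(())
    }

    /// fsync makes everything written so far durable, so the full offset
    /// can be reported
    fn flush(&mut self) -> std::io::Result<usize> {
        self.file.sync_data()?;
        Ok(self.written)
    }
}
```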