clbarnes commented on code in PR #5281:
URL: https://github.com/apache/arrow-rs/pull/5281#discussion_r1442731956


##########
object_store/src/local.rs:
##########
@@ -1082,6 +1097,58 @@ fn convert_walkdir_result(
     }
 }
 
+
+/// Download a remote object to a local [`File`]

Review Comment:
   I think these docstrings are the wrong way round: `upload` is documented as downloading a remote object, and `download` as uploading a local file.



##########
object_store/src/local.rs:
##########
@@ -1082,6 +1097,58 @@ fn convert_walkdir_result(
     }
 }
 
+
+/// Download a remote object to a local [`File`]
+pub async fn upload(store: &dyn ObjectStore, location: &Path, opts: PutOptions, file: &mut std::fs::File) -> Result<()> {

Review Comment:
   Could the bounds on `file` be any looser? It might be nice to accept any `impl Read`, although you'd then need to supply the length or additionally require `+ Seek` (which would be a less efficient way of finding the length than a `File`'s metadata). You could make a container struct that just holds a reader and its total length, then provide easy ways of constructing it from a reader with a known length, from an `impl Read + Seek`, or from a `File` (although I think you couldn't use `TryFrom` for all of these without specialization, since a `File` is also `Read + Seek`).
   
   It could also return the number of bytes written rather than `()`, in case that's useful to anyone (more so if the source is an arbitrary reader).
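A rough std-only sketch of the container-struct idea, assuming synchronous `std::io` traits (the real `upload` is async and writes through `object_store`'s multipart writer, which is not modelled here). `SizedReader`, `with_len`, `from_seekable`, `from_file`, `copy_to` and `chunked_copy` are hypothetical names, not existing `object_store` APIs. Separately named constructors sidestep the `TryFrom` overlap mentioned above, since `from_file` and `from_seekable` can both apply to a `File` without conflicting. `chunked_copy` mirrors the chunked read loop in the diff but returns the byte count and propagates read errors, whereas `while let Ok(size) = file.read(...)` ends the loop on the first error and still returns `Ok(())`.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Hypothetical container pairing a reader with the number of bytes it will yield.
pub struct SizedReader<R> {
    reader: R,
    len: u64,
}

impl<R: Read> SizedReader<R> {
    /// The caller already knows the length (e.g. from an earlier size query).
    pub fn with_len(reader: R, len: u64) -> Self {
        Self { reader, len }
    }

    /// Total number of bytes expected from the reader.
    pub fn total_len(&self) -> u64 {
        self.len
    }

    /// Copy the reader into `dst` in `chunk_size` pieces, returning bytes copied.
    pub fn copy_to<W: Write>(&mut self, dst: &mut W, chunk_size: usize) -> io::Result<u64> {
        chunked_copy(&mut self.reader, dst, chunk_size)
    }
}

impl<R: Read + Seek> SizedReader<R> {
    /// Learn the remaining length by seeking to the end, then restore the position.
    /// This costs extra seek calls, unlike reading a `File`'s metadata.
    pub fn from_seekable(mut reader: R) -> io::Result<Self> {
        let start = reader.stream_position()?;
        let end = reader.seek(SeekFrom::End(0))?;
        reader.seek(SeekFrom::Start(start))?;
        Ok(Self { reader, len: end.saturating_sub(start) })
    }
}

impl SizedReader<File> {
    /// Take the length from file metadata; assumes the cursor is at the start.
    pub fn from_file(file: File) -> io::Result<Self> {
        let len = file.metadata()?.len();
        Ok(Self { reader: file, len })
    }
}

/// Copy `src` to `dst` in fixed-size chunks and return the total bytes copied.
/// Read errors are returned instead of being mistaken for end of input.
pub fn chunked_copy<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    chunk_size: usize,
) -> io::Result<u64> {
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            // A signal interrupted the read; retrying is the conventional response.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}
```

For a file whose cursor is at the start, `SizedReader::from_file(file)?.copy_to(&mut sink, 5 * 1024 * 1024)?` would then report how many bytes were copied.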



##########
object_store/src/local.rs:
##########
@@ -1082,6 +1097,58 @@ fn convert_walkdir_result(
     }
 }
 
+
+/// Download a remote object to a local [`File`]
+pub async fn upload(store: &dyn ObjectStore, location: &Path, opts: PutOptions, file: &mut std::fs::File) -> Result<()> {
+    // Determine file size
+    let metadata = file.metadata().map_err(|e| Error::FileMetadata {
+        source: e.into(),
+    })?;
+    let file_size = metadata.len();
+
+    // Set a threshold for when to switch to multipart_put
+    let multipart_threshold: u64 = 50 * 1024 * 1024;
+
+    if file_size <= multipart_threshold {
+        let mut buffer = Vec::with_capacity(file_size as usize);
+        file.read_to_end(&mut buffer).map_err(|e| Error::UnableToReadBytesFromFile{
+            source: e
+        })?;
+        let bytes = Bytes::from(buffer);
+        store.put_opts(&location, bytes, opts).await?;
+        Ok(())
+    } else {
+        let (_id, mut writer) =  store.put_multipart(&location).await?;
+        let mut buffer = vec![0u8; 5 * 1024 * 1024];
+        while let Ok(size) = file.read(&mut buffer) {
+            if size == 0 {
+                break;
+            }
+            writer.write_all(&buffer[..size]).await.unwrap();
+        }
+
+        writer.flush().await.unwrap();
+        writer.shutdown().await.unwrap();
+        Ok(())
+    }
+}
+
+
+/// Upload a local [`File`] to a remote object store
+pub async fn download(store: &dyn ObjectStore, location: &Path, opts: GetOptions, file: &mut File) -> Result<()> {

Review Comment:
   Same as above: could `file` be an `impl Write`? It could also return the number of bytes written.
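Similarly, a minimal synchronous sketch of a download target generalised to `impl Write` that reports the bytes written. It assumes the object body arrives as a sequence of chunk results, standing in for the streamed response of a GET; `write_chunks` is a hypothetical helper, and real code would consume an async stream rather than an iterator.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write an object's body, delivered as a sequence of
/// chunk results, into any `Write` and return the number of bytes written.
pub fn write_chunks<I, W>(chunks: I, dst: &mut W) -> io::Result<u64>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
    W: Write,
{
    let mut written = 0u64;
    for chunk in chunks {
        // Propagate the first failed chunk as an error; bytes already written stay in `dst`.
        let chunk = chunk?;
        dst.write_all(&chunk)?;
        written += chunk.len() as u64;
    }
    dst.flush()?;
    Ok(written)
}
```

A `File`, a `BufWriter`, or an in-memory `Vec<u8>` would then all go through the same code path.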



-- 
This is an automated message from the Apache Git Service.