crepererum commented on code in PR #2563:
URL: https://github.com/apache/arrow-rs/pull/2563#discussion_r954662682
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
+ for _ in 0..100 {
+ let object_len = rand.gen_range(10..250);
+ let range_count = rand.gen_range(0..30);
+ let ranges: Vec<_> = (0..range_count)
+ .map(|_| {
+ let start = rand.gen_range(0..object_len);
+ let end = rand.gen_range(start..object_len);
+ start..end
+ })
+ .collect();
+
+ let coalesce = rand.gen_range(1..5);
Review Comment:
In case you decide against proptest, this should print out the state here
(`ranges` and `coalesce`). Otherwise it will be impossible to debug failures
later.
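A minimal std-only sketch of that suggestion (`debug_state` is a hypothetical helper name, not part of the PR): render the generated state once and attach it to every assertion, so the panic message alone is enough to reproduce a failing case.

```rust
use std::ops::Range;

/// Hypothetical helper: render the fuzz state for failure messages.
fn debug_state(ranges: &[Range<usize>], coalesce: usize) -> String {
    format!("ranges={ranges:?} coalesce={coalesce}")
}

fn main() {
    let ranges = vec![0..1, 5..9];
    let coalesce = 3;
    // Attaching the state to the assertion means a failure seen later (e.g.
    // only in CI logs) can be replayed without re-running with extra logging.
    assert!(coalesce >= 1, "{}", debug_state(&ranges, coalesce));
    println!("{}", debug_state(&ranges, coalesce));
}
```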
##########
object_store/src/util.rs:
##########
@@ -171,12 +200,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
Review Comment:
Maybe move the fuzz stuff into its own test. I would prefer a proper
fuzz-testing framework (like proptest) that also handles error reporting and
minimization, but I leave the tradeoff "new large dependency VS fewer
features" to you.
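For illustration, a hedged std-only sketch of pulling the fuzz loop into its own seed-parameterized function; the `XorShift` generator is the editor's stand-in for `thread_rng`, and proptest would subsume this seed-replay idea with proper shrinking on top.

```rust
use std::ops::Range;

/// Tiny xorshift PRNG standing in for `thread_rng` (an editor assumption, so
/// the sketch is std-only and every run is replayable from its seed).
struct XorShift(u64);

impl XorShift {
    fn gen_range(&mut self, r: Range<usize>) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        r.start + (self.0 as usize) % (r.end - r.start)
    }
}

/// The fuzz loop pulled out of the example-based test, parameterized by seed
/// so a reported failure can be replayed deterministically.
fn fuzz_coalesce_inputs(seed: u64) -> usize {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let object_len = rng.gen_range(10..250);
    let range_count = rng.gen_range(0..30);
    let ranges: Vec<_> = (0..range_count)
        .map(|_| {
            let start = rng.gen_range(0..object_len);
            let end = rng.gen_range(start..object_len);
            start..end
        })
        .collect();
    // Invariant check; the message carries the seed so the case can be
    // reconstructed. The real test would call `do_fetch` here instead.
    for r in &ranges {
        assert!(r.end <= object_len, "seed {seed}: bad range {r:?}");
    }
    ranges.len()
}

fn main() {
    for seed in 0..100 {
        // Each iteration is independently replayable from its seed.
        fuzz_coalesce_inputs(seed);
    }
}
```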
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
Review Comment:
So was the issue that it wasn't sorting the ranges before merging them?
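If sorting was indeed the fix, a minimal std-only sketch of sort-then-merge (the editor's reconstruction, not the crate's actual implementation) reproduces the updated expectations, e.g. `[0..1, 5..6, 7..9, 2..3, 4..6]` with `coalesce = 1` collapsing to `[0..9]`.

```rust
use std::ops::Range;

/// Sketch: sort ranges by start offset, then merge any range that starts
/// within `coalesce` bytes of the previous merged range's end.
fn merge_ranges(ranges: &[Range<usize>], coalesce: usize) -> Vec<Range<usize>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_unstable_by_key(|r| r.start);
    let mut out: Vec<Range<usize>> = Vec::new();
    for r in sorted {
        match out.last_mut() {
            // Extend the previous range when the gap is within `coalesce`.
            Some(last) if r.start <= last.end + coalesce => {
                last.end = last.end.max(r.end);
            }
            _ => out.push(r),
        }
    }
    out
}

fn main() {
    // Both cases match the updated expectations in the diff above.
    assert_eq!(merge_ranges(&[0..1, 5..6, 7..9, 2..3, 4..6], 1), vec![0..9]);
    assert_eq!(
        merge_ranges(&[0..1, 6..7, 8..9, 10..14, 9..10], 4),
        vec![0..1, 6..14]
    );
}
```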
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
+ for _ in 0..100 {
+ let object_len = rand.gen_range(10..250);
Review Comment:
I haven't tried the test myself, but I'm wondering how likely it is that this
basically ends up as "one range" most of the time, especially because the
range count is rather large, the object size rather small, and the generated
ranges themselves rather large as well.
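This is cheap to measure empirically. The sketch below (a stand-in `merged_count` and a std-only PRNG, both editor assumptions rather than the PR's code) counts how many of 100 generated cases collapse into a single range; no specific fraction is claimed, since it depends on the RNG.

```rust
use std::ops::Range;

/// Tiny xorshift PRNG standing in for `thread_rng` (an editor assumption so
/// the sketch is std-only and deterministic).
struct XorShift(u64);

impl XorShift {
    fn gen_range(&mut self, r: Range<usize>) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        r.start + (self.0 as usize) % (r.end - r.start)
    }
}

/// Sort-and-merge stand-in for the coalescing logic under test; returns the
/// number of coalesced ranges.
fn merged_count(mut ranges: Vec<Range<usize>>, coalesce: usize) -> usize {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut out: Vec<Range<usize>> = Vec::new();
    for r in ranges {
        match out.last_mut() {
            Some(last) if r.start <= last.end + coalesce => {
                last.end = last.end.max(r.end)
            }
            _ => out.push(r),
        }
    }
    out.len()
}

fn main() {
    let mut rng = XorShift(0x853c49e6748fea9b);
    let mut single = 0;
    for _ in 0..100 {
        // Same parameter distribution as the fuzz loop in the diff above.
        let object_len = rng.gen_range(10..250);
        let range_count = rng.gen_range(0..30);
        let ranges: Vec<_> = (0..range_count)
            .map(|_| {
                let start = rng.gen_range(0..object_len);
                let end = rng.gen_range(start..object_len);
                start..end
            })
            .collect();
        let coalesce = rng.gen_range(1..5);
        if merged_count(ranges, coalesce) == 1 {
            single += 1;
        }
    }
    // Measuring the distribution like this is a quick sanity check before
    // tuning `object_len` / `range_count` / range widths.
    println!("{single}/100 cases coalesced into a single range");
}
```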
##########
object_store/src/util.rs:
##########
@@ -80,17 +80,47 @@ pub const OBJECT_STORE_COALESCE_DEFAULT: usize = 1024 * 1024;
/// less than `coalesce` bytes apart. Out of order `ranges` are not coalesced
pub async fn coalesce_ranges<F, Fut>(
ranges: &[std::ops::Range<usize>],
- mut fetch: F,
+ fetch: F,
coalesce: usize,
) -> Result<Vec<Bytes>>
where
F: Send + FnMut(std::ops::Range<usize>) -> Fut,
Fut: std::future::Future<Output = Result<Bytes>> + Send,
{
+ let fetch_ranges = merge_ranges(ranges, coalesce);
+
+ let fetched: Vec<_> = futures::stream::iter(fetch_ranges.iter().cloned())
+ .map(fetch)
+ .buffered(10)
Review Comment:
This should be documented in the method's doc comment. At least I would find
this very surprising behavior otherwise.
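One possible wording for such a note (the editor's suggestion, not the crate's actual docs, beyond the `buffered(10)` limit visible in the diff above); the synchronous stub body exists only so the sketch compiles.

```rust
use std::ops::Range;

/// Coalesces adjacent `ranges` that are less than `coalesce` bytes apart and
/// fetches them with the provided fetch function.
///
/// Note: coalesced ranges are fetched concurrently, with up to 10 requests
/// in flight at a time; callers should not assume fetches happen one at a
/// time or strictly in order.
fn coalesce_ranges_doc_sketch(ranges: &[Range<usize>], coalesce: usize) -> usize {
    // Stub body so this documentation sketch compiles; the real function is
    // async, performs the fetches, and returns the resulting buffers.
    let _ = coalesce;
    ranges.len()
}

fn main() {
    assert_eq!(coalesce_ranges_doc_sketch(&[0..1, 5..9], 1), 2);
}
```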
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]