crepererum commented on code in PR #2563:
URL: https://github.com/apache/arrow-rs/pull/2563#discussion_r954662682
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
+ for _ in 0..100 {
+ let object_len = rand.gen_range(10..250);
+ let range_count = rand.gen_range(0..30);
+ let ranges: Vec<_> = (0..range_count)
+ .map(|_| {
+ let start = rand.gen_range(0..object_len);
+ let end = rand.gen_range(start..object_len);
+ start..end
+ })
+ .collect();
+
+ let coalesce = rand.gen_range(1..5);
Review Comment:
In case you decide against proptest, this should print out the state here
(`ranges` and `coalesce`). Otherwise it will be impossible to debug failures
later.
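A minimal std-only sketch of that suggestion (`debug_state` is a hypothetical helper name, not part of the PR): render the generated state once and attach it to every assertion, so the panic message alone is enough to reproduce a failing case.

```rust
use std::ops::Range;

/// Hypothetical helper: render the fuzz state for failure messages.
fn debug_state(ranges: &[Range<usize>], coalesce: usize) -> String {
    format!("ranges={ranges:?} coalesce={coalesce}")
}

fn main() {
    let ranges = vec![0..1, 5..9];
    let coalesce = 3;
    // Attaching the state to the assertion means a failure seen later (e.g.
    // only in CI logs) can be replayed without re-running with extra logging.
    assert!(coalesce >= 1, "{}", debug_state(&ranges, coalesce));
    println!("{}", debug_state(&ranges, coalesce));
}
```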
##########
object_store/src/util.rs:
##########
@@ -171,12 +200,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
Review Comment:
Maybe move the fuzz stuff into its own test. I would prefer a proper
fuzz-testing framework (like proptest) that also handles error reporting and
minimization, but I leave the tradeoff "new large dependency VS fewer
features" to you.
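For illustration, a hedged std-only sketch of pulling the fuzz loop into its own seed-parameterized function; the `XorShift` generator is the editor's stand-in for `thread_rng`, and proptest would subsume this seed-replay idea with proper shrinking on top.

```rust
use std::ops::Range;

/// Tiny xorshift PRNG standing in for `thread_rng` (an editor assumption, so
/// the sketch is std-only and every run is replayable from its seed).
struct XorShift(u64);

impl XorShift {
    fn gen_range(&mut self, r: Range<usize>) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        r.start + (self.0 as usize) % (r.end - r.start)
    }
}

/// The fuzz loop pulled out of the example-based test, parameterized by seed
/// so a reported failure can be replayed deterministically.
fn fuzz_coalesce_inputs(seed: u64) -> usize {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let object_len = rng.gen_range(10..250);
    let range_count = rng.gen_range(0..30);
    let ranges: Vec<_> = (0..range_count)
        .map(|_| {
            let start = rng.gen_range(0..object_len);
            let end = rng.gen_range(start..object_len);
            start..end
        })
        .collect();
    // Invariant check; the message carries the seed so the case can be
    // reconstructed. The real test would call `do_fetch` here instead.
    for r in &ranges {
        assert!(r.end <= object_len, "seed {seed}: bad range {r:?}");
    }
    ranges.len()
}

fn main() {
    for seed in 0..100 {
        // Each iteration is independently replayable from its seed.
        fuzz_coalesce_inputs(seed);
    }
}
```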
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
Review Comment:
So was the issue that it wasn't sorting the ranges before merging them?
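If sorting was indeed the fix, a minimal std-only sketch of sort-then-merge (the editor's reconstruction, not the crate's actual implementation) reproduces the updated expectations, e.g. `[0..1, 5..6, 7..9, 2..3, 4..6]` with `coalesce = 1` collapsing to `[0..9]`.

```rust
use std::ops::Range;

/// Sketch: sort ranges by start offset, then merge any range that starts
/// within `coalesce` bytes of the previous merged range's end.
fn merge_ranges(ranges: &[Range<usize>], coalesce: usize) -> Vec<Range<usize>> {
    let mut sorted = ranges.to_vec();
    sorted.sort_unstable_by_key(|r| r.start);
    let mut out: Vec<Range<usize>> = Vec::new();
    for r in sorted {
        match out.last_mut() {
            // Extend the previous range when the gap is within `coalesce`.
            Some(last) if r.start <= last.end + coalesce => {
                last.end = last.end.max(r.end);
            }
            _ => out.push(r),
        }
    }
    out
}

fn main() {
    // Both cases match the updated expectations in the diff above.
    assert_eq!(merge_ranges(&[0..1, 5..6, 7..9, 2..3, 4..6], 1), vec![0..9]);
    assert_eq!(
        merge_ranges(&[0..1, 6..7, 8..9, 10..14, 9..10], 4),
        vec![0..1, 6..14]
    );
}
```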
##########
object_store/src/util.rs:
##########
@@ -180,12 +209,47 @@ mod tests {
assert_eq!(fetches, vec![0..1, 56..75]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 5..6, 7..9, 2..3, 4..6], 1).await;
- assert_eq!(fetches, vec![0..1, 5..9, 2..6]);
+ assert_eq!(fetches, vec![0..9]);
let fetches = do_fetch(vec![0..1, 6..7, 8..9, 10..14, 9..10], 4).await;
assert_eq!(fetches, vec![0..1, 6..14]);
+
+ let mut rand = thread_rng();
+ for _ in 0..100 {
+ let object_len = rand.gen_range(10..250);
Review Comment:
I haven't tried the test myself, but I'm wondering how likely it is that this
basically ends up as "one range" most of the time, especially because the
range count is rather large, the object size rather small, and the generated
ranges themselves rather large as well.
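This is cheap to measure empirically. The sketch below (a stand-in `merged_count` and a std-only PRNG, both editor assumptions rather than the PR's code) counts how many of 100 generated cases collapse into a single range; no specific fraction is claimed, since it depends on the RNG.

```rust
use std::ops::Range;

/// Tiny xorshift PRNG standing in for `thread_rng` (an editor assumption so
/// the sketch is std-only and deterministic).
struct XorShift(u64);

impl XorShift {
    fn gen_range(&mut self, r: Range<usize>) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        r.start + (self.0 as usize) % (r.end - r.start)
    }
}

/// Sort-and-merge stand-in for the coalescing logic under test; returns the
/// number of coalesced ranges.
fn merged_count(mut ranges: Vec<Range<usize>>, coalesce: usize) -> usize {
    ranges.sort_unstable_by_key(|r| r.start);
    let mut out: Vec<Range<usize>> = Vec::new();
    for r in ranges {
        match out.last_mut() {
            Some(last) if r.start <= last.end + coalesce => {
                last.end = last.end.max(r.end)
            }
            _ => out.push(r),
        }
    }
    out.len()
}

fn main() {
    let mut rng = XorShift(0x853c49e6748fea9b);
    let mut single = 0;
    for _ in 0..100 {
        // Same parameter distribution as the fuzz loop in the diff above.
        let object_len = rng.gen_range(10..250);
        let range_count = rng.gen_range(0..30);
        let ranges: Vec<_> = (0..range_count)
            .map(|_| {
                let start = rng.gen_range(0..object_len);
                let end = rng.gen_range(start..object_len);
                start..end
            })
            .collect();
        let coalesce = rng.gen_range(1..5);
        if merged_count(ranges, coalesce) == 1 {
            single += 1;
        }
    }
    // Measuring the distribution like this is a quick sanity check before
    // tuning `object_len` / `range_count` / range widths.
    println!("{single}/100 cases coalesced into a single range");
}
```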
##########
object_store/src/util.rs:
##########
@@ -80,17 +80,47 @@ pub const OBJECT_STORE_COALESCE_DEFAULT: usize = 1024 * 1024;
/// less than `coalesce` bytes apart. Out of order `ranges` are not coalesced
pub async fn coalesce_ranges<F, Fut>(
ranges: &[std::ops::Range<usize>],
- mut fetch: F,
+ fetch: F,
coalesce: usize,
) -> Result<Vec<Bytes>>
where
F: Send + FnMut(std::ops::Range<usize>) -> Fut,
Fut: std::future::Future<Output = Result<Bytes>> + Send,
{
+ let fetch_ranges = merge_ranges(ranges, coalesce);
+
+ let fetched: Vec<_> = futures::stream::iter(fetch_ranges.iter().cloned())
+ .map(fetch)
+ .buffered(10)
Review Comment:
This should be documented in the method's doc comment. At least I would find
this very surprising behavior otherwise.
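One possible wording for such a note (the editor's suggestion, not the crate's actual docs, beyond the `buffered(10)` limit visible in the diff above); the synchronous stub body exists only so the sketch compiles.

```rust
use std::ops::Range;

/// Coalesces adjacent `ranges` that are less than `coalesce` bytes apart and
/// fetches them with the provided fetch function.
///
/// Note: coalesced ranges are fetched concurrently, with up to 10 requests
/// in flight at a time; callers should not assume fetches happen one at a
/// time or strictly in order.
fn coalesce_ranges_doc_sketch(ranges: &[Range<usize>], coalesce: usize) -> usize {
    // Stub body so this documentation sketch compiles; the real function is
    // async, performs the fetches, and returns the resulting buffers.
    let _ = coalesce;
    ranges.len()
}

fn main() {
    assert_eq!(coalesce_ranges_doc_sketch(&[0..1, 5..9], 1), 2);
}
```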
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]