alamb commented on code in PR #7351:
URL: https://github.com/apache/arrow-rs/pull/7351#discussion_r2018943675
##########
arrow/benches/comparison_kernels.rs:
##########
@@ -424,5 +436,127 @@ fn add_benchmark(c: &mut Criterion) {
});
}
-criterion_group!(benches, add_benchmark);
+// Generate a string array that meets the requirements:
+// All strings start with "test", followed by a tail, ensuring the total length is greater than 12 bytes.
+fn make_custom_string_array(size: usize, tail_len: usize, rng: &mut StdRng) -> Vec<Option<String>> {
+    (0..size)
+        .map(|_| {
+            // Generate the tail: use visible ASCII characters (32 to 126) to ensure a valid UTF-8 string.
+            let tail: String = (0..tail_len)
Review Comment:
It probably doesn't matter in this case, but you can avoid at least one
allocation like this:
```rust
// Generate the tail: use visible ASCII characters (32 to 126)
// to ensure a valid UTF-8 string.
let s: String = "test".chars()
    .chain((0..tail_len).map(|_| rng.random_range(32u8..127u8) as char))
    .collect();
Some(s)
```
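For reference, here is a minimal, dependency-free sketch of the same idea. The names `Lcg` and `long_string` are hypothetical, and the tiny generator is only a stand-in for `StdRng` so the snippet compiles without the `rand` crate. It pre-sizes the `String` so the prefix and tail share a single allocation:

```rust
/// Minimal stand-in for a seeded RNG so the sketch needs no external crates.
/// (Hypothetical helper; the benchmark itself uses `StdRng` from `rand`.)
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random byte in the printable ASCII range 32..=126.
    fn next_printable(&mut self) -> u8 {
        // Constants from Knuth's MMIX linear congruential generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits, which are better distributed than the low bits.
        32 + ((self.0 >> 33) % 95) as u8
    }
}

/// Builds "test" followed by `tail_len` printable ASCII characters,
/// reserving the full length up front so the buffer is allocated once.
fn long_string(tail_len: usize, rng: &mut Lcg) -> String {
    let mut s = String::with_capacity(4 + tail_len);
    s.push_str("test");
    for _ in 0..tail_len {
        s.push(rng.next_printable() as char);
    }
    s
}
```

With `tail_len = 12` this yields a 16-byte string, matching the 4 + 12 arithmetic in the benchmark comment.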
##########
arrow/benches/comparison_kernels.rs:
##########
@@ -424,5 +436,127 @@ fn add_benchmark(c: &mut Criterion) {
});
}
-criterion_group!(benches, add_benchmark);
+// Generate a string array that meets the requirements:
+// All strings start with "test", followed by a tail, ensuring the total length is greater than 12 bytes.
+fn make_custom_string_array(size: usize, tail_len: usize, rng: &mut StdRng) -> Vec<Option<String>> {
+    (0..size)
+        .map(|_| {
+            // Generate the tail: use visible ASCII characters (32 to 126) to ensure a valid UTF-8 string.
+            let tail: String = (0..tail_len)
+                .map(|_| rng.random_range(32u8..127u8) as char)
+                .collect();
+            Some(format!("test{}", tail))
+        })
+        .collect()
+}
+
+fn add_custom_string_benchmarks(c: &mut Criterion) {
Review Comment:
I found "custom" somewhat vague. Perhaps we could use the term `long`
instead:
```rust
let long_strings = make_long_string_array(SIZE, 12, &mut rng);
let long_string_array = StringArray::from(long_strings);
let long_string_view = StringViewArray::from_iter(long_string_array.iter());
```
##########
arrow/benches/comparison_kernels.rs:
##########
@@ -424,5 +436,127 @@ fn add_benchmark(c: &mut Criterion) {
});
}
-criterion_group!(benches, add_benchmark);
+// Generate a string array that meets the requirements:
+// All strings start with "test", followed by a tail, ensuring the total length is greater than 12 bytes.
+fn make_custom_string_array(size: usize, tail_len: usize, rng: &mut StdRng) -> Vec<Option<String>> {
Review Comment:
Maybe we could call this `long_strings` or something similar, as it doesn't
actually make a string array: it returns a `Vec<Option<String>>`.
##########
arrow/benches/comparison_kernels.rs:
##########
@@ -424,5 +436,127 @@ fn add_benchmark(c: &mut Criterion) {
});
}
-criterion_group!(benches, add_benchmark);
+// Generate a string array that meets the requirements:
+// All strings start with "test", followed by a tail, ensuring the total length is greater than 12 bytes.
+fn make_custom_string_array(size: usize, tail_len: usize, rng: &mut StdRng) -> Vec<Option<String>> {
+    (0..size)
+        .map(|_| {
+            // Generate the tail: use visible ASCII characters (32 to 126) to ensure a valid UTF-8 string.
+            let tail: String = (0..tail_len)
+                .map(|_| rng.random_range(32u8..127u8) as char)
+                .collect();
+            Some(format!("test{}", tail))
+        })
+        .collect()
+}
+
+fn add_custom_string_benchmarks(c: &mut Criterion) {
+    // Assume SIZE is defined as the benchmark array size.
+    let mut rng = seedable_rng();
+    // Here, tail length is set to 12, so the total length becomes 4 + 12 = 16 bytes (> 12 bytes).
+    let custom_strings = make_custom_string_array(SIZE, 12, &mut rng);
+    let custom_string_array = StringArray::from(custom_strings);
+    let custom_string_view = StringViewArray::from_iter(custom_string_array.iter());
+
+    // Benchmark the eq operation for utf8 (StringArray)
+    c.bench_function("eq custom utf8", |b| {
Review Comment:
Can we use a name that includes the types of the arguments as well as the
operator? Maybe something like:
```suggestion
c.bench_function("eq utf8 utf8view long", |b| {
```
And then similarly for the benchmarks below.
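
If it helps keep the names consistent, one option is to build them from their parts. This is only a sketch: `bench_name` is a hypothetical helper, not something that exists in the benchmark file, and the "operator, left type, right type, data variant" ordering is just an assumption taken from the suggestion above.

```rust
/// Joins the operator, the two argument types, and the data variant
/// into a single space-separated benchmark name.
/// (Hypothetical helper, shown only to illustrate the naming scheme.)
fn bench_name(op: &str, lhs: &str, rhs: &str, variant: &str) -> String {
    format!("{} {} {} {}", op, lhs, rhs, variant)
}
```

For example, `bench_name("eq", "utf8", "utf8view", "long")` produces the name used in the suggestion, and the result could be passed to `c.bench_function` as `&bench_name(...)`.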