Re: [PR] Add benchmark for `infer_json_schema` [arrow-rs]

via GitHub Fri, 13 Mar 2026 02:55:37 -0700


alamb commented on code in PR #9546:
URL: https://github.com/apache/arrow-rs/pull/9546#discussion_r2930170348



##########
arrow-json/benches/json_reader.rs:
##########
@@ -323,13 +325,83 @@ fn bench_serialize_list(c: &mut Criterion) {
     });
 }
 
+fn bench_schema_inference(c: &mut Criterion) {
+    const ROWS: usize = 1000;
+
+    #[derive(Serialize, Arbitrary, Debug)]
+    struct Row {
+        a: Option<i16>,
+        b: Option<String>,
+        c: Option<[i16; 8]>,
+        d: Option<[bool; 8]>,
+        e: Option<Inner>,
+        f: f64,
+    }
+
+    #[derive(Serialize, Arbitrary, Debug)]
+    struct Inner {
+        a: Option<i16>,
+        b: Option<String>,
+        c: Option<bool>,
+    }
+
+    let mut data = vec![];
+    for row in pseudorandom_sequence::<Row>(ROWS) {

Review Comment:
   I think our other benchmarks use `seedable_rng` to get repeatable 
pseudo-random numbers. Is there a reason we shouldn't follow the same pattern 
here?



##########
arrow-json/Cargo.toml:
##########
@@ -61,6 +61,7 @@ tokio = { version = "1.27", default-features = false, features = ["io-util"] }
 bytes = "1.4"
 criterion = { workspace = true, default-features = false }
 rand = { version = "0.9", default-features = false, features = ["std", "std_rng", "thread_rng"] }
+arbitrary = { version = "1.4.2", features = ["derive"] }

Review Comment:
   I would prefer not to add a new dependency (even a dev one) unless really 
necessary, as that is one more thing to chase down / maintain. I think you 
could get the same effect using a random number generator directly.
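
   As a rough illustration of the suggestion above (not code from the PR), 
here is a minimal sketch of building repeatable JSON rows from a seeded 
generator without `arbitrary`. To keep it dependency-free, a small SplitMix64 
generator stands in for a seeded RNG such as the one `seedable_rng` provides. 
`SplitMix64`, `is_null`, and `generate_ndjson` are hypothetical names, and the 
row shape is a simplified subset of the `Row` struct in the diff.

```rust
/// SplitMix64: a tiny deterministic generator. It is an assumption made for
/// this sketch, standing in for a seeded RNG from the `rand` crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns true for roughly one call in four; used to emit JSON nulls.
    fn is_null(&mut self) -> bool {
        self.next_u64() % 4 == 0
    }
}

/// Hypothetical helper: builds `rows` newline-delimited JSON objects with
/// nullable fields `a`, `b`, `c` and a non-null float `f`.
fn generate_ndjson(seed: u64, rows: usize) -> String {
    let mut rng = SplitMix64(seed);
    let mut out = String::new();
    for _ in 0..rows {
        // Nullable 16-bit integer.
        let a = if rng.is_null() {
            "null".to_string()
        } else {
            (rng.next_u64() as i16).to_string()
        };
        // Nullable short string.
        let b = if rng.is_null() {
            "null".to_string()
        } else {
            format!("\"s{}\"", rng.next_u64() % 1000)
        };
        // Nullable fixed-length list of eight integers.
        let c = if rng.is_null() {
            "null".to_string()
        } else {
            let items: Vec<String> = (0..8)
                .map(|_| (rng.next_u64() as i16).to_string())
                .collect();
            format!("[{}]", items.join(","))
        };
        // `{:?}` keeps a decimal point (e.g. `12.0`), so the value stays a float.
        let f = (rng.next_u64() % 1_000_000) as f64 / 1000.0;
        out.push_str(&format!(
            "{{\"a\":{},\"b\":{},\"c\":{},\"f\":{:?}}}\n",
            a, b, c, f
        ));
    }
    out
}
```

   Calling `generate_ndjson(42, 1000)` twice yields identical input, so 
benchmark runs stay comparable. In the real benchmark the same loop could draw 
from the seeded `rand` generator the other benchmarks already use.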



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
