jecsand838 commented on code in PR #8254:
URL: https://github.com/apache/arrow-rs/pull/8254#discussion_r2312110172


##########
arrow-avro/src/reader/record.rs:
##########
@@ -590,10 +590,23 @@ impl Decoder {
                         )));
                     }
                 }
+                // Extract the value field nullability from the schema
+                let is_value_nullable = match map_field.data_type() {
+                    DataType::Struct(fields) => fields
+                        .iter()
+                        .find(|f| f.name() == "value")
+                        .map(|f| f.is_nullable())
+                        .unwrap_or(false),
+                    _ => true, // default to nullable
+                };
                 let entries_struct = StructArray::new(
                     Fields::from(vec![
                         Arc::new(ArrowField::new("key", DataType::Utf8, false)),
-                        Arc::new(ArrowField::new("value", val_arr.data_type().clone(), true)),
+                        Arc::new(ArrowField::new(
+                            "value",
+                            val_arr.data_type().clone(),
+                            is_value_nullable,

Review Comment:
   In #8220 I took a slightly different approach that avoids the field scan and the new `Field` allocation while preserving the existing schema metadata.
   
   In the benchmarks it is slightly faster, especially on smaller batches:
   
   ```
   Map/100                 time:   [6.2237 µs 6.2594 µs 6.2899 µs]
                           thrpt:  [507.32 MiB/s 509.79 MiB/s 512.71 MiB/s]
                    change:
                           time:   [−0.3038% +0.5799% +1.4930%] (p = 0.19 > 0.05)
                           thrpt:  [−1.4710% −0.5766% +0.3047%]
                           No change in performance detected.
   Map/10000               time:   [250.40 µs 253.75 µs 258.62 µs]
                           thrpt:  [1.2573 GiB/s 1.2814 GiB/s 1.2986 GiB/s]
                    change:
                           time:   [−2.5344% −1.1670% +0.1631%] (p = 0.08 > 0.05)
                           thrpt:  [−0.1628% +1.1808% +2.6003%]
                           No change in performance detected.
   Found 6 outliers among 25 measurements (24.00%)
     6 (24.00%) low mild
   Map/1000000             time:   [252.99 µs 255.93 µs 260.24 µs]
                           thrpt:  [130.21 GiB/s 132.40 GiB/s 133.94 GiB/s]
                    change:
                           time:   [−1.9418% −0.4373% +1.0680%] (p = 0.60 > 0.05)
                           thrpt:  [−1.0568% +0.4393% +1.9803%]
                           No change in performance detected.
   ```
   
   vs. the change in this PR:
   
   ```
   Map/100                 time:   [6.4487 µs 6.4584 µs 6.4687 µs]
                           thrpt:  [493.30 MiB/s 494.09 MiB/s 494.83 MiB/s]
                    change:
                           time:   [+4.2911% +5.0113% +5.7191%] (p = 0.00 < 0.05)
                           thrpt:  [−5.4097% −4.7721% −4.1145%]
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   Map/10000               time:   [258.90 µs 260.78 µs 263.17 µs]
                           thrpt:  [1.2356 GiB/s 1.2469 GiB/s 1.2559 GiB/s]
                    change:
                           time:   [−0.6514% +0.8318% +2.5189%] (p = 0.34 > 0.05)
                           thrpt:  [−2.4570% −0.8249% +0.6557%]
                           No change in performance detected.
   Found 1 outliers among 25 measurements (4.00%)
     1 (4.00%) high severe
   Map/1000000             time:   [265.48 µs 268.25 µs 270.56 µs]
                           thrpt:  [125.24 GiB/s 126.33 GiB/s 127.64 GiB/s]
                    change:
                           time:   [+3.0081% +4.1202% +5.2124%] (p = 0.00 < 0.05)
                           thrpt:  [−4.9542% −3.9572% −2.9203%]
                           Performance has regressed.
   Found 1 outliers among 10 measurements (10.00%)
   ```
   
   If you wanted to check it out: 
https://github.com/apache/arrow-rs/blob/ebf402915511308201ef5fbb92368d696ee50ff5/arrow-avro/src/reader/record.rs#L685
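
To illustrate the trade-off described above, here is a minimal std-only sketch. It is not the code from #8220 or from this PR. `Field`, `FieldRef`, `rebuild_value_field`, `reuse_value_field`, and `sample_entries` are hypothetical stand-ins for the Arrow types and decoder logic, and the `[key, value]` field order is an assumption. It contrasts scanning for the `value` field and allocating a new one with cloning the existing `Arc`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow field: only name, nullability, metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

pub type FieldRef = Arc<Field>;

/// Scan for the "value" field by name, copy its nullability, and allocate a
/// brand-new field. Any metadata on the original field is not carried over.
pub fn rebuild_value_field(entries: &[FieldRef]) -> FieldRef {
    let nullable = entries
        .iter()
        .find(|f| f.name == "value")
        .map(|f| f.nullable)
        .unwrap_or(false);
    Arc::new(Field {
        name: "value".to_string(),
        nullable,
        metadata: HashMap::new(),
    })
}

/// Reuse the existing field by position (assumed order: [key, value]).
/// Cloning the Arc only bumps a reference count, so nullability and
/// metadata are preserved without a scan or a new allocation.
pub fn reuse_value_field(entries: &[FieldRef]) -> Option<FieldRef> {
    entries.get(1).cloned()
}

/// Build sample map entry fields; the value field carries metadata.
pub fn sample_entries() -> Vec<FieldRef> {
    let mut metadata = HashMap::new();
    metadata.insert("doc".to_string(), "map value".to_string());
    vec![
        Arc::new(Field {
            name: "key".to_string(),
            nullable: false,
            metadata: HashMap::new(),
        }),
        Arc::new(Field {
            name: "value".to_string(),
            nullable: false,
            metadata,
        }),
    ]
}
```

In this model both functions agree on nullability, but only `reuse_value_field` returns a pointer-equal `Arc` that still carries the `doc` metadata entry.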
   
   


