jecsand838 commented on code in PR #8349:
URL: https://github.com/apache/arrow-rs/pull/8349#discussion_r2365872073


##########
arrow-avro/src/reader/record.rs:
##########
@@ -486,6 +734,70 @@ impl Decoder {
             Self::Decimal256(_, _, _, builder) => 
builder.append_value(i256::ZERO),
             Self::Enum(indices, _, _) => indices.push(0),
             Self::Duration(builder) => builder.append_null(),
+            Self::Union(fields, type_ids, offsets, encodings, encoding_counts, 
None) => {
+                let mut chosen = None;
+                for (i, ch) in encodings.iter().enumerate() {
+                    if matches!(ch, Decoder::Null(_)) {
+                        chosen = Some(i);
+                        break;
+                    }
+                }
+                let idx = chosen.unwrap_or(0);
+                let type_id = fields
+                    .iter()
+                    .nth(idx)
+                    .map(|(type_id, _)| type_id)
+                    .unwrap_or_else(|| i8::try_from(idx).unwrap_or(0));

Review Comment:
   > Actually, maybe I do understand!
   > 
   > If NULL is one of the union branches, emit that directly
   > Otherwise, default to the first branch (whatever that is), and emit a NULL 
value for it
   > Is that correct?
   > 
   > (but if so, then I don't understand why idx could ever be out of bounds?)
   
   That's correct.
   
   I'm working on cleaning this up now by making the invariants explicit at 
construction time and erroring if the union is empty or if `encodings.len() != 
fields.len()`. Then I'll remove the `i8::try_from(idx)` fallback and only 
compute `type_id` from `fields[idx]`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to