scovich commented on issue #8609:
URL: https://github.com/apache/arrow-rs/issues/8609#issuecomment-3403173750

   I think you've identified two separate issues:
   
   > Probing a `VariantArray` when that element is null will result in a panic
   
   This is arguably a bug that needs fixing, and the immediate fix is likely a 
one-liner:
   
   <details>
   
   ```diff
       pub fn value(&self, index: usize) -> Variant<'_, '_> {
           match (self.typed_value_field(), self.value_field()) {
               // Always prefer typed_value, if available                       
                                                                                
                                                                                
                                  
               (Some(typed_value), value) if typed_value.is_valid(index) => {
                   typed_value_to_variant(typed_value, value, index)
               }
               // Otherwise fall back to value, if available                    
                                                                                
                                                                                
                                  
   -           (_, Some(value)) if value.is_valid(index) => {
   +           (_, Some(value)) if self.metadata.is_valid(index) && 
value.is_valid(index) => {
                   Variant::new(self.metadata.value(index), value.value(index))
               }
               // It is technically invalid for neither value nor typed_value 
fields to be available,                                                         
                                                                                
                                    
               // but the spec specifically requires readers to return 
Variant::Null in this case.                                                     
                                                                                
                                           
               _ => Variant::Null,
           }
   ```
   
   </details>
   
   ... but only because `VariantArray::typed_value_to_variant` doesn't support 
partially shredded objects yet. 
   
   <details>
   
   Once we support those, that helper will need to take the `metadata` column 
as an additional arg, and the lifetime annotations will likely need to change 
as well. Actually... I think the method _cannot_ support partially shredded 
objects because `Variant` is fundamentally a reference type, and returning a 
single `Variant` for a partially shredded object would require reassembling it. 
Or adding the new `Variant::ShreddedObject` enum variant that @alamb has 
mentioned a couple of times as a way to capture all the raw bytes as-is, 
without trying to reassemble them.
   
   If we always unshred all primitive types, then it _might_ be as simple as 
adding new enum variants `ShreddedObject(ShreddedVariantObject)` and 
`ShreddedList(Vec<Variant<'m, 'v>)` where:
   ```rust
   struct ShreddedVariantObject<'m, 'v> {
       unshredded_fields: Option<VariantObject<'m, 'v>>,
       shredded_fields: Vec<(&'m str, Variant<'m, 'v>)>,
   }
   ```
   
   The hard part is _populating_ it, not defining it...
   
   </details>
   
   Meanwhile, there are some other panics lurking in that method, which we 
should probably track fixes for as well?
   
   > This seems like an odd pattern, as it's not clear exactly what a user 
should do when iterating or probing through a `VariantArray`
   
   For any arrow array, a caller who cares about nulls needs to check 
`Array::is_null` or `Array::is_valid` before invoking the `value` method. A 
blind `v.value(0)` is never recommended (but it still shouldn't panic).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to