jhorstmann commented on issue #1313:
URL: https://github.com/apache/arrow-rs/issues/1313#issuecomment-1065844808


   `ArrayDataBuilder::build` calls `ArrayData::try_new`, which fully validates 
that struct, including all child ArrayData objects. For example, building a 
DictionaryArray from Int32 keys to String values first validates the child data 
containing these strings. Since the layout of a StringArray consists of an 
offset buffer and a data buffer, it has to check that the offsets are 
monotonically increasing and in bounds for the data buffer. Once the child 
StringArray is known to be valid, the dictionary keys also need to be validated 
to be in bounds of this StringArray.
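
   To make the two levels of validation concrete, here is a minimal sketch of 
the checks described above. It is a simplified model on plain slices, not the 
actual arrow-rs code: the function names `validate_offsets` and `validate_keys` 
are hypothetical, offsets are assumed to be `i32`, equal consecutive offsets 
(empty strings) are accepted, and null handling is left out.

```rust
/// Hypothetical helper: checks that string offsets are non-negative,
/// never decrease, and stay within a data buffer of `data_len` bytes.
fn validate_offsets(offsets: &[i32], data_len: usize) -> Result<(), String> {
    let mut prev = 0i32;
    for (i, &offset) in offsets.iter().enumerate() {
        // Starting `prev` at 0 also rejects negative offsets.
        if offset < prev {
            return Err(format!(
                "offset {} at position {} is smaller than previous offset {}",
                offset, i, prev
            ));
        }
        if offset as usize > data_len {
            return Err(format!(
                "offset {} at position {} exceeds data length {}",
                offset, i, data_len
            ));
        }
        prev = offset;
    }
    Ok(())
}

/// Hypothetical helper: checks that every dictionary key points at an
/// existing entry of a dictionary with `dict_len` values.
fn validate_keys(keys: &[i32], dict_len: usize) -> Result<(), String> {
    for (i, &key) in keys.iter().enumerate() {
        if key < 0 || key as usize >= dict_len {
            return Err(format!(
                "key {} at position {} is out of bounds for dictionary of length {}",
                key, i, dict_len
            ));
        }
    }
    Ok(())
}
```

   Both checks are linear in the size of their input, so a full validation 
scans the child values as well as the keys.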
   
   In `DictionaryArray::try_new`, we get the child array as a parameter and can 
assume that it is valid without having to validate it again. We only need to 
check that the given keys are in bounds.
   
   I think a possible solution would be to extract the dictionary validation 
logic out of `ArrayData::validate_full` into a separate function. 
`DictionaryArray::try_new` could then use `ArrayDataBuilder::build_unchecked` 
and afterwards call the new function which only validates that the keys are in 
bounds.
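
   The shape of that proposal could look roughly like the sketch below. This is 
a stand-alone model, not the arrow-rs API: `DictSketch`, its `build_unchecked`, 
and `validate_dictionary_keys` are hypothetical stand-ins chosen for 
illustration, the values are a plain `Vec<String>` that is assumed to be valid 
already, and null keys are not modelled.

```rust
/// Hypothetical stand-in for a dictionary array: `keys` index into `values`.
#[derive(Debug)]
struct DictSketch {
    keys: Vec<i32>,
    values: Vec<String>,
}

/// The extracted check: only verifies that every key is in bounds of the
/// dictionary values. It does not look at the values themselves.
fn validate_dictionary_keys(keys: &[i32], values_len: usize) -> Result<(), String> {
    for (i, &key) in keys.iter().enumerate() {
        if key < 0 || key as usize >= values_len {
            return Err(format!(
                "key {} at position {} is out of bounds for {} values",
                key, i, values_len
            ));
        }
    }
    Ok(())
}

impl DictSketch {
    /// Stand-in for an unchecked build: assembles the struct without
    /// validating anything.
    fn build_unchecked(keys: Vec<i32>, values: Vec<String>) -> Self {
        DictSketch { keys, values }
    }

    /// Builds without the full validation, then runs only the key check.
    /// The values are trusted because the caller passed in a valid array.
    fn try_new(keys: Vec<i32>, values: Vec<String>) -> Result<Self, String> {
        let dict = Self::build_unchecked(keys, values);
        validate_dictionary_keys(&dict.keys, dict.values.len())?;
        Ok(dict)
    }
}
```

   With this split, the cost of `try_new` depends on the number of keys only, 
while the full validation path can still call the same key check after it has 
validated the child data.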

