jhorstmann commented on issue #1313: URL: https://github.com/apache/arrow-rs/issues/1313#issuecomment-1065844808
`ArrayDataBuilder::build` calls `ArrayData::try_new`, which does a full validation of the struct, including all child `ArrayData` objects. For example, building a `DictionaryArray` with `Int32` keys and `String` values first validates that the child data containing the strings is valid. Since the layout of a `StringArray` consists of an offset buffer and a data buffer, this means checking that the offsets are monotonically increasing and within the bounds of the data buffer. Once the child `StringArray` is known to be valid, the dictionary keys also need to be validated to be in bounds of that `StringArray`.

In `DictionaryArray::try_new`, we get the child array as a parameter and can assume that it is already valid, without having to validate it again. We only need to check that the given keys are in bounds.

I think a possible solution would be to extract the dictionary validation logic out of `ArrayData::validate_full` into a separate function. `DictionaryArray::try_new` could then use `ArrayDataBuilder::build_unchecked` and afterwards call the new function, which only validates that the keys are in bounds.
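A minimal, self-contained sketch of the split described above, written in plain Rust without depending on the arrow crate. `StringValues`, `validate_string_offsets`, `validate_dictionary_keys`, `validate_full_dictionary` and `DictionarySketch` are hypothetical names used only for illustration; they are not arrow-rs APIs. Nulls are modelled as `Option<i32>` keys, which is a simplification of Arrow's validity bitmap, and only the offset checks mentioned in the comment are performed on the string data.

```rust
/// Hypothetical, simplified stand-in for a StringArray's layout:
/// `offsets` has `len + 1` entries pointing into the `data` buffer.
pub struct StringValues {
    pub offsets: Vec<i32>,
    pub data: Vec<u8>,
}

impl StringValues {
    /// Number of strings described by the offsets buffer.
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
}

/// Child validation: offsets must start non-negative, be monotonically
/// increasing, and the last offset must lie within the data buffer.
pub fn validate_string_offsets(values: &StringValues) -> Result<(), String> {
    if values.offsets.is_empty() {
        return Err("offsets buffer must contain at least one entry".to_string());
    }
    if values.offsets[0] < 0 {
        return Err(format!("first offset {} is negative", values.offsets[0]));
    }
    for w in values.offsets.windows(2) {
        if w[1] < w[0] {
            return Err(format!("offsets not monotonically increasing: {} > {}", w[0], w[1]));
        }
    }
    // Safe cast: the checks above guarantee the last offset is non-negative.
    let last = *values.offsets.last().unwrap() as usize;
    if last > values.data.len() {
        return Err(format!("offset {} out of bounds for data of length {}", last, values.data.len()));
    }
    Ok(())
}

/// Keys-only validation: every non-null key must index into the values.
/// (Assumption of this sketch: null keys are skipped.)
pub fn validate_dictionary_keys(keys: &[Option<i32>], values_len: usize) -> Result<(), String> {
    for (i, key) in keys.iter().enumerate() {
        if let Some(k) = *key {
            if k < 0 || k as usize >= values_len {
                return Err(format!("key {} at index {} out of bounds for {} values", k, i, values_len));
            }
        }
    }
    Ok(())
}

/// Analogue of the full validation path: re-check the child values first,
/// then the keys.
pub fn validate_full_dictionary(keys: &[Option<i32>], values: &StringValues) -> Result<(), String> {
    validate_string_offsets(values)?;
    validate_dictionary_keys(keys, values.len())
}

/// Analogue of the proposed `try_new` path: the values are assumed to come
/// from an already-validated array, so only the key bounds are checked.
pub struct DictionarySketch {
    pub keys: Vec<Option<i32>>,
    pub values: StringValues,
}

impl DictionarySketch {
    pub fn try_new(keys: Vec<Option<i32>>, values: StringValues) -> Result<Self, String> {
        validate_dictionary_keys(&keys, values.len())?;
        Ok(Self { keys, values })
    }
}
```

In this sketch, `DictionarySketch::try_new` only walks the keys, while `validate_full_dictionary` additionally walks the values' offsets, which is the repeated work the comment suggests avoiding by pairing `build_unchecked` with a keys-only check.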
