rtpsw commented on PR #14347: URL: https://github.com/apache/arrow/pull/14347#issuecomment-1277658521
> Okay, so here is the problem: users shouldn't pass invalid data to Arrow APIs (except to `Validate` and `ValidateFull`, which are explicitly designed to handle such data). So it doesn't make sense to check for invalid data at the beginning of other functions; also, it can be quite costly (`ValidateFull` can typically be `O(nrows * columns)`).
>
> (note: "invalid data" here is a badly structured array)

This circles back to points we discussed. I can understand the requirement of passing valid data in a correct Arrow app, as well as in correct Arrow code, but less so during its development, where incorrect code frequently occurs. This PR aims to make (failure analysis during) development easier, given that its runtime cost is small. For the purpose of cost, I think the calls to `ValidateFull` in `PrintDiff` shouldn't count because they can be removed - I only give them for reproducibility without a segmentation fault.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
