tustvold opened a new issue, #2387: URL: https://github.com/apache/arrow-rs/issues/2387
**Which part is this question about**

Generally the approach taken by this crate is that a given `ArrayData`, and by extension `Array`, only contains valid data. For example, a `StringArray` is valid UTF-8 with each index at a codepoint boundary, a dictionary array only has valid indexes, etc. This allows eliding bounds checks on access within kernels.

However, in order for this to be sound, it must be impossible to create invalid `ArrayData` using safe APIs. This means that safe APIs must either:

* Generate valid data by construction - e.g. the builder APIs
* Validate data - e.g. `ArrayData::try_new`

For the examples above, incorrect validation can very clearly lead to UB.

The situation for decimal values is a bit more confused; in particular, I'm not really clear on what the implications of a value that exceeds the precision actually are. However, some notes:

* As far as I can tell we don't protect against overflow of normal integer types
* We don't have any decimal arithmetic kernels (yet)
* The decimal types are fixed bit width and so the precision isn't used to impact their representation

**Describe your question**

My question boils down to:

* What is the purpose of the precision argument? Is it just for interoperability with other representations?
* Is there a requirement to saturate/error at the bounds of the precision, or can we simply overflow/saturate at the bounds of the underlying representation?
* Does validating the precision on ingest to `ArrayData` actually elide any validation when performing computation?

The answers to these will dictate whether we can just take a relaxed attitude to precision, letting users opt into validation if they care and otherwise simply ignoring it.

I tried to understand what the C++ implementation is doing, but I honestly got lost. It almost looks like it is performing floating point operations and then rounding them back, which seems surprising...

**Additional context**
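To make the note about fixed bit width concrete, here is a minimal sketch in plain Rust. It assumes a 128-bit decimal stored as a raw `i128`, and the helpers `max_for_precision` and `fits_precision` are hypothetical names for illustration, not part of the arrow-rs API. It shows that a value exceeding the declared precision is still a perfectly representable `i128`, so a precision check is a range check on top of the representation rather than something the representation enforces.

```rust
/// Largest magnitude expressible with `precision` decimal digits.
/// Hypothetical helper; valid for precision <= 38, since 10^38 fits in an i128.
fn max_for_precision(precision: u32) -> i128 {
    10_i128.pow(precision) - 1
}

/// Range check a raw value against a declared precision (hypothetical helper).
fn fits_precision(value: i128, precision: u32) -> bool {
    let max = max_for_precision(precision);
    (-max..=max).contains(&value)
}

fn main() {
    let precision = 5;
    let a: i128 = 99_999; // largest five-digit value
    let sum = a + 1; // 100_000 needs six digits but is an ordinary i128

    assert!(fits_precision(a, precision));
    assert!(!fits_precision(sum, precision));

    // Overflow of the underlying representation only occurs at the i128 bounds,
    // far beyond the largest value any 38-digit precision allows.
    assert!(i128::MAX.checked_add(1).is_none());
    assert!(max_for_precision(38) < i128::MAX);
}
```

Whether such a check belongs on ingest, in each kernel, or only on request is exactly the open question; the sketch only shows that skipping it cannot produce an out-of-range `i128`.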
