tustvold opened a new issue, #2387:
URL: https://github.com/apache/arrow-rs/issues/2387

   **Which part is this question about**
   
   Generally the approach taken by this crate is that a given `ArrayData`, and 
by extension `Array`, only contains valid data. For example, a `StringArray` is 
valid UTF-8 with each offset at a codepoint boundary, a dictionary array only 
has valid indexes, etc. This allows eliding bounds checks on access within 
kernels. 
   
   However, in order for this to be sound, it must be impossible to create 
invalid `ArrayData` using safe APIs. This means that safe APIs must either:
   
   * Generate valid data by construction - e.g. the builder APIs
   * Validate data - e.g. `ArrayData::try_new`
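
To make the invariant concrete, here is a minimal sketch of the "validate, then elide bounds checks" pattern. `CheckedDict` is a hypothetical stand-in invented for illustration, not the arrow-rs `DictionaryArray` or `ArrayData`; it only shows why a safe constructor must reject invalid keys if later access is going to be unchecked.

```rust
/// Hypothetical dictionary-like container (illustration only, not an arrow-rs type).
pub struct CheckedDict {
    values: Vec<String>,
    keys: Vec<usize>,
}

impl CheckedDict {
    /// Safe constructor: rejects any key that does not index into `values`.
    pub fn try_new(values: Vec<String>, keys: Vec<usize>) -> Result<Self, String> {
        if let Some(bad) = keys.iter().find(|&&k| k >= values.len()) {
            return Err(format!("key {} out of bounds for {} values", bad, values.len()));
        }
        Ok(Self { values, keys })
    }

    /// Kernel-style access that skips the bounds check on `values`.
    pub fn value(&self, i: usize) -> &str {
        let key = self.keys[i]; // the lookup into `keys` is still bounds-checked
        // SAFETY: `try_new` verified every key is < values.len(), and the
        // fields are private, so safe code cannot break that invariant.
        unsafe { self.values.get_unchecked(key).as_str() }
    }
}
```

If `try_new` skipped its check, `value` would be unsound even though callers never write `unsafe` themselves, which is the situation the two bullet points above are meant to rule out.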
   
   For the examples above, missing or incorrect validation can very clearly 
lead to UB. The situation for decimal values is less clear; in particular, I'm 
not sure what the implications of a value that exceeds the precision actually 
are. However, some notes:
   
   * As far as I can tell, we don't protect against arithmetic overflow for 
the normal integer types
   * We don't have any decimal arithmetic kernels (yet)
   * The decimal types are fixed bit width, so the precision doesn't affect 
their representation
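
As a rough illustration of the last point, the sketch below treats a 128-bit decimal as a plain `i128` and checks it against a precision. The helper names `max_for_precision` and `fits_precision` are hypothetical and are not arrow-rs APIs; the only assumption taken from Arrow is that a 128-bit decimal allows at most 38 digits of precision.

```rust
/// Maximum precision of a 128-bit Arrow decimal.
pub const MAX_PRECISION: u8 = 38;

/// Largest magnitude with `precision` decimal digits: 10^precision - 1.
/// Hypothetical helper for illustration.
pub fn max_for_precision(precision: u8) -> i128 {
    assert!(precision >= 1 && precision <= MAX_PRECISION);
    // 10^38 is below i128::MAX (about 1.7e38), so this cannot overflow.
    10_i128.pow(precision as u32) - 1
}

/// True if `value` needs no more than `precision` decimal digits.
pub fn fits_precision(value: i128, precision: u8) -> bool {
    let max = max_for_precision(precision);
    (-max..=max).contains(&value)
}
```

For example, `fits_precision(1000, 3)` is false, yet `1000` is an ordinary `i128` and reading it involves no out-of-bounds access, unlike an invalid dictionary key or string offset.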
   
   **Describe your question**
   
   My question boils down to:
   
   * What is the purpose of the precision argument? Is it just for 
interoperability with other representations?
   * Is there a requirement to saturate/error at the bounds of the precision, 
or can we simply overflow/saturate at the bounds of the underlying 
representation?
   * Does validating the precision on ingest to `ArrayData` actually let us 
elide any validation when performing computation?
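
The second question can be phrased as a choice between two policies, sketched below for addition on raw `i128` values. Both functions are hypothetical illustrations, not existing arrow-rs kernels, and erroring is modelled as returning `None`.

```rust
/// Policy A: fail only when the underlying i128 representation overflows.
pub fn add_repr_checked(a: i128, b: i128) -> Option<i128> {
    a.checked_add(b)
}

/// Policy B: additionally fail when the result needs more than
/// `precision` decimal digits.
pub fn add_precision_checked(a: i128, b: i128, precision: u32) -> Option<i128> {
    let sum = a.checked_add(b)?;
    // `checked_pow` returns None if 10^precision itself overflows i128.
    let limit = 10_i128.checked_pow(precision)?;
    if sum > -limit && sum < limit {
        Some(sum)
    } else {
        None
    }
}
```

With a precision of 3, `999 + 1` succeeds under policy A and fails under policy B. Policy B needs the extra comparison on every operation whether or not the inputs were validated on ingest, since valid inputs can still produce an out-of-precision result.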
   
   The answers to these will determine whether we can take a relaxed attitude 
to precision, letting users opt into validation if they care and otherwise 
simply ignoring it.
   
   I tried to understand what the C++ implementation is doing, but I honestly 
got lost. It almost looks like it is performing floating-point operations and 
then rounding the results back, which seems surprising...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
