mr-brobot opened a new pull request, #8551:
URL: https://github.com/apache/arrow-rs/pull/8551

   # Which issue does this PR close?
   
   - Closes #5550.
   
   # Rationale for this change
   
   Parquet types are a subset of Arrow types, so the Arrow writer must coerce 
Arrow values to Parquet types. In some cases, this changes the physical 
representation. Therefore, passing Arrow data directly to `Sbbf::check` can 
produce false negatives. Correctness is only guaranteed when checking with the 
coerced Parquet value.
   
   This issue affects some integer and decimal types. It can also affect 
`Date64`.
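
   As a rough illustration of the mismatch, the sketch below compares the bytes 
of a raw Arrow `Int8` value with the bytes of the same value after widening to 
`i32`. It assumes the writer stores `Int8` as Parquet `INT32` and that the 
filter hashes the little-endian bytes of the stored value. The helper names are 
hypothetical and nothing here calls the `parquet` crate.

```rust
/// Bytes of the value as the Arrow writer is assumed to store it:
/// widened to a 32-bit integer, little-endian.
fn coerced_int8_bytes(v: i8) -> [u8; 4] {
    i32::from(v).to_le_bytes()
}

/// Bytes of the raw Arrow Int8 value, with no coercion applied.
fn raw_int8_bytes(v: i8) -> [u8; 1] {
    v.to_le_bytes()
}

fn main() {
    let v: i8 = -3;
    let raw = raw_int8_bytes(v);
    let coerced = coerced_int8_bytes(v);

    // The two byte slices differ in length and content, so a filter that
    // hashes bytes would look up a different key for each of them.
    assert_ne!(raw.as_slice(), coerced.as_slice());
    println!("raw = {:?}, coerced = {:?}", raw, coerced);
}
```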
   
   # What changes are included in this PR?
   
   Introduces `ArrowSbbf` as an Arrow-aware interface to the Parquet `Sbbf`. 
This coerces incoming data if necessary and calls `Sbbf::check`.
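
   The coerce-then-check idea can be sketched with a toy filter. Everything in 
this block is hypothetical: `ToyFilter`, `ArrowValue`, and `ArrowAwareFilter` 
are stand-ins made up for illustration, not the `ArrowSbbf` or `Sbbf` API from 
this PR, and the toy filter stores exact bytes instead of hashing them.

```rust
use std::collections::HashSet;

/// Toy stand-in for a bloom filter: remembers the exact bytes inserted.
/// (Hypothetical; a real filter hashes the bytes into bit blocks.)
struct ToyFilter {
    seen: HashSet<Vec<u8>>,
}

impl ToyFilter {
    fn insert(&mut self, bytes: &[u8]) {
        self.seen.insert(bytes.to_vec());
    }

    fn check(&self, bytes: &[u8]) -> bool {
        self.seen.contains(bytes)
    }
}

/// Hypothetical subset of Arrow values, for illustration only.
enum ArrowValue {
    Int8(i8),
    UInt16(u16),
    Int64(i64),
}

/// Hypothetical Arrow-aware wrapper: coerce first, then delegate.
struct ArrowAwareFilter<'a> {
    inner: &'a ToyFilter,
}

impl ArrowAwareFilter<'_> {
    fn check(&self, value: &ArrowValue) -> bool {
        match value {
            // Assumed to be stored as 32-bit integers, so widen before checking.
            ArrowValue::Int8(v) => self.inner.check(&i32::from(*v).to_le_bytes()),
            ArrowValue::UInt16(v) => self.inner.check(&i32::from(*v).to_le_bytes()),
            // Already 64-bit, so no coercion is needed.
            ArrowValue::Int64(v) => self.inner.check(&v.to_le_bytes()),
        }
    }
}

fn main() {
    let mut filter = ToyFilter { seen: HashSet::new() };
    // The "writer" inserts the coerced 32-bit form of the value 7.
    filter.insert(&7i32.to_le_bytes());

    // Checking the raw one-byte form misses the value.
    assert!(!filter.check(&7i8.to_le_bytes()));

    // Checking through the wrapper finds it.
    let aware = ArrowAwareFilter { inner: &filter };
    assert!(aware.check(&ArrowValue::Int8(7)));
    assert!(aware.check(&ArrowValue::UInt16(7)));
    assert!(!aware.check(&ArrowValue::Int64(7)));
}
```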
   
   Currently, `Date64` types can be written as either `INT32` (days since 
epoch) or `INT64` (milliseconds since epoch), depending on Arrow writer 
properties (`coerce_types`). Instead of requiring additional information to 
handle this special (non-default) case, this implementation requires users to 
coerce `Date64` to `Date32` if the Parquet column type is `INT32`. I'm open to 
feedback on this decision.
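
   For the `INT32` case, the conversion a caller would apply before checking 
might look like the sketch below. It assumes the `Date64` value is a whole 
number of days and uses a hypothetical helper, `date64_to_date32`, rather than 
any function from this PR or from Arrow's cast kernels.

```rust
use std::convert::TryFrom;

const MS_PER_DAY: i64 = 86_400_000;

/// Convert milliseconds since the epoch to days since the epoch.
/// Returns None if the value is not a whole number of days or does not
/// fit in an i32. (Hypothetical helper, for illustration only.)
fn date64_to_date32(ms: i64) -> Option<i32> {
    if ms % MS_PER_DAY != 0 {
        return None;
    }
    i32::try_from(ms / MS_PER_DAY).ok()
}

fn main() {
    let ms: i64 = 19_000 * MS_PER_DAY;
    let days = date64_to_date32(ms).expect("whole number of days");
    assert_eq!(days, 19_000);

    // The two encodings differ in width, so they are different filter keys.
    assert_eq!(days.to_le_bytes().len(), 4);
    assert_eq!(ms.to_le_bytes().len(), 8);
}
```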
   
   # Are these changes tested?
   
   There are tests for integer, float, decimal, and date types. They are not 
exhaustive, but they cover all cases where coercion is necessary.
   
   # Are there any user-facing changes?
   
   Yes, there is a new `ArrowSbbf` struct that most Arrow users should prefer 
over using `Sbbf` directly.

