mr-brobot opened a new pull request, #8551: URL: https://github.com/apache/arrow-rs/pull/8551
# Which issue does this PR close?

- Closes #5550.

# Rationale for this change

Parquet types are a subset of Arrow types, so the Arrow writer must coerce Arrow data to Parquet types. In some cases, this changes the physical representation. Therefore, passing Arrow data directly to `Sbbf::check` will produce false negatives. Correctness is only guaranteed when checking with the coerced Parquet value.

This issue affects some integer and decimal types. It can also affect `Date64`.

# What changes are included in this PR?

Introduces `ArrowSbbf` as an Arrow-aware interface to the Parquet `Sbbf`. It coerces incoming data if necessary and then calls `Sbbf::check`.

Currently, `Date64` values can be written as either `INT32` (days since epoch) or `INT64` (milliseconds since epoch), depending on the Arrow writer properties (`coerce_types`). Instead of requiring additional information to handle this special (non-default) case, this implementation requires users to coerce `Date64` to `Date32` themselves if the Parquet column type is `INT32`. I'm open to feedback on this decision.

# Are these changes tested?

There are tests for integer, float, decimal, and date types. They are not exhaustive, but they cover all cases where coercion is necessary.

# Are there any user-facing changes?

Yes, there is a new `ArrowSbbf` struct that most Arrow users should prefer over using `Sbbf` directly.
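To illustrate why the coercion matters, here is a minimal, standard-library-only sketch. It does not use the `parquet` crate or the `ArrowSbbf` type from this PR; all function names below are hypothetical and exist only for illustration. It assumes that a Bloom filter lookup hashes the bytes of the physical value, so a 1-byte Arrow `Int8` and the 4-byte Parquet `INT32` it was written as are different inputs. The `Date64` to `Date32` helper assumes the value is a whole number of days; how partial days are rounded is not specified in this description.

```rust
/// Milliseconds in one day, used to convert Date64 (ms) to Date32 (days).
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Bytes of an Arrow `Int8` value as it sits in memory (1 byte).
/// Hypothetical helper, not part of arrow-rs.
fn arrow_int8_bytes(v: i8) -> Vec<u8> {
    v.to_le_bytes().to_vec()
}

/// Bytes of the same value after widening to a 32-bit integer,
/// little-endian (4 bytes). Hypothetical helper, not part of arrow-rs.
fn int8_as_int32_bytes(v: i8) -> Vec<u8> {
    i32::from(v).to_le_bytes().to_vec()
}

/// Convert milliseconds since epoch to days since epoch.
/// Assumption: `ms` is a whole-day multiple, so no rounding choice arises.
fn date64_to_date32(ms: i64) -> i32 {
    (ms / MILLIS_PER_DAY) as i32
}
```

With these helpers, `arrow_int8_bytes(-1)` yields a single byte while `int8_as_int32_bytes(-1)` yields four, so a filter populated with the widened form would generally not match a probe made with the narrow one. Likewise, a `Date64` value of `86_400_000` corresponds to a `Date32` value of `1`, which is the value a caller would need to check when the column is stored as `INT32`.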
