dsgibbons opened a new pull request, #6313: URL: https://github.com/apache/arrow-rs/pull/6313
# Which issue does this PR close?

Relates to #1938, which will be closed by future PRs.

# What changes are included in this PR?

This PR introduces the `coerce_types` flag for `WriterProperties`. To start, I've only addressed the `Date64` case. The desired behaviour for `Date64` is captured in #1938. I've added tests to ensure that `Date64` is handled correctly when `coerce_types=false` and `coerce_types=true`. I've also added some tests around `Date32` to ensure I haven't accidentally broken anything there.

I've deliberately avoided the other types mentioned in #1938 because I wanted to make sure we are happy with how `coerce_types` works for `Date64` first. Once this is accepted, I'll raise PRs for the remaining types to finally close out #1938.

One thing missing from this PR is validation of `Date64`. The [C++ implementation](https://github.com/apache/arrow/blob/bda727f9fe56e0abd4fa2770d7175c9074306573/cpp/src/arrow/array/validate.cc#L172-L190) has a `full_validation` option that checks that all `Date64` values lie on a date boundary (i.e., are a multiple of 1000 * 60 * 60 * 24 milliseconds). I've started work on adding this in arrow-rs and intend to raise a separate PR for enabling `Date64` validation in the `arrow-array` crate.

# Are there any user-facing changes?

Breaking changes:
- `arrow_to_parquet_schema(schema: &Schema, coerce_types: bool)` - I wasn't sure how else we could propagate the `coerce_types` option down to the parquet schema.
- `arrow_to_parquet_schema_with_root(schema: &Schema, root: &str, coerce_types: bool)` - same as above.
- `coerce_types` flag in `WriterProperties` (not a major issue, as it is typically created via `WriterPropertiesBuilder`).

User-facing changes:
- The ability to read/write parquet files with the native `Date64` logical type.
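For reference, the date-boundary check described above can be sketched in plain Rust. This is only an illustration of the rule (a `Date64` value must be a multiple of 1000 * 60 * 60 * 24 milliseconds), not the code from this PR or from the planned `arrow-array` change. The names `MILLIS_PER_DAY`, `is_date_boundary` and `first_invalid_date64` are hypothetical, and the values are assumed to be raw `i64` milliseconds since the epoch.

```rust
/// Milliseconds in one day: 1000 * 60 * 60 * 24.
/// Hypothetical constant, named here for illustration only.
const MILLIS_PER_DAY: i64 = 1000 * 60 * 60 * 24;

/// Hypothetical helper: returns true when a `Date64` value
/// (milliseconds since the epoch) lies exactly on a date boundary.
/// Rust's `%` yields 0 for exact multiples of either sign, so
/// dates before 1970 are handled as well.
fn is_date_boundary(millis: i64) -> bool {
    millis % MILLIS_PER_DAY == 0
}

/// Hypothetical helper: returns the index of the first value that
/// is not on a date boundary, or `None` if every value is valid.
fn first_invalid_date64(values: &[i64]) -> Option<usize> {
    values.iter().position(|&v| !is_date_boundary(v))
}
```

A caller could run `first_invalid_date64` over the raw values of an array and report the offending index in an error message. How null slots are skipped and how the error is surfaced are left open here.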
