dsgibbons opened a new pull request, #6313:
URL: https://github.com/apache/arrow-rs/pull/6313

   # Which issue does this PR close?
   
   Relates to #1938, which will be closed by future PRs.
   
   # What changes are included in this PR?
   
   This PR introduces the `coerce_types` flag for `WriterProperties`. To start, 
I've only addressed the `Date64` case; the desired behaviour for `Date64` is 
captured in #1938. I've added tests to ensure that `Date64` is handled 
correctly both when `coerce_types=false` and when `coerce_types=true`. I've also 
added tests for `Date32` to ensure I haven't accidentally broken anything 
there.
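   A minimal, std-only sketch of what the flag could change for `Date64` values on the write path. It assumes that coercion turns `Date64` milliseconds into a `Date32`-style day count, matching Parquet's DATE logical type (days since the Unix epoch stored as INT32), and that without coercion the raw milliseconds are kept as INT64. #1938 remains the authoritative description. `date64_to_date32`, `write_date64` and `StoredColumn` are hypothetical names, not arrow-rs APIs.

```rust
use std::convert::TryFrom;

/// Milliseconds in one day: 1000 * 60 * 60 * 24.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical: converts a Date64 value (milliseconds since the Unix epoch)
/// into a Date32-style day count (days since the Unix epoch).
/// `div_euclid` rounds toward negative infinity, so pre-1970 values map to
/// the day they fall on; any sub-day remainder is discarded.
/// Returns `None` if the day count does not fit in an `i32`.
fn date64_to_date32(ms: i64) -> Option<i32> {
    i32::try_from(ms.div_euclid(MS_PER_DAY)).ok()
}

/// Hypothetical stand-in for what ends up in the Parquet column.
#[derive(Debug, PartialEq)]
enum StoredColumn {
    /// Day counts, as Parquet's DATE logical type stores them (INT32).
    Days(Vec<i32>),
    /// Raw milliseconds kept as INT64 (assumed uncoerced behaviour).
    Millis(Vec<i64>),
}

/// Hypothetical write path for a Date64 column, driven by `coerce_types`.
fn write_date64(values: &[i64], coerce_types: bool) -> Option<StoredColumn> {
    if coerce_types {
        // Collecting Option<i32> into Option<Vec<i32>> yields None if any
        // single value overflows the i32 day range.
        let days: Option<Vec<i32>> = values.iter().map(|&ms| date64_to_date32(ms)).collect();
        days.map(StoredColumn::Days)
    } else {
        Some(StoredColumn::Millis(values.to_vec()))
    }
}
```

   Under these assumptions coercion is lossy for values that are not on a day boundary: 86,400,001 ms and 86,400,000 ms both become day 1, whereas the uncoerced path keeps both values distinct.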
   
   I've deliberately avoided the other types mentioned in #1938 because I 
wanted to make sure we are happy with how `coerce_types` works for `Date64` 
first. Once accepted, I'll raise PRs for the remaining types to finally close 
out #1938.
   
   One thing missing from this PR is validation of `Date64` values. The [C++ 
implementation](https://github.com/apache/arrow/blob/bda727f9fe56e0abd4fa2770d7175c9074306573/cpp/src/arrow/array/validate.cc#L172-L190)
has a `full_validation` option that checks that every `Date64` value lies on a day 
boundary (i.e., is a multiple of 1000 * 60 * 60 * 24 = 86,400,000 milliseconds). I've started work on adding 
this to arrow-rs and intend to raise a separate PR enabling `Date64` 
validation in the `arrow-array` crate.
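   The day-boundary check itself is simple. A minimal, std-only sketch over a plain slice of milliseconds follows; `validate_date64` is a hypothetical name, and the error format is illustrative rather than what arrow-rs or the C++ implementation actually reports.

```rust
/// Milliseconds in one day: 1000 * 60 * 60 * 24.
const MS_PER_DAY: i64 = 1000 * 60 * 60 * 24;

/// Hypothetical full validation for Date64: every value must be an exact
/// multiple of one day. Returns an error naming the first offending value.
/// Nulls are not modelled here; a real implementation would presumably
/// skip null slots.
fn validate_date64(values: &[i64]) -> Result<(), String> {
    // `%` keeps the sign of the dividend, but a multiple of MS_PER_DAY
    // always has remainder 0, so negative (pre-1970) days pass too.
    match values.iter().position(|&ms| ms % MS_PER_DAY != 0) {
        Some(i) => Err(format!(
            "Date64 value {} at index {} is not a multiple of {}",
            values[i], i, MS_PER_DAY
        )),
        None => Ok(()),
    }
}
```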
   
   # Are there any user-facing changes?
   
   Breaking changes:
   - `arrow_to_parquet_schema(schema: &Schema, coerce_types: bool)` - I wasn't 
sure how else we could propagate the `coerce_types` option down to the parquet 
schema.
   - `arrow_to_parquet_schema_with_root(schema: &Schema, root: &str, 
coerce_types: bool)` - same as above.
   - `coerce_types` flag in `WriterProperties` (not a major issue, as it is typically 
created via `WriterPropertiesBuilder`).
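   To illustrate why the schema conversion functions need the flag as an extra argument, here is a simplified, std-only sketch of the type mapping. The enums are hypothetical stand-ins for the Arrow and Parquet type systems, and the `Date64` mapping is an assumption consistent with #1938 (coerced: INT32 with the DATE logical type; uncoerced: INT64 without a Parquet logical type), not a copy of the arrow-rs code.

```rust
/// Hypothetical, simplified stand-ins for the Arrow and Parquet type systems.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowType { Date32, Date64 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType { Int32, Int64 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType { Date }

/// Maps an Arrow date type to a Parquet (physical, logical) pair.
/// `coerce_types` is threaded through explicitly, the same way the PR adds
/// it to `arrow_to_parquet_schema`.
fn map_date_type(t: ArrowType, coerce_types: bool) -> (PhysicalType, Option<LogicalType>) {
    match (t, coerce_types) {
        // Date32 already matches Parquet's DATE (INT32 days) either way.
        (ArrowType::Date32, _) => (PhysicalType::Int32, Some(LogicalType::Date)),
        // Assumed: coerced Date64 becomes a DATE column of day counts.
        (ArrowType::Date64, true) => (PhysicalType::Int32, Some(LogicalType::Date)),
        // Assumed: uncoerced Date64 keeps raw milliseconds in INT64 with no
        // Parquet logical type, relying on the embedded Arrow schema on read.
        (ArrowType::Date64, false) => (PhysicalType::Int64, None),
    }
}

/// Hypothetical schema-level wrapper: every field sees the same flag.
fn map_schema(fields: &[ArrowType], coerce_types: bool) -> Vec<(PhysicalType, Option<LogicalType>)> {
    fields.iter().map(|&t| map_date_type(t, coerce_types)).collect()
}
```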
   
   User-facing changes:
   - The ability to read/write parquet files with the native `Date64` logical 
type.
   

