etseidl commented on PR #6313:
URL: https://github.com/apache/arrow-rs/pull/6313#issuecomment-2359994984

   Thanks for taking this on @dsgibbons. I may be confused, but it appears that 
the approach you use for the non-coerced case is to write `INT64` with a `DATE` 
annotation to the Parquet file. The problem is that the Parquet spec does not 
allow this: `DATE` may only annotate `INT32` 
([ref](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date)).
I believe the approach called for in #1938 is to write unannotated `INT64` 
and rely on the encoded Arrow schema to know how to interpret the column. For 
instance, in `arrow/schema/mod.rs`, perhaps the logic should be more like
   ```rust
           DataType::Date64 => {
               if coerce_types {
                   Type::primitive_type_builder(name, PhysicalType::INT32)
                       .with_logical_type(Some(LogicalType::Date))
                       .with_repetition(repetition)
                       .with_id(id)
                       .build()
               } else {
                   Type::primitive_type_builder(name, PhysicalType::INT64)
                       .with_repetition(repetition)
                       .with_id(id)
                       .build()
               }
           },
   ```
   I don't think the write side is too far off.
   
   On the read side, I think you'll still have to account for the 
`INT32(DATE)->Date64` conversion for the case of coerced data. It seems to me 
like you've removed the code to handle this case, but again I may be confused.
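
To make the `INT32(DATE)->Date64` step concrete, here is a minimal, standalone sketch of the value conversion involved. It is not the arrow-rs implementation: the function names are hypothetical, and flooring toward negative infinity on the narrowing path is an assumption of this sketch rather than something the PR or the spec discussion above settles.

```rust
/// Milliseconds in one day. Date64 counts milliseconds since the Unix epoch,
/// while Parquet's DATE annotation on INT32 counts days since the Unix epoch.
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Read side (hypothetical helper): widen an INT32(DATE) value in days to a
/// Date64 value in milliseconds. This cannot overflow, because
/// i32::MAX * 86_400_000 is well within the range of i64.
fn date32_days_to_date64_millis(days: i32) -> i64 {
    i64::from(days) * MILLIS_PER_DAY
}

/// Write side with coercion (hypothetical helper): narrow a Date64 value in
/// milliseconds to whole days. Any time-of-day component is discarded.
/// Assumption: round toward negative infinity, so pre-epoch instants map to
/// the day that contains them. Returns None if the day count exceeds i32.
fn date64_millis_to_date32_days(millis: i64) -> Option<i32> {
    i32::try_from(millis.div_euclid(MILLIS_PER_DAY)).ok()
}
```

Note that the round trip is lossy whenever the original `Date64` value is not a whole number of days, which is one reason the un-coerced `INT64` path is worth keeping.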

