joellubi opened a new issue, #39456:
URL: https://github.com/apache/arrow/issues/39456

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   The Parquet 
[DATE](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date)
 logical type must annotate an `int32` representing days since the UNIX epoch 
per the spec. The Arrow DATE64 (ms since UNIX epoch) type does not have a 
direct analog in Parquet, so it must be coerced into a compatible 
representation when writing Arrow data to Parquet.
   
   The prevailing convention is to coerce DATE64 to `int32` days since the 
UNIX epoch (Parquet DATE logical type) [e.g. 
[C++](https://github.com/apache/arrow/blob/ccc674c56f3473c9556a5af96dff9d156f559663/cpp/src/parquet/arrow/schema.cc#L372-L375),
 
[Rust](https://github.com/apache/arrow-rs/blob/2f383e764aa2b79e52d562e24eb0d1dce41f5ce7/parquet/src/arrow/schema/mod.rs#L425-L429)].
 The behavior for handling an `int64` value not on a date boundary (i.e. not 
divisible by 86400000) is not defined. Some implementations 
[validate](https://github.com/apache/arrow/blob/bda727f9fe56e0abd4fa2770d7175c9074306573/cpp/src/arrow/array/validate.cc#L172-L190)
 this condition while others truncate to the date the physical value falls 
within.
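   
   To make the two behaviors concrete, here is a minimal sketch in plain Go, without the Arrow or Parquet libraries. The function names (`date64ToDate32Strict`, `date64ToDate32Truncate`, `floorDays`, `checkedInt32`) are hypothetical and not taken from any implementation. Rounding pre-epoch values toward negative infinity is an assumption of this sketch; an implementation that truncates may round differently.

```go
package main

import (
	"fmt"
	"math"
)

// Milliseconds in one day; a DATE64 value on a date boundary is a multiple of this.
const msPerDay int64 = 86_400_000

// floorDays converts milliseconds since the UNIX epoch to whole days,
// rounding toward negative infinity so that pre-1970 values map to the
// date they fall within (assumption of this sketch).
func floorDays(ms int64) int64 {
	d := ms / msPerDay
	if ms%msPerDay < 0 {
		d--
	}
	return d
}

// checkedInt32 narrows a day count to int32, reporting overflow.
func checkedInt32(days int64) (int32, error) {
	if days < math.MinInt32 || days > math.MaxInt32 {
		return 0, fmt.Errorf("day count %d overflows int32", days)
	}
	return int32(days), nil
}

// date64ToDate32Strict mirrors the "validate" behavior: values that are
// not on a date boundary are rejected.
func date64ToDate32Strict(ms int64) (int32, error) {
	if ms%msPerDay != 0 {
		return 0, fmt.Errorf("date64 value %d is not a multiple of %d ms", ms, msPerDay)
	}
	return checkedInt32(ms / msPerDay)
}

// date64ToDate32Truncate mirrors the "truncate" behavior: the sub-day
// remainder is discarded.
func date64ToDate32Truncate(ms int64) (int32, error) {
	return checkedInt32(floorDays(ms))
}

func main() {
	for _, ms := range []int64{0, 86_400_000, 86_400_001, -1} {
		strict, err := date64ToDate32Strict(ms)
		trunc, _ := date64ToDate32Truncate(ms)
		fmt.Println(ms, strict, err, trunc)
	}
}
```

   With this sketch, a value such as `86400001` is rejected by the strict variant but maps to day `1` under truncation, which is exactly the kind of divergence between implementations described above.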
   
   The current Go implementation diverges from the approach followed by these 
languages, coercing instead to a [UTC-normalized 
TIMESTAMP[ms]](https://github.com/apache/arrow/blob/ccc674c56f3473c9556a5af96dff9d156f559663/go/parquet/pqarrow/schema.go#L328-L330).
 This may lead to surprising behavior in cross-language use-cases and alters 
the original semantics of the type (at least for non-Arrow consumers that don't 
handle 
[store_schema](https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/parquet/pqarrow#WithStoreSchema)).
 It seems that it would increase overall compatibility in the ecosystem to 
align Go to the convention currently followed in the other implementations.
   
   See also: 
[https://lists.apache.org/thread/q036r1q3cw5ysn3zkpvljx3s9ho18419](https://lists.apache.org/thread/q036r1q3cw5ysn3zkpvljx3s9ho18419)
   
   ### Component(s)
   
   Go, Parquet

