Thank you everyone for contributing to this discussion. I'd like to
summarize where I think we've landed at this point:

- After considering pros/cons of first-class vs canonical extension type
  and historical precedent, adopting Bool8 as a canonical extension type
  seems reasonable for this proposal.
- There was some discussion about "true == 1" vs "true != 0" semantics.
  The conclusion is that all systems must interpret any nonzero value as
  true for interoperability, but 1 is preferred when producing/casting
  Bool8 if implementations are deciding on a canonical value.
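
As a rough illustration of the second point, here is a minimal sketch in plain Python. It is not taken from either PR: the function names `bool8_to_bool` and `bool_to_bool8` are hypothetical, and ordinary lists of integers stand in for the int8 storage buffer (null handling is ignored).

```python
def bool8_to_bool(storage):
    """Read side: any nonzero storage value is interpreted as true."""
    return [value != 0 for value in storage]


def bool_to_bool8(flags):
    """Write side: emit 1 as the preferred canonical value for true."""
    return [1 if flag else 0 for flag in flags]


# Storage values as another system might have produced them (assumed data).
storage = [0, 1, -1, 2, 127]

flags = bool8_to_bool(storage)      # every nonzero entry reads as True
canonical = bool_to_bool8(flags)    # re-encoding normalizes true to 1
```

Under these assumptions, a consumer accepts -1, 2, or 127 as true, while a round trip through a producer normalizes them to 1.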

Additionally, the format change [1] and Go implementation [2] have been
split into separate PRs as requested by several reviewers.

Please share any additional comments or anything I may have missed. If
this all seems reasonable, I will move forward with an additional
implementation in C++ and open this to a formal vote.

Thanks,
Joel

[1]: https://github.com/apache/arrow/pull/43234
[2]: https://github.com/apache/arrow/pull/43323

On Mon, Jul 22, 2024 at 5:59 PM Wes McKinney <wesmck...@gmail.com> wrote:

> From a historical perspective, if we had had extension types / canonical
> extension types, it would have made more sense to have the millisecond
> dates as an extension type.
>
> The goal of having the extra type was to avoid an unnecessary serialization
> in systems where there is a benefit to moving data efficiently over the
> wire, and here it is the same — to be able to move 8-bit boolean data
> without serialization from process to process in a reasonably standardized
> way.
>
> Because boolean data is used much more than date data (in general), it
> seems like it would be more burdensome for implementations if an 8-bit
> boolean type were promoted to equal status with the 1-bit type.
>
> On Mon, Jul 22, 2024 at 2:33 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> > On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
> > >
> > > If Canonical Extensions had existed at the time, I think there's a
> > > chance we may have ended up with int32 Date as a first class type and
> > > int64 MillisecondDate as a Canonical Extension type.
> >
> > Agreed.
> >
> > > Are there any lessons we've
> > > learned from implementing both as first-class types as opposed to this
> > > hypothetical first-class / extension split?
> >
> > In Arrow C++, not many lessons I'd say, because those date types don't
> > support many operations.
> >
> > Regards
> >
> > Antoine.