Thank you everyone for contributing to this discussion.

I'd like to summarize where I think we've landed at this point:
- After considering pros/cons of first-class vs canonical extension type
and historical precedent, adopting Bool8 as a canonical extension type
seems reasonable for this proposal.
- There was some discussion about "true == 1" vs "true != 0" semantics. The
conclusion is that all systems must interpret any nonzero value as true for
interoperability, but 1 is preferred when producing/casting Bool8 if
implementations are deciding on a canonical value.
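
A minimal sketch of the semantics in the second point, in plain Python and
without any Arrow library. The helper names bool8_to_bool and bool_to_bool8
are hypothetical and chosen only for illustration; they are not part of any
Arrow API. The sketch assumes the Bool8 storage values are signed 8-bit
integers.

```python
# Hypothetical helpers illustrating the Bool8 semantics summarized above.
# These names are illustrative only and do not exist in any Arrow library.

def bool8_to_bool(value: int) -> bool:
    """Consumer side: interpret any nonzero storage value as true."""
    return value != 0


def bool_to_bool8(flag: bool) -> int:
    """Producer side: emit 1 as the preferred value for true, 0 for false."""
    return 1 if flag else 0


# Raw int8 storage values as a consumer might receive them.
storage = [0, 1, 2, -1, 127]

# Every nonzero value decodes to True.
decoded = [bool8_to_bool(v) for v in storage]

# Re-encoding normalizes all true values to the preferred value 1.
reencoded = [bool_to_bool8(b) for b in decoded]
```

Under these assumptions, a consumer never rejects a value such as 2 or -1,
while a producer that round-trips the data normalizes true values to 1.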

Additionally, the format change [1] and Go implementation [2] have been
split into separate PRs as requested by several reviewers.

Please share any additional comments or anything I may have missed. If this
all seems reasonable, I will move forward with an additional implementation
in C++ and open this to a formal vote.

Thanks,
Joel


[1]: https://github.com/apache/arrow/pull/43234
[2]: https://github.com/apache/arrow/pull/43323

On Mon, Jul 22, 2024 at 5:59 PM Wes McKinney <wesmck...@gmail.com> wrote:

> From a historical perspective, if we had had extension types / canonical
> extension types, it would have made more sense to have the millisecond
> dates as an extension type.
>
> The goal of having the extra type was to avoid an unnecessary serialization
> in systems where there is a benefit to moving data efficiently over the
> wire, and here it is the same — to be able to move 8-bit boolean data
> without serialization from process to process in a reasonably standardized
> way.
>
> Because boolean data is used much more than date data (in general), it
> seems like it would be more burdensome for implementations if an 8-bit
> boolean type were promoted to equal status with the 1-bit type.
>
> On Mon, Jul 22, 2024 at 2:33 PM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
> > >
> > > If Canonical Extensions had existed at the time, I think there's a
> > > chance we may have ended up with int32 Date as a first class type and
> > > int64 MillisecondDate as a Canonical Extension type.
> >
> > Agreed.
> >
> > > Are there any lessons we've
> > > learned from implementing both as first-class types as opposed to this
> > > hypothetical first-class / extension split?
> >
> > In Arrow C++, not many lessons I'd say, because those date types don't
> > support many operations.
> >
> > Regards
> >
> > Antoine.
> >
>
