+1 (binding)

On Mon, Jun 29, 2020 at 11:11 PM Ben Kietzman <ben.kietz...@rstudio.com> wrote:
>
> +1 (non binding)
>
> On Mon, Jun 29, 2020, 18:00 Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Hi,
> >
> > As discussed on the mailing list [1], it has been proposed to allow
> > the use of unsigned dictionary indices (which is already technically
> > possible in our metadata serialization, but not allowed according to
> > the language of the columnar specification), with the following
> > caveats:
> >
> > * Unless part of an application's requirements (e.g. if it is
> > necessary to store dictionaries with size 128 to 255 more compactly),
> > implementations are recommended to prefer signed over unsigned
> > integers, with int32 continuing to be the "default" when the indexType
> > field of DictionaryEncoding is null
> > * uint64 dictionary indices, while permitted, are strongly
> > discouraged unless required by an application, as they are more
> > difficult to work with in some programming languages (e.g. Java) and
> > they do not offer the storage size benefits that uint8 and uint16 do.
> >
> > This change is backwards compatible, but not forward compatible for
> > all implementations (for example, C++ will reject unsigned integers).
> > Assuming that the V5 MetadataVersion change is accepted, to protect
> > against forward compatibility issues such implementations would be
> > recommended to not allow unsigned dictionary indices to be serialized
> > using V4 MetadataVersion.
> >
> > A PR with the changes to the columnar specification (possibly subject
> > to some clarifying language) is at [2].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Accept changes to allow unsigned integer dictionary indices
> > [ ] +0
> > [ ] -1 Do not accept because...
> >
> > [1]: https://lists.apache.org/thread.html/r746e0a76c4737a2cf48dec656103677169bebb303240e62ae1c66d35%40%3Cdev.arrow.apache.org%3E
> > [2]: https://github.com/apache/arrow/pull/7567
> >
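
For readers following the proposal, here is a minimal sketch of how an
implementation might choose an index type under the caveats quoted above:
prefer signed integers, use an unsigned type only when the application opts
in and it saves space, and never pick uint64. The helper name
pick_index_type and its allow_unsigned flag are hypothetical and not part of
any Arrow library; the sketch assumes indices run from 0 to dict_size - 1.

```python
# Hypothetical helper for illustration only; not an Arrow API.
# Signed index types in increasing width (name, width in bytes).
SIGNED_TYPES = [("int8", 1), ("int16", 2), ("int32", 4), ("int64", 8)]


def pick_index_type(dict_size, allow_unsigned=False):
    """Return the narrowest index type name that can address dict_size entries.

    Signed types are preferred. An unsigned type is returned only when
    allow_unsigned is True and it is narrower than the smallest signed type
    that fits. uint64 is never returned, since it offers no storage benefit
    over int64 for sizes that int64 can already address.
    """
    if dict_size < 0:
        raise ValueError("dict_size must be non-negative")
    # Assumption: indices are 0-based, so the largest index is dict_size - 1.
    max_index = max(dict_size - 1, 0)
    for name, width in SIGNED_TYPES:
        bits = 8 * width
        if max_index <= 2 ** (bits - 1) - 1:
            return name
        # Same width, unsigned: doubles the addressable range.
        if allow_unsigned and width < 8 and max_index <= 2 ** bits - 1:
            return "u" + name
    raise ValueError("dictionary too large for a signed 64-bit index")


if __name__ == "__main__":
    for size in (100, 200, 40000):
        print(size, pick_index_type(size), pick_index_type(size, allow_unsigned=True))
```

Under these assumptions, a dictionary with 200 entries needs int16 when only
signed indices are allowed, but fits in uint8 when the application opts in to
unsigned indices.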
