+1 (non-binding)

On Mon, Jun 29, 2020, 18:00 Wes McKinney <wesmck...@gmail.com> wrote:

> Hi,
>
> As discussed on the mailing list [1], it has been proposed to allow
> the use of unsigned dictionary indices (which is already technically
> possible in our metadata serialization, but not allowed according to
> the language of the columnar specification), with the following
> caveats:
>
> * Unless part of an application's requirements (e.g. if it is
> necessary to store dictionaries with size 128 to 255 more compactly),
> implementations are recommended to prefer signed over unsigned
> integers, with int32 continuing to be the "default" when the indexType
> field of DictionaryEncoding is null
> * uint64 dictionary indices, while permitted, are strongly discouraged
> unless required by an application, as they are more difficult to work
> with in some programming languages (e.g. Java) and do not offer the
> storage size benefits that uint8 and uint16 do.
>
> This change is backwards compatible, but not forward compatible for
> all implementations (for example, C++ will reject unsigned integers).
> Assuming that the V5 MetadataVersion change is accepted, to protect
> against forward compatibility issues such implementations would be
> recommended to not allow unsigned dictionary indices to be serialized
> using V4 MetadataVersion.
>
> A PR with the changes to the columnar specification (possibly subject
> to some clarifying language) is at [2].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept changes to allow unsigned integer dictionary indices
> [ ] +0
> [ ] -1 Do not accept because...
>
> [1]:
> https://lists.apache.org/thread.html/r746e0a76c4737a2cf48dec656103677169bebb303240e62ae1c66d35%40%3Cdev.arrow.apache.org%3E
> [2]: https://github.com/apache/arrow/pull/7567
>
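
For illustration only, here is a minimal sketch of the storage argument behind the first caveat: an unsigned index type can cover a dictionary that would otherwise need the next wider signed type. The helper name smallest_index_type and its allow_unsigned flag are hypothetical, not part of any Arrow implementation, and the tie-break (prefer signed at equal width) is my reading of the recommendation above rather than specified behavior.

```python
# Hypothetical helper, not an Arrow API: pick the narrowest integer type
# whose range covers every index of a dictionary with the given size.

SIGNED = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]
UNSIGNED = [("uint8", 8), ("uint16", 16), ("uint32", 32), ("uint64", 64)]


def max_index(bits, signed):
    """Largest non-negative value representable in the given integer width."""
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1


def smallest_index_type(dictionary_size, allow_unsigned=False):
    """Return the name of the narrowest type that can hold indices
    0 .. dictionary_size - 1."""
    largest = max(dictionary_size - 1, 0)
    candidates = [(name, bits, True) for name, bits in SIGNED]
    if allow_unsigned:
        candidates += [(name, bits, False) for name, bits in UNSIGNED]
    # Narrower types first; at equal width, signed before unsigned
    # (assumption: this mirrors the "prefer signed" recommendation).
    candidates.sort(key=lambda c: (c[1], not c[2]))
    for name, bits, signed in candidates:
        if largest <= max_index(bits, signed):
            return name
    raise ValueError("dictionary too large for a 64-bit index")


# A 200-entry dictionary has indices 0..199, which exceed the int8 maximum
# of 127 but fit within the uint8 maximum of 255.
signed_only = smallest_index_type(200)
with_unsigned = smallest_index_type(200, allow_unsigned=True)
```

With signed types only, the 200-entry case falls through to int16 (2 bytes per index); allowing unsigned types lets it stay at uint8 (1 byte per index). Small dictionaries still resolve to int8 either way.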
