Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

Micah Kornfield Sun, 13 Oct 2019 11:07:00 -0700

The only point raised on the PR that I think is worth discussing is the
set of assumptions about dictionaries at the beginning of streams.

There are two options:
1.  Keep the current wording, under which a dictionary does not need to be
at the beginning of the stream if the first record batch makes no use of it
(e.g. a dictionary-encoded column is all null in the first record batch).
2.  We require a dictionary batch for each dictionary at the beginning of
the stream (and require implementations to send an empty batch if they
don't have the dictionary available).

The current proposal in the PR is option #1.
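
To make option #1 concrete, here is a rough Python sketch of how a stream
reader might behave under it. None of these names come from the Arrow
codebase: StreamReader, DictionaryBatch, RecordBatch and their fields are
hypothetical, and treating a delta for an unseen dictionary as a delta
against an empty one is my assumption for the sketch, not something the PR
states. The reader only complains when a record batch actually needs a
dictionary that has not arrived yet, and it applies the replacement/delta
semantics proposed for isDelta.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for a dictionary batch message."""
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class RecordBatch:
    """Hypothetical stand-in for a record batch.

    Maps dictionary ID -> indices of one dictionary-encoded column;
    None stands for a null slot.
    """
    columns: Dict[int, List[Optional[int]]]


@dataclass
class StreamReader:
    """Sketch of a reader that follows option #1."""
    dictionaries: Dict[int, List[str]] = field(default_factory=dict)

    def on_dictionary(self, batch: DictionaryBatch) -> None:
        if batch.is_delta:
            # Delta: append to the existing dictionary. Starting from an
            # empty dictionary when none exists is an assumption here.
            self.dictionaries.setdefault(batch.dict_id, []).extend(batch.values)
        else:
            # isDelta=False: replace the dictionary wholesale.
            self.dictionaries[batch.dict_id] = list(batch.values)

    def on_record_batch(self, batch: RecordBatch) -> Dict[int, List[Optional[str]]]:
        decoded: Dict[int, List[Optional[str]]] = {}
        for dict_id, indices in batch.columns.items():
            if all(i is None for i in indices):
                # All-null column: the dictionary is not needed yet.
                decoded[dict_id] = [None] * len(indices)
                continue
            if dict_id not in self.dictionaries:
                raise ValueError(f"dictionary {dict_id} used before it was sent")
            values = self.dictionaries[dict_id]
            decoded[dict_id] = [None if i is None else values[i] for i in indices]
        return decoded
```

Under option #2 the all-null special case would go away: the reader could
instead reject any stream whose first record batch arrives before every
dictionary ID in the schema has received a (possibly empty) dictionary
batch.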

Thanks,
Micah

On Sat, Oct 5, 2019 at 4:01 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I've opened a pull request [1] to clarify some recent conversations about
> semantics/edge cases for dictionary encoding [2][3] around interleaved
> batches and when isDelta=False.
>
> Specifically, it proposes that isDelta=False indicates dictionary
> replacement. For the file format, only one isDelta=False batch is allowed
> per file, and isDelta=True batches are applied in the order supplied in
> the file footer.
>
> In addition, I've added a new enum to DictionaryEncoding to preserve
> future compatibility in case we want to expand dictionary encoding to be an
> explicit mapping from "ID" to "VALUE" as discussed in [4].
>
> Once people have had a chance to review and come to a consensus, I will
> call a formal vote to approve and commit the change.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/pull/5585
> [2]
> https://lists.apache.org/thread.html/9734b71bc12aca16eb997388e95105bff412fdaefa4e19422f477389@%3Cdev.arrow.apache.org%3E
> [3]
> https://lists.apache.org/thread.html/5c3c9346101df8d758e24664638e8ada0211d310ab756a89cde3786a@%3Cdev.arrow.apache.org%3E
> [4]
> https://lists.apache.org/thread.html/15a4810589b2eb772bce5b2372970d9d93badbd28999a1bbe2af418a@%3Cdev.arrow.apache.org%3E
>
>
