To clarify, we have already implemented option #1 ("It is not required that all
dictionary batches occur at the beginning") in the previous PR[1].
I hope this proposal will be accepted, and I would like to take on the follow-up
work on the Java side if possible.
Thanks,
Ji Liu
[1] https://github.com/apache/arrow/pull/4960
------------------------------------------------------------------
From:Ji Liu <[email protected]>
Send Time: Tuesday, November 26, 2019, 14:04
To:dev <[email protected]>; Micah Kornfield <[email protected]>
Cc:Wes McKinney <[email protected]>
Subject:Re: [VOTE] Clarifications and forward compatibility changes for
Dictionary Encoding (second iteration)
+1 (non-binding)
Thanks
Ji Liu
------------------------------------------------------------------
From:Fan Liya <[email protected]>
Send Time: Tuesday, November 26, 2019, 14:01
To:dev <[email protected]>; Micah Kornfield <[email protected]>
Cc:Wes McKinney <[email protected]>
Subject:Re: [VOTE] Clarifications and forward compatibility changes for
Dictionary Encoding (second iteration)
I am sorry I did not follow the thread closely (will follow up later).
However, the proposal above looks good to me.
So I am +0.5 for this.
Best,
Liya Fan
On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield <[email protected]>
wrote:
> Could other members of the community chime in on this? In particular
> getting views from other language maintainers would be good.
>
> Thanks,
> Micah
>
> On Thu, Nov 21, 2019 at 12:23 PM Micah Kornfield <[email protected]>
> wrote:
>
> > Forgot to say, My vote is +1 (binding).
> >
> > On Thu, Nov 21, 2019 at 12:09 PM Wes McKinney <[email protected]>
> > wrote:
> >
> >> +1 (binding). Thanks Micah
> >>
> >> On Wed, Nov 20, 2019 at 10:42 PM Micah Kornfield <[email protected]>
> >> wrote:
> >> >
> >> > Hello,
> >> > As discussed in [1], I've proposed changes in a PR [2] that make the
> >> > following clarifications:
> >> >
> >> > 1. It is not required that all dictionary batches occur at the beginning
> >> > of the IPC stream format (if the first record batch has an all-null
> >> > dictionary-encoded column, the null column's dictionary might not be sent
> >> > until later in the stream).
> >> >
> >> > 2. A second dictionary batch for the same ID that is not a "delta batch"
> >> > in an IPC stream indicates the dictionary should be replaced.
> >> >
> >> > 3. Clarifies that the file format can only contain 1 "NON-delta"
> >> > dictionary batch and multiple "delta" dictionary batches. Dictionary
> >> > replacement is not supported in the file format.
> >> >
> >> > 4. Add an enum to dictionary metadata for possible future changes in the
> >> > format in which dictionary batches can be sent (the most likely would be
> >> > an array Map<Int, Value>). An enum is needed as a placeholder to allow
> >> > for forward compatibility past release 1.0.0.
> >> >
> >> > If accepted, there will be work in all implementations to make sure that
> >> > they cover the edge cases highlighted, and additional integration testing
> >> > will be needed.
> >> >
> >> > Please vote whether to accept these additions. The vote will be open for
> >> > at least 72 hours.
> >> >
> >> > [ ] +1 Accept these changes to the specification
> >> > [ ] +0
> >> > [ ] -1 Do not accept the changes because...
> >> >
> >> > Thanks,
> >> > Micah
> >> >
> >> >
> >> > [1]
> >> > https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
> >> > [2] https://github.com/apache/arrow/pull/5585
> >>
> >
>
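
As an illustration of points 1 and 2 of the proposal quoted above, here is a
minimal sketch of how a stream reader might track dictionaries. It is a toy
model, not the Arrow API: the class DictionaryTracker and its methods are
hypothetical names, dictionary values are plain strings, and raising an error
for a delta batch that arrives before any dictionary with that ID is an
assumption of this sketch rather than something stated in the thread.

```python
from typing import Dict, List, Optional


class DictionaryTracker:
    """Toy model of per-ID dictionary state while reading an IPC stream.

    Hypothetical illustration only; this is not part of any Arrow library.
    """

    def __init__(self) -> None:
        self.dictionaries: Dict[int, List[str]] = {}

    def on_dictionary_batch(
        self, dict_id: int, values: List[str], is_delta: bool
    ) -> None:
        if is_delta:
            # Delta batch: append the new values to the existing dictionary.
            # Assumption of this sketch: a delta with no prior dictionary
            # for the same ID is treated as an error.
            if dict_id not in self.dictionaries:
                raise ValueError(f"delta for unknown dictionary id {dict_id}")
            self.dictionaries[dict_id].extend(values)
        else:
            # Point 2: a non-delta batch for an existing ID replaces it.
            self.dictionaries[dict_id] = list(values)

    def decode(
        self, dict_id: int, indices: List[Optional[int]]
    ) -> List[Optional[str]]:
        # Point 1: the dictionary may not have arrived yet. That is fine
        # as long as every index in this record batch is null.
        dictionary = self.dictionaries.get(dict_id)
        decoded: List[Optional[str]] = []
        for index in indices:
            if index is None:
                decoded.append(None)
            elif dictionary is None:
                raise ValueError(f"dictionary id {dict_id} not yet received")
            else:
                decoded.append(dictionary[index])
        return decoded


tracker = DictionaryTracker()
# An all-null record batch arrives before any dictionary batch.
first = tracker.decode(0, [None, None])
tracker.on_dictionary_batch(0, ["a", "b"], is_delta=False)
second = tracker.decode(0, [0, 1, None])
# A delta batch extends the dictionary; existing indices stay valid.
tracker.on_dictionary_batch(0, ["c"], is_delta=True)
third = tracker.decode(0, [2])
# A second non-delta batch for the same ID replaces the dictionary.
tracker.on_dictionary_batch(0, ["x"], is_delta=False)
fourth = tracker.decode(0, [0])
```

Under point 3, a file-format reader would instead reject the last
on_dictionary_batch call, since replacement is not supported there.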