I'll plan on starting a vote in the next day or two if there are no further objections/comments.

On Sun, Oct 13, 2019 at 11:06 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I think the only point raised on the PR that is worth discussing is
> the set of assumptions about dictionaries at the beginning of streams.
>
> There are two options:
> 1. Based on the current wording, it does not seem that all dictionaries
> need to be at the beginning of the stream if they aren't used in the
> first record batch (i.e. a dictionary encoded column is all null in the
> first record batch).
> 2. We require a dictionary batch for each dictionary at the beginning of
> the stream (and require implementations to send an empty batch if they
> don't have the dictionary available).
>
> The current proposal in the PR is option #1.
>
> Thanks,
> Micah
>
> On Sat, Oct 5, 2019 at 4:01 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> I've opened a pull request [1] to clarify some recent conversations about
>> semantics/edge cases for dictionary encoding [2][3] around interleaved
>> batches and when isDelta=False.
>>
>> Specifically, it proposes that isDelta=False indicates dictionary
>> replacement. For the file format, only one isDelta=False batch is allowed
>> per file, and isDelta=true batches are applied in the order supplied in
>> the file footer.
>>
>> In addition, I've added a new enum to DictionaryEncoding to preserve
>> future compatibility in case we want to expand dictionary encoding to be an
>> explicit mapping from "ID" to "VALUE" as discussed in [4].
>>
>> Once people have had a chance to review and come to a consensus, I will
>> call a formal vote to approve and commit the change.
>>
>> Thanks,
>> Micah
>>
>> [1] https://github.com/apache/arrow/pull/5585
>> [2]
>> https://lists.apache.org/thread.html/9734b71bc12aca16eb997388e95105bff412fdaefa4e19422f477389@%3Cdev.arrow.apache.org%3E
>> [3]
>> https://lists.apache.org/thread.html/5c3c9346101df8d758e24664638e8ada0211d310ab756a89cde3786a@%3Cdev.arrow.apache.org%3E
>> [4]
>> https://lists.apache.org/thread.html/15a4810589b2eb772bce5b2372970d9d93badbd28999a1bbe2af418a@%3Cdev.arrow.apache.org%3E
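
For anyone following along, here is a rough Python sketch of how a stream
reader might behave under the proposed semantics (isDelta=False replaces the
dictionary, isDelta=true appends to it) together with option #1 (a dictionary
may arrive after the first record batch if the column has been all null so
far). The names DictionaryBatch, StreamDictionaryState, apply and decode are
hypothetical and are not part of any Arrow library. Rejecting a delta for a
dictionary id that has not been seen yet is my own assumption; the proposal
as quoted does not say what should happen in that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictionaryBatch:
    # Hypothetical stand-in for a dictionary batch message in a stream.
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class StreamDictionaryState:
    # Current dictionary contents, keyed by dictionary id.
    dictionaries: Dict[int, List[str]] = field(default_factory=dict)

    def apply(self, batch: DictionaryBatch) -> None:
        if batch.is_delta:
            # isDelta=true: append the new values to the existing dictionary.
            if batch.dict_id not in self.dictionaries:
                # Assumption of this sketch, not stated in the proposal.
                raise KeyError(f"delta for unknown dictionary {batch.dict_id}")
            self.dictionaries[batch.dict_id].extend(batch.values)
        else:
            # isDelta=False: replace the dictionary wholesale.
            self.dictionaries[batch.dict_id] = list(batch.values)

    def decode(self, dict_id: int, indices: List[Optional[int]]) -> List[Optional[str]]:
        # Option #1: an all-null column never indexes the dictionary, so it
        # decodes fine even if no dictionary batch has been received yet.
        values = self.dictionaries.get(dict_id, [])
        return [None if i is None else values[i] for i in indices]


state = StreamDictionaryState()
first = state.decode(0, [None, None])                  # no dictionary sent yet
state.apply(DictionaryBatch(0, ["a", "b"]))            # initial dictionary
state.apply(DictionaryBatch(0, ["c"], is_delta=True))  # delta appends "c"
after_delta = state.decode(0, [0, 2, None])
state.apply(DictionaryBatch(0, ["x"]))                 # replacement
after_replace = state.decode(0, [0])
```

Under option #2 the first decode call would instead be preceded by an
(possibly empty) dictionary batch for id 0, and a reader could treat a
missing dictionary at that point as an error.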