Hmm, I noticed this: "The IPC file format doesn't support dictionary
replacements or deltas." I was under the impression we aimed to support
dictionary deltas in the file format. If not, we should remove "Delta
dictionaries are applied in the order they appear in the file footer." from
the specification.
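For reference, the two behaviours under discussion can be modelled in a few lines: a delta appends to an existing dictionary, while a second non-delta batch for the same id replaces it, which only the stream format allows. This is a minimal pure-Python sketch, not Arrow code. `apply_dictionary_batch` and `DictionaryReplacementError` are hypothetical names, dictionaries are plain lists keyed by dictionary id, and the model assumes deltas are permitted in the file format, as the footer-ordering sentence implies.

```python
from typing import Dict, List


class DictionaryReplacementError(ValueError):
    """Raised when the file format would need a dictionary replacement."""


def apply_dictionary_batch(
    dictionaries: Dict[int, List[str]],
    dict_id: int,
    values: List[str],
    is_delta: bool,
    file_format: bool,
) -> None:
    """Update `dictionaries` in place for one (modelled) DictionaryBatch."""
    if is_delta:
        # Assumed: a delta needs an existing base dictionary to append to.
        if dict_id not in dictionaries:
            raise KeyError(f"delta for undefined dictionary id {dict_id}")
        # Appending keeps every previously written index valid.
        dictionaries[dict_id].extend(values)
        return
    if file_format and dict_id in dictionaries:
        # A second non-delta batch for the same id is a replacement.
        raise DictionaryReplacementError(
            f"dictionary id {dict_id} replaced in file format"
        )
    # First definition of this id, or (stream format only) a replacement.
    dictionaries[dict_id] = list(values)


# Usage: the stream format accepts a delta followed by a replacement.
state: Dict[int, List[str]] = {}
apply_dictionary_batch(state, 0, ["a", "b"], is_delta=False, file_format=False)
apply_dictionary_batch(state, 0, ["c"], is_delta=True, file_format=False)
print(state[0])  # ['a', 'b', 'c']
apply_dictionary_batch(state, 0, ["x"], is_delta=False, file_format=False)
print(state[0])  # ['x']
```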

On Thu, Mar 18, 2021 at 8:48 AM Antoine Pitrou <anto...@python.org> wrote:

>
> It's a bit more configurable, but basically yes.  See the IPC write
> options:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/options.h#L73
>
> Regards
>
> Antoine.
>
>
> On 18/03/2021 at 16:37, Jacob Quinn wrote:
> > Ah, interesting. So to make sure I understand correctly, the C++ write
> > implementation will scan all "batches" and unify all dictionary values
> > before writing out the schema + dictionary messages? But only when writing
> > the file format? In the streaming case, it would still write
> > replacement/delta dictionary messages as needed.
> >
> > -Jacob
> >
> > On Thu, Mar 18, 2021 at 9:10 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> >
> >> Somewhat related issue: https://issues.apache.org/jira/browse/ARROW-10406
> >>
> >> On Wed, Mar 17, 2021 at 11:22 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >>> BTW, this nuance always felt a little strange to me, but resolving it
> >>> would have required adding information to the file format to
> >>> disambiguate when exactly a dictionary was intended to be replaced.
> >>>
> >>> On Wed, Mar 17, 2021 at 11:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>
> >>>> Hi Jacob,
> >>>> There is nuance. The file format does not support dictionary
> >>>> replacement; the specification [1] explains why that is currently the
> >>>> case. Only the "stream format" supports replacement (i.e. no magic
> >>>> number, only a schema followed by one or more dictionary/record-batch
> >>>> messages).
> >>>>
> >>>> -Micah
> >>>>
> >>>> [1] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
> >>>>
> >>>> On Wed, Mar 17, 2021 at 11:04 PM Jacob Quinn <quinn.jac...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Had an issue come up here:
> >>>>> https://github.com/JuliaData/Arrow.jl/issues/129#issuecomment-777350450
> >>>>> The implementation status page says C++ supports replacement
> >>>>> dictionaries and that Python tracks the C++ implementation. Is this
> >>>>> just a pyarrow issue where it specifically doesn't support replacement
> >>>>> dictionaries? Or is it not "hooked in" properly?
> >>>>>
> >>>>> -Jacob
> >>>>>
> >>>>
> >>>
> >>
> >
>
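The unification step Jacob describes above (scan every batch and merge the dictionary values so the file needs only one dictionary per field) can be sketched as follows. This is a hedged, pure-Python illustration rather than the C++ implementation: `unify_dictionaries` is a hypothetical name, each batch is assumed to be a `(dictionary, indices)` pair with `None` marking a null slot, and values are appended to the unified dictionary in first-seen order.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical model: a batch is (dictionary values, indices); None marks a null.
Batch = Tuple[Sequence[str], Sequence[Optional[int]]]


def unify_dictionaries(
    batches: Sequence[Batch],
) -> Tuple[List[str], List[List[Optional[int]]]]:
    """Merge per-batch dictionaries into one and remap each batch's indices."""
    unified: List[str] = []
    position: Dict[str, int] = {}  # value -> index in the unified dictionary
    remapped: List[List[Optional[int]]] = []
    for dictionary, indices in batches:
        # Transposition map: index in this batch's dictionary -> unified index.
        transpose: List[int] = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            transpose.append(position[value])
        # Rewrite the indices against the unified dictionary; nulls stay null.
        remapped.append([None if i is None else transpose[i] for i in indices])
    return unified, remapped


# Usage: two batches whose dictionaries overlap on "b".
values, indices = unify_dictionaries(
    [(["a", "b"], [0, 1, None]), (["b", "c"], [1, 0])]
)
print(values)   # ['a', 'b', 'c']
print(indices)  # [[0, 1, None], [2, 1]]
```

The per-batch transposition map is what lets existing indices be rewritten against the unified dictionary without copying the values. In current Arrow C++ releases this behaviour is opt-in through the IPC write options Antoine links (a `unify_dictionaries` flag), and it only concerns the file format, since a stream can simply emit replacement or delta messages.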
