Hi Micah,

The use cases I'm aware of are mostly coming from proprietary applications.
My idea was for the extension metadata to be as unobtrusive as possible.
The only alternative, as I see it, would be to add an Extension value to the
Type union, which would be more intrusive for applications that encounter
data they have no special handling for. That doesn't seem desirable if there
are alternatives.

As an immediate use case we could use extension types to embed Tensor
values in Binary arrays.
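
To make that concrete, here is a rough sketch in plain Python (not the
Arrow API) of how a reader might treat such a field. The Field class, the
resolve_type and user_keys helpers, the "example.tensor" type name, and the
two key names under the "ARROW:" prefix are all assumptions made for
illustration; the thread itself only proposes reserving the prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

RESERVED_PREFIX = "ARROW:"
# Assumed key names for illustration; only the prefix is proposed in this thread.
EXT_NAME_KEY = RESERVED_PREFIX + "extension:name"
EXT_META_KEY = RESERVED_PREFIX + "extension:metadata"


@dataclass
class Field:
    """Hypothetical stand-in for a schema field with custom_metadata."""
    name: str
    storage_type: str  # the built-in logical type, e.g. "binary"
    custom_metadata: Dict[str, str] = field(default_factory=dict)


def resolve_type(f: Field, known_extensions: Set[str]) -> str:
    """Return the extension type name if this reader knows it,
    otherwise fall back to the underlying storage type."""
    ext_name = f.custom_metadata.get(EXT_NAME_KEY)
    if ext_name is not None and ext_name in known_extensions:
        return ext_name
    return f.storage_type


def user_keys(f: Field) -> Dict[str, str]:
    """Metadata entries that do not use the reserved prefix."""
    return {k: v for k, v in f.custom_metadata.items()
            if not k.startswith(RESERVED_PREFIX)}


# A tensor stored in a Binary column, tagged through custom_metadata.
tensor_field = Field(
    name="image",
    storage_type="binary",
    custom_metadata={
        EXT_NAME_KEY: "example.tensor",
        EXT_META_KEY: '{"dtype": "float32"}',
        "owner": "team-a",
    },
)
```

A reader that has registered "example.tensor" resolves the field to the
extension type, while one that has not simply sees a "binary" field and can
forward custom_metadata unchanged in later IPC messages.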

Wes

On Sat, May 18, 2019, 12:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> Hi Wes,
> This approach seems reasonable to me.  I'm a little concerned we haven't
> validated many use-cases against the approach (but I don't see any obvious
> flaws).
>
> Thanks,
> Micah
>
> On Fri, May 17, 2019 at 5:16 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > As Micah brought up, as part of this we would like to formalize the
> > use of "ARROW:" as a reserved metadata key prefix. This is similar to
> > Apache Avro, which uses "avro." as a reserved prefix [1]. If someone
> > has a different idea about what the prefix should be, I'm open to
> > suggestions.
> >
> > [1]: https://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files
> >
> > On Thu, May 16, 2019 at 7:29 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > hi folks,
> > >
> > > In a prior mailing list thread from February [1] I brought up some
> > > work I'd done in C++ to create an API to define custom data types that
> > > can be embedded in built-in Arrow logical types. These are serialized
> > > through IPC by adding special fields to the `custom_metadata` member
> > > > of Field in the Flatbuffers metadata [2]. The idea is that if an
> > > > implementation does not understand the custom type, it can still
> > > > interact with the underlying data if need be, or pass on the
> > > > extension metadata in subsequent IPC messages.
> > >
> > > David Li has put up a WIP PR to implement this for Java [4], so to
> > > help the project move forward I think it's a good time to formalize
> > > this, and if there are disagreements to hash them out now. I have just
> > > opened a PR to the Arrow specification documents [3] that describes
> > > the current state of C++ and also the WIP Java PR.
> > >
> > > > Any thoughts about this? If there is consensus on this approach,
> > > > then I can hold a vote.
> > >
> > > Thanks
> > > Wes
> > >
> > > [1]:
> >
> https://lists.apache.org/thread.html/f1fc039471a8a9c06f2f9600296a20d4eb3fda379b23685f809118ee@%3Cdev.arrow.apache.org%3E
> > > [2]:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L291
> > > [3]: https://github.com/apache/arrow/pull/4332
> > > [4]: https://github.com/apache/arrow/pull/4251
> >
>
