On Fri, Aug 12, 2016 at 5:57 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> Sorry for the late reply.
>
> This all sounds reasonable to me.  But I'm not sure I understand exactly
> what you mean by
>
>> Accordingly, in the metadata and in RPC/IPC scenarios, binary/string
>> would be a single array unit in the buffer stream and flattened Field
>> metadata rather than nested types (2 array units as they are
>> presently).
>
>
> The way I read it this seems to me to contradict the cross-implementation as
> "List<UInt8-not null>"?
>
> Thanks,
> Micah
>

I think we can resolve this by starting a "Logical Types and IPC/RPC
layout" specification document.

The schema metadata
(https://github.com/apache/arrow/blob/master/format/Message.fbs) is,
as I understand it, strictly the domain of logical types. I believe
there is some minor conflation of the notions of primitive physical
types and primitive logical types.

While String / Binary have the same physical layout as List<UInt8
not null>, in the domain of logical types and IPC we are saying two
things about these types:

- Logically speaking, they are primitive, non-nested types.
- Their IPC layout is the flattened version of the nested List<UInt8>
counterpart: a single Field node having String type (with a null
count, etc.) and 3 memory buffers (validity bitmap, offsets, and
data). On the wire / in shared memory, the only structural difference
from List<UInt8 not null> is the Field metadata, one node versus two.
The buffers are the same: if the null count is 0 for the inner UInt8
values, the child contributes only a single buffer (the data).
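
As a rough illustration of the point above (a sketch, not code from the
Arrow codebase), the following builds the buffers for a small string
column and describes them both ways. The function names and the dict
shape used for a Field node are hypothetical; the least-significant-bit
bitmap order and little-endian int32 offsets are assumptions made for
the example.

```python
import struct

def encode_strings(values):
    """Return (validity bitmap, int32 offsets, UTF-8 data) for optional strings."""
    validity = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)  # set bit i: slot i is non-null
            data += value.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    packed_offsets = struct.pack("<%di" % len(offsets), *offsets)
    return bytes(validity), packed_offsets, bytes(data)

def flattened_layout(values):
    """String as a primitive logical type: one Field node, three buffers."""
    validity, offsets, data = encode_strings(values)
    null_count = sum(v is None for v in values)
    nodes = [{"type": "String", "length": len(values), "null_count": null_count}]
    return nodes, [validity, offsets, data]

def nested_layout(values):
    """List<UInt8 not null>: two Field nodes over the same three buffers."""
    validity, offsets, data = encode_strings(values)
    null_count = sum(v is None for v in values)
    nodes = [
        {"type": "List", "length": len(values), "null_count": null_count},
        # The child has null_count 0, so it contributes only its data buffer.
        {"type": "UInt8", "length": len(data), "null_count": 0},
    ]
    return nodes, [validity, offsets, data]

flat_nodes, flat_buffers = flattened_layout(["ab", None, "cde"])
nested_nodes, nested_buffers = nested_layout(["ab", None, "cde"])
```

Under these assumptions both descriptions carry identical buffers; only
the number of Field nodes differs (one versus two).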

Let me know if this does not make sense.

To move this forward, I propose starting a Logical Types / IPC layout
document that records the mapping between logical types and their
physical in-memory representation and layout on the wire.

- Wes
