Hello,

I would like to understand where we stand on logical types and physical
types. As I understand it, this proposal covers the physical representation.

In the context of an execution engine, the concept of logical types becomes
more important, since two physical representations can carry the same semantic
values, e.g. LargeList and List where all offsets fit in 32 bits. A more
complex example would be an integer array and a dictionary array whose values
are integers.
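
To make the first case concrete, here is a rough C++ sketch (purely
illustrative; it assumes a large_list() type factory mirroring the existing
list() one, which would only exist once the proposed type lands):

  #include <arrow/api.h>
  #include <iostream>

  int main() {
    // Two physical layouts for the same logical "list of int32":
    auto small = arrow::list(arrow::int32());        // 32-bit offsets
    auto large = arrow::large_list(arrow::int32());  // 64-bit offsets (proposed)

    // As long as every offset fits in 32 bits, either layout can hold the
    // same values, yet strict physical type equality reports them as
    // different.
    std::cout << small->Equals(large) << std::endl;  // prints 0
    return 0;
  }

An engine reasoning in logical types would presumably want to treat both as
the same list<int32>.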

Is this only relevant for execution engines? What about the (C++)
Array.Equals method and related comparison methods? This also touches on the
subject of type equality, e.g. dictionaries with different but compatible
encodings.
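
For the dictionary point, a hypothetical illustration (assuming the
dictionary(index_type, value_type) form of the C++ type factory) of what I
mean by different but compatible encodings:

  #include <arrow/api.h>
  #include <iostream>

  int main() {
    // Same value type (utf8), different index widths.
    auto dict8  = arrow::dictionary(arrow::int8(),  arrow::utf8());
    auto dict16 = arrow::dictionary(arrow::int16(), arrow::utf8());

    // Strict type equality says the two differ, even though any array of the
    // first type can be losslessly re-encoded as the second.
    std::cout << dict8->Equals(dict16) << std::endl;  // prints 0
    return 0;
  }

Should Array.Equals and the comparison kernels see through such differences,
or keep requiring exactly equal physical types?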

Jacques, knowing that you worked on Parquet (which follows this model) and
Dremio, what is your opinion?

François

Some related tickets:
- https://jira.apache.org/jira/browse/ARROW-554
- https://jira.apache.org/jira/browse/ARROW-1741
- https://jira.apache.org/jira/browse/ARROW-3144
- https://jira.apache.org/jira/browse/ARROW-4097
- https://jira.apache.org/jira/browse/ARROW-5052



On Thu, Apr 11, 2019 at 4:52 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> ARROW-4810 [1] and ARROW-750 [2] discuss adding types with 64-bit offsets
> to Lists, Strings and binary data types.
>
> Philipp started an implementation for the large list type [3], and I hacked
> together a potentially viable Java implementation [4].
>
> I'd like to kick off the discussion for getting these types voted on.  I'm
> coupling them together because I think there are design considerations for
> how we evolve Schema.fbs.
>
> There are two proposed options:
> 1.  The current PR proposal which adds a new type LargeList:
>   // List with 64-bit offsets
>   table LargeList {}
>
> 2.  As François suggested, it might be cleaner to parameterize List with an
> offset width.  I suppose something like:
>
> table List {
>   // only 32-bit and 64-bit are supported.
>   bitWidth: int = 32;
> }
>
> I think Option 2 is cleaner and potentially better long-term, but I think
> it breaks forward compatibility of the existing Arrow libraries.  If we
> proceed with Option 2, I would advocate making the change to Schema.fbs all
> at once for all types (assuming we think that 64-bit offsets are desirable
> for all types), along with future compatibility checks, to avoid multiple
> releases where future compatibility is broken (by broken I mean the
> inability to detect that an implementation is receiving data it can't
> read).  What are people's thoughts on this?
>
> Also, are there any other concerns with adding these types?
>
> Thanks,
> Micah
>
> [1] https://issues.apache.org/jira/browse/ARROW-4810
> [2] https://issues.apache.org/jira/browse/ARROW-750
> [3] https://github.com/apache/arrow/pull/3848
> [4] https://github.com/apache/arrow/commit/03956cac2202139e43404d7a994508080dc2cdd1
>
