I don't understand the limitation to different types, so +1 for
generalized unions.  That said, I don't think it's high-priority either.
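
To make the distinction concrete, here is a rough sketch of a dense union
in plain Python. It is a toy model, not the Arrow API: the names `Child`,
`DenseUnion` and `has_duplicate_types` are made up for illustration. It
assumes the usual dense layout (a type-id per slot plus an offset into the
selected child) and shows that two int32 children stay distinguishable by
type id, which is what the generalized approach relies on.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Child:
    name: str          # field name, e.g. "width" (hypothetical)
    type_name: str     # logical type, e.g. "int32"
    values: List[Any]  # values stored for this child only


@dataclass
class DenseUnion:
    children: List[Child]
    type_ids: List[int]  # per slot: index of the child holding the value
    offsets: List[int]   # per slot: position inside that child's values

    def get(self, i: int) -> Tuple[str, Any]:
        # Resolve slot i through its type id, then its offset.
        child = self.children[self.type_ids[i]]
        return child.name, child.values[self.offsets[i]]


def has_duplicate_types(union: DenseUnion) -> bool:
    """True if two children share a type (only allowed by option 2)."""
    names = [c.type_name for c in union.children]
    return len(names) != len(set(names))


# Two separate int32 fields plus one string field.
union = DenseUnion(
    children=[
        Child("width", "int32", [3, 7]),
        Child("height", "int32", [10]),
        Child("label", "utf8", ["a"]),
    ],
    type_ids=[0, 1, 0, 2],
    offsets=[0, 0, 1, 0],
)
rows = [union.get(i) for i in range(len(union.type_ids))]
```

Under option 1 a union like this would have to be rejected, since
`has_duplicate_types(union)` is true; under option 2 the type id alone
tells "width" and "height" apart.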

Regards

Antoine.


On 24/05/2019 at 04:17, Micah Kornfield wrote:
> I'd like to bump this thread, to see if anyone has any comments.  If nobody
> objects I will try to start implementing the changes next week.
> 
> Thanks,
> Micah
> 
> On Mon, May 20, 2019 at 9:37 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> 
>> In the past [1] there hasn't been agreement on the final requirements for
>> union types.
>>
>> Briefly, the two approaches currently advocated are:
>> 1.  Limit unions to only contain one field of each individual type (e.g.
>> you can't have two separate int32 fields).  Java takes this approach.
>> 2.  Generalized unions (unions can have any number of fields with the same
>> type).  C++ takes this approach.
>>
>> There was a prior PR [2] that stalled in trying to take this approach with
>> Java.  For writing vectors it seemed to be slower on a benchmark.
>>
>> My proposal:  We should pursue option 2 (the general approach).  There are
>> already data interchange formats that support it, and it would be nice to
>> have a data model that makes translation to and from Arrow schemas easy:
>> 1.  Avro seems to support it [3] (with the exception of complex types)
>> 2.  Protobufs loosely support it [4] via one-of.
>>
>> In order to address the issues in [2], I propose making the following
>> changes/additions to the Java implementation:
>> 1.  Keep the default write-path untouched with the existing class.
>> 2.  Add a new sparse union class that implements the same interface.  It
>> can be used on the read path, and by clients that opt in (via direct
>> construction).
>> 3.  Add in a dense union class (I don't believe Java has one).
>>
>> I'm still ramping up on the Java code base, so I'd like other Java
>> contributors to chime in to see if this plan sounds feasible and acceptable.
>>
>> Any other thoughts on Unions?
>>
>> Thanks,
>> Micah
>>
>> [1]
>> https://lists.apache.org/thread.html/82ec2049fc3c29de232c9c6962aaee9ec022d581cecb6cf0eb6a8f36@%3Cdev.arrow.apache.org%3E
>> [2] https://github.com/apache/arrow/pull/987#issuecomment-493231493
>> [3] https://github.com/apache/arrow/pull/987#issuecomment-493231493
>> [4] https://developers.google.com/protocol-buffers/docs/proto#oneof
>>
> 
