Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Antoine Pitrou Tue, 22 Sep 2020 03:30:46 -0700

On 22/09/2020 at 06:36, Micah Kornfield wrote:
> I wanted to give this thread a bump, does the proposal I made below sound
> reasonable?

It does!

Regards

Antoine.


> 
> On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield <[email protected]>
> wrote:
> 
>> If I read the responses so far it seems like the following might be a good
>> compromise/summary:
>>
>> 1. It does not seem too invasive to support native endianness in
>> implementation libraries, as long as there is appropriate performance
>> testing and CI infrastructure to demonstrate the changes work.
>> 2. It is up to implementation maintainers if they wish to accept PRs that
>> handle byte swapping between different architectures.  (Right now it sounds
>> like C++ is potentially OK with it and for Java at least Jacques is opposed
>> to it?)
>>
>> Testing changes that break big-endian can be a potential drag on developer
>> productivity but there are methods to run locally (at least on more recent
>> OSes).
>>
>> Thoughts?
>>
>> Thanks,
>> Micah
>>
>> On Mon, Aug 31, 2020 at 7:08 PM Fan Liya <[email protected]> wrote:
>>
>>> Thanks to Kazuaki for the survey and to Micah for starting the discussion.
>>>
>>> I do not oppose supporting BE. In fact, I am in general optimistic about
>>> the performance impact (for Java).
>>> IMO, this is going to be a painful path (many byte-order-related problems
>>> are tricky to debug), so I hope we can make it short.
>>>
>>> It is good that someone is willing to take this on, and I would like to
>>> provide help if needed.
>>>
>>> Best,
>>> Liya Fan
>>>
>>>
>>>
>>> On Tue, Sep 1, 2020 at 7:25 AM Bryan Cutler <[email protected]> wrote:
>>>
>>>> I also think this would be a worthwhile addition and help the project
>>>> expand in more areas. Beyond the Apache Spark optimization use case, having
>>>> Arrow interoperability with the Python data science stack on BE would be
>>>> very useful. I have looked at the remaining PRs for Java and they seem
>>>> pretty minimal and straightforward. Implementing the equivalent record
>>>> batch swapping as done in C++ at [1] would be a little more involved, but
>>>> still reasonable. Would it make sense to create a branch to apply all
>>>> remaining changes with CI to get a better picture before deciding on
>>>> bringing into master branch?  I could help out with shepherding this effort
>>>> and assist in maintenance, if we decide to accept.
>>>>
>>>> Bryan
>>>>
>>>> [1] https://github.com/apache/arrow/pull/7507
>>>>
>>>> On Mon, Aug 31, 2020 at 1:42 PM Wes McKinney <[email protected]> wrote:
>>>>
>>>>> I think it's well within the right of an implementation to reject BE
>>>>> data (or non-native-endian), but if an implementation chooses to
>>>>> implement and maintain the endianness conversions, then it does not
>>>>> seem so bad to me.
>>>>>
>>>>> On Mon, Aug 31, 2020 at 3:33 PM Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>> And yes, for those of you looking closely, I commented on ARROW-245 when it
>>>>>> was committed. I just forgot about it.
>>>>>>
>>>>>> It looks like I had mostly the same concerns then that I do now :) Now I'm
>>>>>> just more worried about format sprawl...
>>>>>>
>>>>>> On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>>>> What do you mean?  The Endianness field (a Big|Little enum) was added 4
>>>>>>>> years ago:
>>>>>>>> https://issues.apache.org/jira/browse/ARROW-245
>>>>>>>
>>>>>>>
>>>>>>> I didn't realize that was done, my bad. Good example of format rot from my
>>>>>>> pov.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>