Hi Micah,
Thank you for expanding the scope for Big Endian support in Arrow. I am 
glad to see this when I am back from one-week vacation.
I agree with this since we have just seen the kickoff of BE support in Go. 


Hi Wes,
Thank you for your positive comments. We should carefully implement BE 
support without performance overhead.

Let me respond to Micah's comments and questions. 

> 1.  As long as there is CI in place to catch regressions (right now I 
think the CI is fairly unreliable?)
I agree. While TravisCI on s390x are unreliable, each platform can set up 
CI script.

> 2.  No degradation in performance for little-endian architectures 
(verified by additional micro benchmarks)
Yes, we can do it. @kou suggested me to use the existing mechanism like 
[1]. Now, it supports C++.We could expand this to other languages.

> 3.  Not a large amount of invasive code to distinguish between 
platforms.
Yes, I will prepare the current holistic strategy for endian-independent 
implementation in another mail.


> Kazuaki Ishizaki I asked question previously, but could you give some 
data points around:
Sorry for the delay. I was waiting for comments during my one-week 
vacation.

> 1.  The current state of C++ support (how much code needed to change)?
Except the parquet support, in C++, in my understanding [2] is the last PR 
to support big-endian for intra-process and inter-process in C++.

> 2.  How many more PRs you expect to need for Java (and approximate 
size)?
I have just submitted two PRs [5, 6] today to pass the current test using 
“mvn install”. The total size of the four PRs [3, 4, 5, 6] are around 
200 lines. In addition, I will submit another PR (~500 lines per 
estimation) to pass integration test between different endians, which 
corresponds to the PR [2] in C++.
For CI for Java, I submitted one draft PR [7] (~100 lines).


BTW, when I saw some test cases, I think that it would be good to increase 
test coverage for little-endian and big-endian by adding test cases (e.g. 
increase the types of a schema to be tested in flight-core like [8]). If 
it is acceptable, we can submit more PRs regardless of endianness.

[1] https://github.com/apache/arrow/pull/7940#issuecomment-672690540
[2] https://github.com/apache/arrow/pull/7507
[3] https://github.com/apache/arrow/pull/7942
[4] https://github.com/apache/arrow/pull/7944
[5] https://github.com/apache/arrow/pull/8056
[6] https://github.com/apache/arrow/pull/8057
[7] https://github.com/apache/arrow/pull/7938
[8] https://github.com/apache/arrow/pull/7555

Best Regards,
Kazuaki Ishizaki

Wes McKinney <wesmck...@gmail.com> wrote on 2020/08/26 21:27:49:

> From: Wes McKinney <wesmck...@gmail.com>
> To: dev <dev@arrow.apache.org>, Micah Kornfield <emkornfi...@gmail.com>
> Cc: Fan Liya <liya.fa...@gmail.com>
> Date: 2020/08/26 21:28
> Subject: [EXTERNAL] Re: [DISCUSS] Big Endian support in Arrow (was: 
> Re: [Java] Supporting Big Endian)
> 
> hi Micah,
> 
> I agree with your reasoning. If supporting BE in some languages (e.g.
> Java) is impractical due to performance regressions on LE platforms,
> then I don't think it's worth it. But if it can be handled at compile
> time or without runtime overhead, and tested / maintained properly on
> an ongoing basis, then it seems reasonable to me. It seems that the
> number of Arrow stakeholders will only increase from here so I would
> hope that there will be more people invested in helping maintain BE in
> the future.
> 
> - Wes
> 
> On Tue, Aug 25, 2020 at 11:33 PM Micah Kornfield 
> <emkornfi...@gmail.com> wrote:
> >
> > I'm expanding the scope of this thread since it looks like work has 
also
> > started for making golang support BigEndian architectures.
> >
> > I think as a community we should come to a consensus on whether we 
want to
> > support Big Endian architectures in general.  I don't think it is a 
good
> > outcome if some implementations accept PRs for Big Endian fixes and 
some
> > don't.
> >
> > But maybe this is OK with others?
> >
> > My current opinion on the matter is that we should support it under 
the
> > following conditions:
> >
> > 1.  As long as there is CI in place to catch regressions (right now I 
think
> > the CI is fairly unreliable?)
> > 2.  No degradation in performance for little-endian architectures 
(verified
> > by additional micro benchmarks)
> > 3.  Not a large amount of invasive code to distinguish between 
platforms.
> >
> > Kazuaki Ishizaki I asked question previously, but could you give some 
data
> > points around:
> > 1.  The current state of C++ support (how much code needed to change)?
> > 2.  How many more PRs you expect to need for Java (and approximate 
size)?
> >
> > I think this would help myself and others in the decision making 
process.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield 
<emkornfi...@gmail.com>
> > wrote:
> >
> > > My thoughts on the points raised so far:
> > >
> > > * Does supporting Big Endian increase the reach of Arrow by a lot?
> > >
> > > Probably not a significant amount, but it does provide one more 
avenue of
> > > adoption.
> > >
> > > * Does it increase code complexity?
> > >
> > > Yes.  I agree this is a concern.  The PR in question did not seem 
too bad
> > > to me but this is subjective.  I think the remaining question is how 
many
> > > more places need to be fixed up in the code base and how invasive 
are the
> > > changes.  In C++ IIUC it turned out to be a relatively small number 
of
> > > places.
> > >
> > > Kazuaki Ishizaki have you been able to get the Java implementation 
working
> > > fully locally?  How many additional PRs will be needed and what do
> > > they look like (I think there already a few more in the queue)?
> > >
> > > * Will it introduce performance regressions?
> > >
> > > If done properly I suspect no, but I think if we continue with 
BigEndian
> > > support the places that need to be touched should have benchmarks 
added to
> > > confirm this (including for PRs already merged).
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <liya.fa...@gmail.com> 
wrote:
> > >
> > >> Thank Kazuaki Ishizaki for working on this.
> > >> IMO, supporting the big-endian should be a large change, as in many
> > >> places of the code base, we have implicitly assumed the 
little-endian
> > >> platform (e.g.
> > >> INVALID URI REMOVED
> 
u=https-3A__github.com_apache_arrow_blob_master_java_memory_memory-2Dcore_src_main_java_org_apache_arrow_memory_util_ByteFunctionHelpers.java&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=poFSWqjJv99prou53ciinHyBmh5IZlXLlhYvftb9fu4&e= 
> > >> ).
> > >> Supporting the big-endian platform may introduce branches in such 
places
> > >> (or virtual calls) which will affect the performance.
> > >> So it would be helpful to evaluate the performance impact.
> > >>
> > >> Best,
> > >> Liya Fan
> > >>
> > >>
> > >> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <jacq...@apache.org>
> > >> wrote:
> > >>
> > >>> Hey Micah, thanks for starting the discussion.
> > >>>
> > >>> I just skimmed that thread and it isn't entirely clear that there 
was a
> > >>> conclusion that the overhead was worth it. I think everybody 
agrees that
> > >>> it
> > >>> would be nice to have the code work on both platforms. On the 
flipside,
> > >>> the
> > >>> code noise for a rare case makes the cost-benefit questionable.
> > >>>
> > >>> In the Java code, we wrote the code to explicitly disallow big 
endian
> > >>> platforms and put preconditions checks in. I definitely think if 
we want
> > >>> to
> > >>> support this, it should be done holistically across the code with
> > >>> appropriate test plan (both functional and perf).
> > >>>
> > >>> To me, the question is really about how many use cases are blocked 
by
> > >>> this.
> > >>> I'm not sure I've heard anyone say that the limiting factor 
toleveraging
> > >>> Java Arrow was the block on endianess. Keep in mind that until 
very
> > >>> recently, using any Arrow Java code would throw a preconditions 
check
> > >>> before you could even get started on big-endian and I don't think 
we've
> > >>> seen a bunch of messages on that exception. Adding if conditions
> > >>> throughout
> > >>> the codebase like this patch: [1] isn't exactly awesome and it can 
also
> > >>> risk performance impacts depending on how carefully it is done.
> > >>>
> > >>> If there isn't a preponderance of evidence of many users 
beingblocked by
> > >>> this capability, I don't think we should accept the code. We 
already
> > >>> have a
> > >>> backlog of items that we need to address just ensure existing use 
cases
> > >>> work well. Expanding to new use cases that there is no clear 
demand for
> > >>> will likely just increase code development cost at little benefit.
> > >>>
> > >>> What do others think?
> > >>>
> > >>> [1] INVALID URI REMOVED
> 
u=https-3A__github.com_apache_arrow_pull_7923-23issuecomment-2D674311119&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=vmvc0b4yHFfWLjLheCRysSiyaeRFO_6p0wdH-sLa7M8&e= 
> > >>>
> > >>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield 
<emkornfi...@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > Kazuaki Ishizak has started working on Big Endian support in 
Java
> > >>> > (including setting up CI for it).  Thank you!
> > >>> >
> > >>> > We previously discussed support for Big Endian architectures in 
C++
> > >>> [1] and
> > >>> > generally agreed that it was a reasonable thing to do.
> > >>> >
> > >>> > Similar to C++ I think as long as we have a working CI setup it 
is
> > >>> > reasonable for Java to support Big Endian machines.
> > >>> >
> > >>> > But I think there might be differing opinions so it is worth a
> > >>> discussion
> > >>> > to see if there are technical blockers or other reasons for not
> > >>> supporting
> > >>> > Big Endian architectures in the existing java implementation.
> > >>> >
> > >>> > Thanks,
> > >>> > Micah
> > >>> >
> > >>> >
> > >>> > [1]
> > >>> >
> > >>> >
> > >>> INVALID URI REMOVED
> 
u=https-3A__lists.apache.org_thread.html_rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019-2540-253Cdev.arrow.apache.org-253E&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=oDBWI9pmI39bTsEieQNDxZit0My21hLIW0fJRPJI0AM&e= 
> > >>> >
> > >>>
> > >>
> 


Reply via email to