gRPC breaks large buffers into smaller pieces that have to be
reassembled after receipt -- this does add some overhead. I would
guess that circumventing gRPC for the transfer of each IPC message
would be the route to throughput beyond the 20-40 Gbps that we're able
to achieve now.
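As a rough, hedged illustration (plain Python sockets, arbitrary sizes, not the actual Flight or gRPC code paths), the shape of the comparison is: raw localhost TCP sets the upper bound, and an explicit reassembly step like b"".join() stands in for the per-message glue work a gRPC-based receiver has to do:

```python
# Sketch only: measure raw localhost TCP throughput as an upper bound,
# with b"".join() standing in for gRPC-style chunk reassembly.
import socket
import threading
import time

CHUNK = b"x" * (64 * 1024)   # 64 KiB per send, a gRPC-like frame size
TOTAL = 64 * 1024 * 1024     # 64 MiB total payload (modest, for a demo)

def receiver(server, result):
    conn, _ = server.accept()
    chunks = []
    while True:
        buf = conn.recv(1 << 20)
        if not buf:
            break
        chunks.append(buf)
    conn.close()
    # Reassembly: glue received fragments back into one buffer,
    # analogous to what a receiver behind gRPC must do per message.
    result["payload"] = b"".join(chunks)

server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port
server.listen(1)
result = {}
t = threading.Thread(target=receiver, args=(server, result))
t.start()

client = socket.create_connection(server.getsockname())
start = time.perf_counter()
sent = 0
while sent < TOTAL:
    client.sendall(CHUNK)
    sent += len(CHUNK)
client.close()
t.join()
elapsed = time.perf_counter() - start
server.close()

print(f"sent {sent} bytes in {elapsed:.3f}s "
      f"({sent / elapsed / 1e9:.2f} GB/s on localhost TCP)")
```

The GB/s number is entirely machine-dependent; the point is only that the socket path alone, without framing, tends to run well above what we see through gRPC.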

On Fri, Apr 24, 2020 at 1:57 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> I'm not sure a new transport for gRPC would change anything.  gRPC
> currently uses HTTP (HTTP2 I believe), and there's no reason for HTTP to
> be the culprit here.
>
> Regards
>
> Antoine.
>
>
> Le 24/04/2020 à 20:48, Micah Kornfield a écrit :
> > A couple of questions:
> > 1.  For same node transport would doing something with Plasma be a
> > reasonable approach?
> > 2.  What are the advantages/disadvantages of creating a new transport for
> > gRPC [1] vs. building an entirely new backend for Flight?
> >
> > Thanks,
> > Micah
> >
> > [1] https://github.com/grpc/grpc/issues/7931
> >
> > On Fri, Apr 24, 2020 at 11:37 AM David Li <li.david...@gmail.com> wrote:
> >
> >> Having alternative backends for Flight has been a goal from the start,
> >> which is why gRPC is wrapped and generally not exposed to the user. I
> >> would be interested in collaborating on an HTTP/1 backend that is
> >> accessible from the browser (or via an alternative transport meeting
> >> the same requirements, e.g. WebSockets).
> >>
> >> In terms of tuning gRPC, taking a performance profile would be useful.
> >> I remember there are some TODOs on the C++ side about copies that
> >> sometimes occur due to gRPC that we don't quite understand yet. I
> >> spent quite a bit of time a while ago trying to tune gRPC, but like
> >> Antoine, couldn't find any easy wins.
> >>
> >> Best,
> >> David
> >>
> >> On 4/24/20, Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>> Hi Jiajia,
> >>>
> >>> I see.  I think there are two possible avenues to try and improve this:
> >>>
> >>> * better use gRPC in the hope of achieving higher performance.  This
> >>> doesn't seem to be easy, though.  I've already tried to change some of
> >>> the parameters listed here, but didn't get any benefits:
> >>> https://grpc.github.io/grpc/cpp/group__grpc__arg__keys.html
> >>>
> >>> (perhaps there are other, lower-level APIs that we should use? I don't
> >>> know)
> >>>
> >>> * take the time to design and start implementing another I/O backend for
> >>> Flight.  gRPC is just one possible backend, but the Flight remote API is
> >>> simple enough that we could envision other backends (for example an HTTP
> >>> REST-like API).  If you opt for this, I would strongly suggest starting
> >>> the discussion on the mailing-list in order to coordinate with other
> >>> developers.
> >>>
> >>> Best regards
> >>>
> >>> Antoine.
> >>>
> >>>
> >>> Le 24/04/2020 à 19:16, Li, Jiajia a écrit :
> >>>> Hi Antoine,
> >>>>
> >>>>> The question, though, is: do you *need* those higher speeds on
> >>>>> localhost?  In which context are you considering Flight?
> >>>>
> >>>> We want to send large data (in cache) to the data analytics application
> >>>> (running locally).
> >>>>
> >>>> Thanks,
> >>>> Jiajia
> >>>>
> >>>> -----Original Message-----
> >>>> From: Antoine Pitrou <anto...@python.org>
> >>>> Sent: Saturday, April 25, 2020 1:01 AM
> >>>> To: dev@arrow.apache.org
> >>>> Subject: Re: Question regarding Arrow Flight Throughput
> >>>>
> >>>>
> >>>> Hi Jiajia,
> >>>>
> >>>> It's true one should be able to reach higher speeds.  For example, I can
> >>>> reach more than 7 GB/s on a simple TCP connection, in pure Python, using
> >>>> only two threads:
> >>>> https://gist.github.com/pitrou/6cdf7bf6ce7a35f4073a7820a891f78e
> >>>>
> >>>> The question, though, is: do you *need* those higher speeds on
> >>>> localhost?  In which context are you considering Flight?
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>>
> >>>> Le 24/04/2020 à 18:52, Li, Jiajia a écrit :
> >>>>> Hi Antoine,
> >>>>>
> >>>>> I think the 5 GB/s here is on localhost.  Since localhost does not
> >>>>> depend on network speed, and I've checked that the CPU is not the
> >>>>> bottleneck when running the benchmark, I think Flight can reach a
> >>>>> higher throughput.
> >>>>>
> >>>>> Thanks,
> >>>>> Jiajia
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Antoine Pitrou <anto...@python.org>
> >>>>> Sent: Friday, April 24, 2020 5:47 PM
> >>>>> To: dev@arrow.apache.org
> >>>>> Subject: Re: Question regarding Arrow Flight Throughput
> >>>>>
> >>>>>
> >>>>> The problem with gRPC is that it was designed with relatively small
> >>>>> requests and payloads in mind.  We're using it for a large data
> >>>>> application which it wasn't optimized for.  Also, its threading model
> >>>>> is inscrutable (yielding those weird benchmark results).
> >>>>>
> >>>>> However, 5 GB/s is indeed very good if between different machines.
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Antoine.
> >>>>>
> >>>>>
> >>>>> Le 24/04/2020 à 05:15, Wes McKinney a écrit :
> >>>>>> On Thu, Apr 23, 2020 at 10:02 PM Wes McKinney <wesmck...@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> hi Jiajia,
> >>>>>>>
> >>>>>>> See my TODO here
> >>>>>>>
> >>>>>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc#L182
> >>>>>>>
> >>>>>>> My guess is that if you want to get faster throughput with multiple
> >>>>>>> cores, you need to run more than one server and serve on different
> >>>>>>> ports rather than having all threads go to the same server through
> >>>>>>> the same port. I don't think we've made any manycore scalability
> >>>>>>> claims, though.
> >>>>>>>
> >>>>>>> I tried to run this myself but I can't get the benchmark executable
> >>>>>>> to run on my machine right now -- this seems to be a regression.
> >>>>>>>
> >>>>>>> https://issues.apache.org/jira/browse/ARROW-8578
> >>>>>>
> >>>>>> This turned out to be a false alarm and went away after a reboot.
> >>>>>>
> >>>>>> On my laptop a single thread is faster than multiple threads making
> >>>>>> requests to a sole server, so this supports the hypothesis that
> >>>>>> concurrent requests on the same port do not increase throughput.
> >>>>>>
> >>>>>> $ ./release/arrow-flight-benchmark -num_threads 1
> >>>>>> Speed: 5131.73 MB/s
> >>>>>>
> >>>>>> $ ./release/arrow-flight-benchmark -num_threads 16
> >>>>>> Speed: 4258.58 MB/s
> >>>>>>
> >>>>>> I'd suggest improving the benchmark executable to spawn multiple
> >>>>>> servers as the next step to study multicore throughput. That said,
> >>>>>> with the above being ~40 Gbps already, it's unclear how much higher
> >>>>>> throughput can realistically go.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> - Wes
> >>>>>>>
> >>>>>>> On Thu, Apr 23, 2020 at 8:17 PM Li, Jiajia <jiajia...@intel.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I have some doubts about arrow flight throughput. In this
> >>>>>>>> article(https://www.dremio.com/understanding-apache-arrow-flight/),
> >>>>>>>> it said "High efficiency. Flight is designed to work without any
> >>>>>>>> serialization or deserialization of records, and with zero memory
> >>>>>>>> copies, achieving over 20 Gbps per core."  And in the other article
> >>>>>>>> (https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/),
> >>>>>>>> it said "As far as absolute speed, in our C++ data throughput
> >>>>>>>> benchmarks, we are seeing end-to-end TCP throughput in excess of
> >>>>>>>> 2-3GB/s on localhost without TLS enabled. This benchmark shows a
> >>>>>>>> transfer of ~12 gigabytes of data in about 4 seconds:"
> >>>>>>>>
> >>>>>>>> Here 20 Gbps / 8 = 2.5 GB/s.  Does that mean that if we run the
> >>>>>>>> benchmark on a server with two cores, the throughput will be 5 GB/s?
> >>>>>>>> But I have run arrow-flight-benchmark on my server with 40 cores,
> >>>>>>>> and the result is "Speed: 2420.82 MB/s".
> >>>>>>>>
> >>>>>>>> So what should I do to increase the throughput?  Please correct me
> >>>>>>>> if I am wrong.  Thank you in advance!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jiajia
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>
> >>
> >
