Hi Pedro,
You should be able to use Flight for this: pack your subscription call in a
DoGet and listen on the FlightDataStream for new data.
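
Roughly what I have in mind, as a Python sketch with pyarrow.flight (the
subscribe_to_source helper, the schema, and the port are made up for the
example, not real Arrow APIs):

import pyarrow as pa
import pyarrow.flight as flight

# Hypothetical helper: polls your event source (e.g. a Kafka consumer) and
# yields dicts of column lists. You would implement this yourself.
def subscribe_to_source(topic):
    raise NotImplementedError

class StreamingFlightServer(flight.FlightServerBase):
    def __init__(self, location="grpc://0.0.0.0:8815", **kwargs):
        super().__init__(location, **kwargs)

    def do_get(self, context, ticket):
        schema = pa.schema([("ts", pa.timestamp("ms")), ("value", pa.float64())])

        def batches():
            # Each poll of the source becomes one record batch on the wire.
            for columns in subscribe_to_source(ticket.ticket.decode()):
                yield pa.RecordBatch.from_pydict(columns, schema=schema)

        # GeneratorStream sends batches to the client as they are produced.
        return flight.GeneratorStream(schema, batches())

if __name__ == "__main__":
    StreamingFlightServer().serve()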

I think you can control the granularity of your messages through the size of
the record batches you are writing, but I am not a Flight developer, so don't
take my word for it.
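
On the client side you would "listen" by iterating the reader returned by
DoGet, something like this (again only a sketch; the URI and the ticket
payload are placeholders):

import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")
reader = client.do_get(flight.Ticket(b"my-topic"))

# Chunks arrive as the server writes batches, so smaller batches on the
# server side mean lower per-message latency here.
for chunk in reader:
    batch = chunk.data  # a pyarrow.RecordBatch
    print(batch.num_rows, "rows received")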

Overall, "live" data streaming was not the primary use case behind Arrow
Flight, but I think there is a lot of interest in that application, and the
Flight fundamentals are quite suitable for it.

  
Here is a somewhat related thread:
http://mail-archives.apache.org/mod_mbox/arrow-dev/202008.mbox/%3CCADr7h-dAJrsYB%2BOUN94Z-KBkd4Jt82F78pfE3%2Bj7fg7MX1BrXw%40mail.gmail.com%3E
 


> On Sep 4, 2020, at 3:39 AM, Pedro Silva <pedro.cl...@gmail.com> wrote:
> 
> Hello,
> 
> This may be a stupid question, but is Arrow used for, or designed with,
> streaming processing use cases in mind, where data is non-stationary
> (e.g. Flink stream processing jobs)?
> 
> Particularly, is it possible from a given event source (say Kafka) to
> efficiently generate incremental record batches for stream processing?
> 
> Suppose there is a data source that continuously generates messages with
> 100+ fields. You want to compute grouped aggregations (sums, averages,
> count distinct, etc.) over a select few of those fields, say at most 5
> fields used across all queries.
> 
> Is this a valid use-case for Arrow?
> What if time is important and some windowing technique has to be applied?
> 
> Thank you very much for your time!
> Have a good day.
