Hello,

This may be a stupid question, but is Arrow used for, or designed with,
stream-processing use cases in mind, where data is non-stationary (e.g.
Flink stream-processing jobs)?

In particular, is it possible to efficiently generate incremental record
batches from a given event source (say, Kafka) for stream processing?
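For concreteness, here is a rough sketch of what I have in mind, in Python
with pyarrow and confluent-kafka (the broker address, topic name, and batch
size are all made up; the messages are assumed to be JSON):

import json
import pyarrow as pa
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # hypothetical broker
    "group.id": "arrow-demo",                # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])               # hypothetical topic

BATCH_SIZE = 1024

def record_batches():
    """Accumulate Kafka messages and yield them as Arrow RecordBatches."""
    rows = []
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        rows.append(json.loads(msg.value()))  # one dict per message
        if len(rows) >= BATCH_SIZE:
            yield pa.RecordBatch.from_pylist(rows)
            rows = []

Is this kind of accumulate-then-batch loop the intended pattern, or is there
a more idiomatic way to do it?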

Suppose there is a data source that continuously generates messages with
100+ fields, and you want to compute grouped aggregations (sums, averages,
count distinct, etc.) over a select few of those fields, say at most five
fields used across all queries.
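In that scenario I would expect to do something like the following on each
batch or window, projecting down to the few columns the queries touch
(field names are invented):

import pyarrow as pa

# One incoming batch, already reduced to the fields the queries use.
table = pa.table({
    "user_id": ["a", "b", "a", "c"],
    "amount":  [10.0, 5.0, 2.5, 7.0],
})

# Grouped aggregation on the projected columns only.
result = table.group_by("user_id").aggregate([
    ("amount", "sum"),
    ("amount", "mean"),
    ("amount", "count_distinct"),
])
print(result)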

Is this a valid use case for Arrow?
And what if time matters and some windowing technique has to be applied?
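For windowing, the only approach I can think of is to derive a
tumbling-window key by truncating each event timestamp and grouping on it.
Again just a sketch with invented field names and example values:

import pyarrow as pa
import pyarrow.compute as pc
from datetime import datetime

table = pa.table({
    "ts":     [datetime(2024, 1, 1, 0, 0, 30),
               datetime(2024, 1, 1, 0, 1, 10)],
    "amount": [10.0, 5.0],
})

# Truncate each timestamp to the start of its 1-minute tumbling window,
# then aggregate per window.
window = pc.floor_temporal(table["ts"], multiple=1, unit="minute")
windowed = table.append_column("window", window)
print(windowed.group_by("window").aggregate([("amount", "sum")]))

Is that how people handle time with Arrow today, or is there built-in
support for windows that I am missing?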

Thank you very much for your time!
Have a good day.
