Hi David,

Thanks for sharing the link!

Here is what a potential use case might look like:

   1. Assume that we have a service S which accepts expressions in some
   language X.
   2. Assume that a typical query to this service requests entities A_1,
   A_2, ..., A_K. Each of those entities generates a stream of record batches.
   Record batches for a single A_i share the same schema, yet there is no
   guarantee that schemas are equal across all streams.
   3. Assume that there is a strong reason to query A_1, ..., A_K together.
   4. The service generates record batches (concurrently), tags them (e.g. with
   schema-level metadata) and sends them over.
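
To make step 4 a bit more concrete, here is a rough sketch in plain Python
(no Arrow involved). Dicts stand in for record batches and schemas, and the
names tag_batches, interleave and demultiplex are made up for illustration.
Round-robin interleaving is only a stand-in for concurrent generation.

```python
from itertools import zip_longest


def tag_batches(entity, schema, batches):
    """Wrap each batch of one entity with its entity tag and schema."""
    for rows in batches:
        yield {"entity": entity, "schema": schema, "rows": rows}


def interleave(*streams):
    """Merge several tagged streams into one (round-robin, for illustration)."""
    done = object()  # marks streams that have run out of batches
    for group in zip_longest(*streams, fillvalue=done):
        for message in group:
            if message is not done:
                yield message


def demultiplex(stream):
    """Client side: regroup batches by the entity tag they carry."""
    by_entity = {}
    for message in stream:
        by_entity.setdefault(message["entity"], []).append(message["rows"])
    return by_entity
```

The point is that the single stream carries batches with different schemas,
and the tag is what lets the client route each batch back to its entity.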

A potential way to address this (with the existing tools) could be to build a
union schema of all fields across all entities (potentially prefixed with
the entity name, much like qualified columns in SQL joins) and to set the
values that do not belong to a given entity to NA. However, this solution
might not work in cases where we are not able to construct the unified schema
before opening the stream (e.g. when the schema of a specific entity changes
in response to real-time input, or when the generator expression is
unpredictable).
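
As a rough illustration of the union-schema workaround, again in plain Python
with dicts standing in for Arrow schemas and rows (union_schema and widen are
hypothetical helper names, and None plays the role of NA):

```python
def union_schema(schemas):
    """Build one schema with entity-prefixed column names.

    schemas maps entity name -> {field name: type name}.
    """
    return {
        f"{entity}.{name}": (entity, name, type_name)
        for entity, fields in schemas.items()
        for name, type_name in fields.items()
    }


def widen(entity, row, unified):
    """Project one entity's row onto the union schema, filling the rest with None."""
    return {
        column: (row.get(name) if owner == entity else None)
        for column, (owner, name, _type_name) in unified.items()
    }
```

Note that union_schema needs every entity's schema up front, which is exactly
the limitation described above.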

Cheers,
Gosh


On Mon., 12 Apr. 2021, 13:45 David Li, <lidav...@apache.org> wrote:

> Hi Gosh,
>
> There was indeed a discussion where schema evolution was proposed as a
> solution for another use case:
>
> https://lists.apache.org/thread.html/re800c63f0eb08022c8cd5e1b2236fd69a2e85afdc34daf6b75e3b7b3%40%3Cdev.arrow.apache.org%3E
>
> I am curious though, what is your use case here?
>
> Best,
> David
>
> On 2021/04/12 10:49:00, Gosh Arzumanyan <gosh...@gmail.com> wrote:
> > Hi guys, hope you are well!
> >
> > Judging from the Flight API
> > <
> https://github.com/apache/arrow/blob/5b08205f7e864ed29f53ed3d836845fed62d5d4a/cpp/src/arrow/flight/types.h#L461
> >
> > and
> > from the documentation/examples out there, it seems like data schema is
> > supposed to be fixed per stream in ArrowFlight(which is also aligned with
> > corresponding IPC stream writers/readers).
> > Wondering if the community has evaluated the necessity/possibility of
> > supporting schema changes within a single stream(I do recall seeing a
> > discussion on this somewhere but can't find it)?
> >
> > Cheers,
> > Gosh
> >
>
