Thanks again, Julian.

Since mapPartitions is really a specialized map, would it be best to model
it as a SELECT (similar to functions inside an expression)?
The exception would be cases where h > h', in which mapPartitions acts like
a filter.
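
To make the row-count cases concrete, here is a toy Python sketch. It uses
plain lists as stand-ins for partitions and is not Spark's actual API; the
names map_partitions, score_rows and keep_positive, and the tiny word list,
are all made up for illustration.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions(
    partitions: List[List[T]],
    fn: Callable[[Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Call fn once per partition (toy stand-in, not Spark's API)."""
    return [list(fn(iter(part))) for part in partitions]


def score_rows(rows: Iterator[dict]) -> Iterator[dict]:
    # Hypothetical setup step, paid once per partition rather than per row
    # (think: loading a model before batch inference).
    positive_words = {"great", "love"}
    for row in rows:
        hits = sum(word in positive_words for word in row["tweet"].split())
        # Output columns differ from input columns (w' != w).
        yield {"id": row["id"], "score": hits}


def keep_positive(rows: Iterator[dict]) -> Iterator[dict]:
    # Emits fewer rows than it reads (h' < h), so it behaves like a filter.
    return (scored for scored in score_rows(rows) if scored["score"] > 0)


partitions = [
    [{"id": 1, "tweet": "love this"}, {"id": 2, "tweet": "meh"}],
    [{"id": 3, "tweet": "great great day"}],
]

mapped = map_partitions(partitions, score_rows)       # h' == h
filtered = map_partitions(partitions, keep_positive)  # h' <  h
```

In this toy model, the first call looks like a projection (SELECT-like),
while the second looks like a projection fused with a WHERE.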

On Thu, Mar 4, 2021 at 11:41 AM Julian Hyde <[email protected]> wrote:

> SQL has equivalents of many functional programming idioms:
>
>   * map is SELECT
>   * filter is WHERE
>   * flatMap is similar to CROSS APPLY
>
> That said, SQL’s strength is that its operations are not optimized for any
> particular physical organization of data (e.g. working on sorted or
> partitioned data). mapPartitions, by contrast, assumes such an organization
> (partitioned data). Of course a physical implementation of one of SQL’s
> logical operators might use mapPartitions.
>
> Julian
>
> > On Mar 4, 2021, at 10:44 AM, Debajyoti Roy <[email protected]> wrote:
> >
> > Thanks for the responses, adding some more color below.
> >
> > Spark's API adopted concepts from the functional programming paradigm
> > (map, filter, flatMap, ...) into data processing. Spark did add several
> > relational operators like join, union, select, etc. However, there are
> > certain APIs that are really hard to model in terms of standard relational
> > operators. Let me take one example: mapPartitions.
> >
> > mapPartitions( T -> U ):
> > w columns and h rows can turn into totally different w' != w columns and
> > h' != h rows. Since processing happens per partition, this API is a great
> > choice for vectorized operations with a heavy initialization cost, e.g.
> > batch inferencing.
> >
> > In terms of relational models, mapPartitions can be modeled just like a
> > function inside an expression operator. However, there are interesting
> > cases, e.g. when h > h', mapPartitions starts to feel like a filter. Are
> > there other challenges and opportunities for the planner and optimizer?
> > mapPartitions is definitely NOT like any other function inside an
> > expression, as shown below:
> >
> > SELECT name, address, mapPartitions(id, tweet, '{threshold: 0.5}',
> > 'sentiment_analysis_4', 10000) FROM my_twitter_data...
> >
> > So what is a better usage example for mapPartitions expressed as SQL? I am
> > really struggling with that part, and I agree with Julian that it is the key.
> >
> > Regards,
> > Debajyoti Roy
> >
> > On Thu, Mar 4, 2021 at 12:01 AM Julian Hyde <[email protected]> wrote:
> >
> >> I searched for mapPartitions and flatMapGroupsWithState, and it looks as
> >> if you are talking about Apache Spark operations. Can you give some
> >> examples of typical queries that would use these operations?
> >>
> >> It’s possible that these operations accomplish things that are not
> >> possible in the relational model; but I think it’s more likely that these
> >> are algorithms that can implement relational operations such as windowed
> >> aggregate functions. If you give some examples, we can see whether they
> >> can be expressed in SQL or relational algebra.
> >>
> >> Julian
> >>
> >>
> >>> On Mar 3, 2021, at 10:54 PM, Rui Wang <[email protected]> wrote:
> >>>
> >>> Well I think the expected approach is to translate other operations to
> >>> relational operators by yourself ;-)
> >>>
> >>> I think Calcite won't be the place to have extensions for such
> >>> translation. The main concern is that those non-relational operations
> >>> are "non-standard".
> >>>
> >>> -Rui
> >>>
> >>> On Wed, Mar 3, 2021 at 10:12 PM Debajyoti Roy <[email protected]> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> For operators like filter, join, union, aggregate, window the
> >>>> logical RelNode choices are obvious within the
> >>>> org.apache.calcite.rel.logical package.
> >>>>
> >>>> However, for more complex applications that require operations like
> >>>> mapPartitions, flatMapGroupsWithState, etc. what would be some choices
> >>>> for logical rel node and relational expression classes in Apache Calcite
> >>>> (independent of engine)?
> >>>>
> >>>> What is a good way to model operators that are not traditionally
> >>>> relational and hence do not exist in Calcite (looking for hooks /
> >>>> extension points / roadmaps)?
> >>>>
> >>>> Thanks in advance for any pointers, (disclaimer: I am new to Calcite)
> >>>> Debajyoti Roy
> >>>>
> >>
> >>
>
>
