Thanks for the clarification!
To avoid the C2R2C transition, I ended up adding my own BatchType and
ConvFunc so that I could convert my BigQuery BatchScan output (backed by
Arrow data structures) to a Velox ColumnarBatch.
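
For anyone finding this thread later, here is a rough toy model of the
remove-then-reinsert behaviour described in the quoted reply below, and of
why registering a direct converter makes the C2R2C pair disappear. It is
plain Python, not Gluten code: `Node`, `remove_transitions`,
`insert_transitions`, the format labels and `HypotheticalArrowToVelox` are
all names made up for this illustration, and the real BatchType/ConvFunc
machinery in Gluten is more involved than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

ROW = "row"  # made-up label for Spark's row-based format


@dataclass
class Node:
    """One operator in a linear (single-child) toy plan."""
    name: str
    out: str                      # format this node produces
    needs: Optional[str] = None   # format it expects from its child
    child: Optional["Node"] = None
    is_transition: bool = False


def remove_transitions(node: Optional[Node]) -> Optional[Node]:
    """Drop every transition node so later rules see a clean plan."""
    if node is None:
        return None
    child = remove_transitions(node.child)
    if node.is_transition:
        return child
    return Node(node.name, node.out, node.needs, child)


def insert_transitions(node: Optional[Node],
                       converters: Dict[Tuple[str, str], str]) -> Optional[Node]:
    """Re-add transitions wherever a child's output format differs from
    what its parent needs. Assumes every format has a converter to and
    from ROW, so a row-based fallback is always possible."""
    if node is None or node.child is None:
        return node
    child = insert_transitions(node.child, converters)
    src, dst = child.out, node.needs
    if src != dst:
        if (src, dst) in converters:
            # A direct columnar-to-columnar converter is registered.
            child = Node(converters[(src, dst)], dst, src, child, True)
        else:
            # No direct converter: fall back to columnar -> row -> columnar.
            to_row = Node(converters[(src, ROW)], ROW, src, child, True)
            child = Node(converters[(ROW, dst)], dst, ROW, to_row, True)
    return Node(node.name, node.out, node.needs, child, node.is_transition)


def names(node: Optional[Node]) -> List[str]:
    """Operator names from the root down to the leaf."""
    result = []
    while node is not None:
        result.append(node.name)
        node = node.child
    return result


# The bottom of the plan from my first mail, reduced to four nodes.
scan = Node("BatchScan", out="spark-columnar")
c2r = Node("ColumnarToRow", ROW, "spark-columnar", scan, True)
r2c = Node("RowToVeloxColumnar", "velox", ROW, c2r, True)
plan = Node("FilterExecTransformer", "velox", "velox", r2c)

converters = {
    ("spark-columnar", ROW): "ColumnarToRow",
    (ROW, "velox"): "RowToVeloxColumnar",
}
# Same table plus one hypothetical direct converter.
direct = dict(converters)
direct[("spark-columnar", "velox")] = "HypotheticalArrowToVelox"

clean = remove_transitions(plan)
print(names(clean))
print(names(insert_transitions(clean, converters)))
print(names(insert_transitions(clean, direct)))
```

With only the row-based converters, the reinsertion step puts
RowToVeloxColumnar and ColumnarToRow back between the filter and the scan.
With the extra direct entry, a single conversion node takes their place,
which is roughly the effect I was after with my own BatchType and ConvFunc.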

On Wed, Jun 4, 2025 at 10:30 PM Hongze Zhang <[email protected]> wrote:

> > Now I know this rule, i.e. *ApplyColumnarRulesAndInsertTransitions*, doesn't
> > matter in the context of my question, since Gluten calls
> > *RemoveTransitions* later anyway.
>
> Yes, Gluten will call RemoveTransitions right away when the columnar
> query optimization starts, then the subsequent columnar rules could
> see a cleaner query plan that is easier to optimize. After the
> columnar rules are executed, all the needed transitions will be added
> again by `InsertTransitions`[1] in one go. The C2R2C transition you
> mentioned is also added by this rule.
>
> Hongze
>
> [1]
> https://github.com/apache/incubator-gluten/blob/eda660b572c78a8aaf5ea0f9d217e5d0ca6340c7/backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxRuleApi.scala#L106
>
> On Tue, Jun 3, 2025 at 3:51 PM Abbas Gadhia <[email protected]>
> wrote:
> >
> > Hi Hongze,
> >
> > > Spark-to-Velox C2C is not supported yet
> >
> > Thanks for clarifying this. Makes sense now :)
> >
> > > I don't clearly get the issue here. Would you give an example?
> >
> > Apologies if I wasn't clear. I was referring to the Spark
> > *ApplyColumnarRulesAndInsertTransitions* rule that adds a *ColumnarToRow*
> > node by looking at the "supportsColumnar" boolean of the Shuffle node.
> > Now I know this rule, i.e. *ApplyColumnarRulesAndInsertTransitions*, doesn't
> > matter in the context of my question, since Gluten calls
> > *RemoveTransitions* later anyway.
> >
> > Thanks much for the clarification again!
> > Regds
> > Abbas
> >
> > On Tue, Jun 3, 2025 at 7:14 PM Hongze Zhang <[email protected]> wrote:
> >
> > > Hi Abbas,
> > >
> > > > This seems a little redundant apparently?
> > >
> > > This is actually a C2R2C transition used to convert from vanilla
> > > Spark's columnar format to Velox's. It's necessary because
> > > Spark-to-Velox C2C is not supported yet.
> > >
> > > > I found out that ColumnarToRow is being added since the Shuffle does not
> > > > output a columnar output, but I also saw that Gluten code removes that rule
> > > > intermittently while adding transitions.
> > >
> > > I don't clearly get the issue here. Would you give an example?
> > >
> > > Best,
> > > Hongze
> > >
> > > On Tue, Jun 3, 2025 at 11:52 AM Abbas Gadhia
> > > <[email protected]> wrote:
> > > >
> > > > Hello,
> > > > I have a plan that looks like this
> > > >
> > > > HashAggregateTransformer(keys=[country_code#0], functions=[sum(latest_trade_data#29L), avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum(latest_trade_data)#95L, avg(latest_industrial_data)#96])
> > > > +- AQEShuffleRead coalesced
> > > >  +- ShuffleQueryStage 0
> > > >   +- Exchange hashpartitioning(country_code#0, 5), ENSURE_REQUIREMENTS, [plan_id=668]
> > > >    +- VeloxColumnarToRow
> > > >     +- ^(1) FlushableHashAggregateTransformer(keys=[country_code#0], functions=[partial_sum(latest_trade_data#29L), partial_avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum#107L, sum#108, count#109L])
> > > >      +- ^(1) ProjectExecTransformer [country_code#0, latest_industrial_data#28L, latest_trade_data#29L]
> > > >       +- ^(1) FilterExecTransformer (trim(short_name#1, None) = Low income)
> > > >        +- ^(1) InputIteratorTransformer[columns...]
> > > >         +- RowToVeloxColumnar
> > > >          +- *(1) ColumnarToRow
> > > >           +- BatchScan country_summary[columns...] Reading table [bigquery-public-data.world_bank_intl_debt.country_summary]
> > > >
> > > > I see 2 plan nodes together
> > > > 1. ColumnarToRow
> > > > 2. RowToVeloxColumnar
> > > >
> > > > This seems a little redundant apparently? Can someone help me understand
> > > > why these transitions are being added? I found out that ColumnarToRow is
> > > > being added since the Shuffle does not output a columnar output, but I
> > > > also saw that Gluten code removes that rule intermittently while adding
> > > > transitions.
> > > >
> > > > Any hints would help.
> > > > Thanks
> > > > Abbas
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
