Re: Unnecessary Row/Columnar conversions?

Abbas Gadhia Tue, 03 Jun 2025 07:51:51 -0700

Hi Hongze,

> Spark-to-Velox C2C is not supported yet

Thanks for clarifying this. Makes sense now :)

> I don't clearly get the issue here. Would you give an example?

Apologies if I wasn't clear. I was referring to Spark's
*ApplyColumnarRulesAndInsertTransitions* rule, which adds a *ColumnarToRow*
node based on the "supportsColumnar" flag of the Shuffle node.
I now understand that *ApplyColumnarRulesAndInsertTransitions* doesn't
matter in the context of my question, since Gluten calls
*RemoveTransitions* later anyway.
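
For illustration only, the behavior described above (a transition node being inserted wherever a columnar node feeds a row-based one) can be modeled with a toy sketch. This is not Spark's or Gluten's actual code: the `Node` class, `insert_transitions`, and `flatten` are hypothetical names, and the sketch assumes each node consumes the same format it produces and that the Exchange is row-based.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    supports_columnar: bool  # True if the node produces columnar batches
    children: List["Node"] = field(default_factory=list)


def insert_transitions(node: Node, parent_wants_columnar: bool) -> Node:
    # Toy assumption: a node consumes the same format it produces,
    # so each child is asked for the parent's own format.
    children = [insert_transitions(c, node.supports_columnar) for c in node.children]
    node = Node(node.name, node.supports_columnar, children)
    if node.supports_columnar and not parent_wants_columnar:
        # Columnar output feeding a row-based consumer.
        return Node("ColumnarToRow", False, [node])
    if not node.supports_columnar and parent_wants_columnar:
        # Row output feeding a columnar consumer.
        return Node("RowToColumnar", True, [node])
    return node


def flatten(node: Node) -> List[str]:
    # Pre-order list of node names, top of the plan first.
    names = [node.name]
    for child in node.children:
        names.extend(flatten(child))
    return names


# Hypothetical three-node plan: a row-based Exchange over columnar operators.
plan = Node("Exchange", False, [
    Node("HashAggregateTransformer", True, [Node("BatchScan", True)]),
])
print(flatten(insert_transitions(plan, parent_wants_columnar=False)))
```

In this toy model a ColumnarToRow node appears directly under the Exchange, because the Exchange is marked as not supporting columnar output while its child is columnar.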

Thanks much for the clarification again!
Regards,
Abbas

On Tue, Jun 3, 2025 at 7:14 PM Hongze Zhang <[email protected]> wrote:

> Hi Abbas,
>
> > This seems a little redundant apparently?
>
> This is actually a C2R2C transition used to convert from vanilla
> Spark's columnar format to Velox's. It's necessary because
> Spark-to-Velox C2C is not supported yet.
>
> > I found out that ColumnarToRow is being added since the Shuffle does not
> > produce columnar output, but I also saw that Gluten code removes that rule
> > intermittently while adding transitions.
>
> I don't clearly get the issue here. Would you give an example?
>
> Best,
> Hongze
>
> On Tue, Jun 3, 2025 at 11:52 AM Abbas Gadhia
> <[email protected]> wrote:
> >
> > Hello,
> > I have a plan that looks like this
> >
> > HashAggregateTransformer(keys=[country_code#0],
> > functions=[sum(latest_trade_data#29L), avg(latest_industrial_data#28L)],
> > isStreamingAgg=false, output=[country_code#0, sum(latest_trade_data)#95L,
> > avg(latest_industrial_data)#96])
> > +- AQEShuffleRead coalesced
> >  +- ShuffleQueryStage 0
> >   +- Exchange hashpartitioning(country_code#0, 5), ENSURE_REQUIREMENTS,
> > [plan_id=668]
> >    +- VeloxColumnarToRow
> >     +- ^(1) FlushableHashAggregateTransformer(keys=[country_code#0],
> > functions=[partial_sum(latest_trade_data#29L),
> > partial_avg(latest_industrial_data#28L)], isStreamingAgg=false,
> > output=[country_code#0, sum#107L, sum#108, count#109L])
> >      +- ^(1) ProjectExecTransformer [country_code#0,
> > latest_industrial_data#28L, latest_trade_data#29L]
> >       +- ^(1) FilterExecTransformer (trim(short_name#1, None) = Low income)
> >        +- ^(1) InputIteratorTransformer[columns...]
> >         +- RowToVeloxColumnar
> >          +- *(1) ColumnarToRow
> >           +- BatchScan country_summary[columns...] Reading table
> > [bigquery-public-data.world_bank_intl_debt.country_summary]
> >
> > I see 2 plan nodes together
> > 1. ColumnarToRow
> > 2. RowToVeloxColumnar
> >
> > This seems a little redundant apparently? Can someone help me understand
> > why these transitions are being added? I found out that ColumnarToRow is
> > being added since the Shuffle does not produce columnar output, but I also
> > saw that Gluten code removes that rule intermittently while adding
> > transitions.
> >
> > Any hints would help.
> > Thanks
> > Abbas
>
