Hi Hongze,

> Spark-to-Velox C2C is not supported yet

Thanks for clarifying this. Makes sense now :)

> I don't clearly get the issue here. Would you give an example?

Apologies if I wasn't clear. I was referring to Spark's *ApplyColumnarRulesAndInsertTransitions* rule, which adds a *ColumnarToRow* node by looking at the "supportsColumnar" boolean of the Shuffle node. Now I know this class, i.e. *ApplyColumnarRulesAndInsertTransitions*, doesn't matter in the context of my question, since Gluten calls *RemoveTransitions* later anyway.

Thanks much for the clarification again!

Regds,
Abbas

On Tue, Jun 3, 2025 at 7:14 PM Hongze Zhang wrote:
> Hi Abbas,
>
> > This seems a little redundant apparently?
>
> This is actually a C2R2C transition used to convert from vanilla
> Spark's columnar format to Velox's. It's necessary because
> Spark-to-Velox C2C is not supported yet.
>
> > I found out that ColumnarToRow is being added since the Shuffle does not
> > output a columnar output, but I also saw that Gluten code removes that rule
> > intermittently while adding transitions.
>
> I don't clearly get the issue here. Would you give an example?
>
> Best,
> Hongze
>
> On Tue, Jun 3, 2025 at 11:52 AM Abbas Gadhia wrote:
> >
> > Hello,
> > I have a plan that looks like this:
> >
> > HashAggregateTransformer(keys=[country_code#0], functions=[sum(latest_trade_data#29L), avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum(latest_trade_data)#95L, avg(latest_industrial_data)#96])
> > +- AQEShuffleRead coalesced
> >    +- ShuffleQueryStage 0
> >       +- Exchange hashpartitioning(country_code#0, 5), ENSURE_REQUIREMENTS, [plan_id=668]
> >          +- VeloxColumnarToRow
> >             +- ^(1) FlushableHashAggregateTransformer(keys=[country_code#0], functions=[partial_sum(latest_trade_data#29L), partial_avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum#107L, sum#108, count#109L])
> >                +- ^(1) ProjectExecTransformer [country_code#0, latest_industrial_data#28L, latest_trade_data#29L]
> >                   +- ^(1) FilterExecTransformer (trim(short_name#1, None) = Low income)
> >                      +- ^(1) InputIteratorTransformer[columns...]
> >                         +- RowToVeloxColumnar
> >                            +- *(1) ColumnarToRow
> >                               +- BatchScan country_summary[columns...] Reading table [bigquery-public-data.world_bank_intl_debt.country_summary]
> >
> > I see 2 plan nodes together:
> > 1. ColumnarToRow
> > 2. RowToVeloxColumnar
> >
> > This seems a little redundant apparently? Can someone help me understand why
> > these transitions are being added? I found out that ColumnarToRow is being
> > added since the Shuffle does not output a columnar output, but I also saw that
> > Gluten code removes that rule intermittently while adding transitions.
> >
> > Any hints would help.
> > Thanks
> > Abbas
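For readers following this thread, here is a toy model of the C2R2C pattern Hongze explains. It is a minimal Python sketch, not Spark or Gluten code: the format labels, the `Node` class and the `transitions` / `insert_transitions` helpers are all hypothetical. Each node declares the format it produces and the format it needs from its child (loosely analogous to checking `supportsColumnar`), and a bottom-up pass wraps children in converters. Because the model, like the thread, assumes there is no direct vanilla-columnar-to-Velox converter, the scan output is routed through rows.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy data formats (hypothetical labels, not Spark or Gluten identifiers).
ROW = "row"
VANILLA_COLUMNAR = "vanilla_columnar"  # e.g. Spark ColumnarBatch from a DSv2 scan
VELOX_COLUMNAR = "velox_columnar"


@dataclass
class Node:
    name: str
    output: str                           # format this node produces
    required_input: Optional[str] = None  # format it needs from its child
    child: Optional["Node"] = None


def transitions(src: str, dst: str) -> List[str]:
    """Return the converter nodes (bottom-up order) that turn `src` into `dst`."""
    if src == dst:
        return []
    if src == VANILLA_COLUMNAR and dst == VELOX_COLUMNAR:
        # Assumption mirroring the thread: no direct Spark-to-Velox C2C,
        # so data goes columnar -> row -> Velox columnar (C2R2C).
        return ["ColumnarToRow", "RowToVeloxColumnar"]
    if src == VANILLA_COLUMNAR and dst == ROW:
        return ["ColumnarToRow"]
    if src == VELOX_COLUMNAR and dst == ROW:
        return ["VeloxColumnarToRow"]
    if src == ROW and dst == VELOX_COLUMNAR:
        return ["RowToVeloxColumnar"]
    raise ValueError(f"no transition from {src} to {dst}")


def insert_transitions(node: Node) -> Node:
    """Rebuild the plan bottom-up, wrapping each child in the converters it needs."""
    if node.child is None:
        return node
    child = insert_transitions(node.child)
    for name in transitions(child.output, node.required_input):
        produced = ROW if name.endswith("ToRow") else VELOX_COLUMNAR
        child = Node(name, produced, child.output, child)
    return Node(node.name, node.output, node.required_input, child)


def names(node: Optional[Node]) -> List[str]:
    """List node names from the root down, like reading a plan tree top to bottom."""
    result = []
    while node is not None:
        result.append(node.name)
        node = node.child
    return result


# Simplified lower part of the plan above; the Exchange is modeled as row-based.
plan = Node("Exchange", ROW, ROW,
            Node("FilterExecTransformer", VELOX_COLUMNAR, VELOX_COLUMNAR,
                 Node("BatchScan", VANILLA_COLUMNAR)))

print(" <- ".join(names(insert_transitions(plan))))
# prints: Exchange <- VeloxColumnarToRow <- FilterExecTransformer <- RowToVeloxColumnar <- ColumnarToRow <- BatchScan
```

The model deliberately omits the Project, partial aggregate and `InputIteratorTransformer` stage boundary seen in the real plan; it only shows why a ColumnarToRow/RowToVeloxColumnar pair appears between a vanilla columnar scan and a Velox operator.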
