Thanks for the clarification! To avoid the C2R2C transition, I ended up adding my own BatchType and ConvFunc so that I could convert my BigQuery BatchScan (backed by Arrow data structures) to a Velox ColumnarBatch.
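The effect of this workaround can be illustrated with a toy model. This is an assumption for illustration only, not how Gluten's BatchType/ConvFunc machinery is actually implemented. Each batch type is treated as a node in a graph, each registered conversion as an edge, and the planner searches for the shortest chain of conversions. When only row-based conversions exist, the only path from vanilla columnar batches to Velox batches goes through rows (C2R2C). Registering a direct conversion shortens it to a single step. The batch-type names, `find_transitions`, and the `ArrowToVeloxColumnar` conversion are all hypothetical.

```python
from collections import deque

# Toy model only: the batch-type names and the conversion registry are
# hypothetical stand-ins, not Gluten's real BatchType/ConvFunc API.
ROW = "Row"
VANILLA_COLUMNAR = "VanillaColumnar"  # e.g. Arrow-backed batches from a BatchScan
VELOX_COLUMNAR = "VeloxColumnar"


def default_conversions():
    """Conversions available out of the box in this model: row <-> columnar only."""
    return {
        (VANILLA_COLUMNAR, ROW): "ColumnarToRow",
        (ROW, VELOX_COLUMNAR): "RowToVeloxColumnar",
        (VELOX_COLUMNAR, ROW): "VeloxColumnarToRow",
    }


def find_transitions(conversions, source, target):
    """Breadth-first search for the shortest chain of conversions from source to target."""
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        for (src, dst), name in conversions.items():
            if src != current or dst in seen:
                continue
            if dst == target:
                return path + [name]
            seen.add(dst)
            queue.append((dst, path + [name]))
    raise ValueError(f"no conversion path from {source} to {target}")


conversions = default_conversions()
print(find_transitions(conversions, VANILLA_COLUMNAR, VELOX_COLUMNAR))
# ['ColumnarToRow', 'RowToVeloxColumnar']  (the C2R2C pair)

# Registering a direct conversion, loosely analogous to adding a custom
# batch type plus conversion function. The name is hypothetical.
conversions[(VANILLA_COLUMNAR, VELOX_COLUMNAR)] = "ArrowToVeloxColumnar"
print(find_transitions(conversions, VANILLA_COLUMNAR, VELOX_COLUMNAR))
# ['ArrowToVeloxColumnar']
```

Breadth-first search is used here only because it naturally returns the shortest chain. A real planner might weigh conversions by cost instead.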
On Wed, Jun 4, 2025 at 10:30 PM Hongze Zhang <[email protected]> wrote:
>
> > Now I know this class, i.e. *ApplyColumnarRulesAndInsertTransitions*,
> > doesn't matter in the context of my question, since Gluten calls
> > *RemoveTransitions* later anyway.
>
> Yes, Gluten calls RemoveTransitions right away when the columnar
> query optimization starts, so the subsequent columnar rules see a
> cleaner query plan that is easier to optimize. After the columnar
> rules are executed, all the needed transitions are added again by
> `InsertTransitions`[1] in one go; the C2R2C transition you mentioned
> is also added by this rule.
>
> Hongze
>
> [1]
> https://github.com/apache/incubator-gluten/blob/eda660b572c78a8aaf5ea0f9d217e5d0ca6340c7/backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxRuleApi.scala#L106
>
> On Tue, Jun 3, 2025 at 3:51 PM Abbas Gadhia <[email protected]> wrote:
> >
> > Hi Hongze,
> >
> > > Spark-to-Velox C2C is not supported yet
> >
> > Thanks for clarifying this. Makes sense now :)
> >
> > > I don't clearly get the issue here. Would you give an example?
> >
> > Apologies if I wasn't clear. I was referring to the Spark
> > *ApplyColumnarRulesAndInsertTransitions* rule, which adds a *ColumnarToRow*
> > node by looking at the "supportsColumnar" boolean of the Shuffle node.
> > Now I know this class, i.e. *ApplyColumnarRulesAndInsertTransitions*,
> > doesn't matter in the context of my question, since Gluten calls
> > *RemoveTransitions* later anyway.
> >
> > Thanks much for the clarification again!
> > Regds
> > Abbas
> >
> > On Tue, Jun 3, 2025 at 7:14 PM Hongze Zhang <[email protected]> wrote:
> >
> > > Hi Abbas,
> > >
> > > > Isn't this a little redundant?
> > >
> > > This is actually a C2R2C transition used to convert from vanilla
> > > Spark's columnar format to Velox's. It's necessary because
> > > Spark-to-Velox C2C is not supported yet.
> > >
> > > > I found out that ColumnarToRow is being added because the Shuffle does
> > > > not produce columnar output, but I also saw that the Gluten code removes
> > > > it at an intermediate step while adding transitions.
> > >
> > > I don't clearly get the issue here. Would you give an example?
> > >
> > > Best,
> > > Hongze
> > >
> > > On Tue, Jun 3, 2025 at 11:52 AM Abbas Gadhia <[email protected]> wrote:
> > > >
> > > > Hello,
> > > > I have a plan that looks like this:
> > > >
> > > > HashAggregateTransformer(keys=[country_code#0], functions=[sum(latest_trade_data#29L), avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum(latest_trade_data)#95L, avg(latest_industrial_data)#96])
> > > > +- AQEShuffleRead coalesced
> > > >    +- ShuffleQueryStage 0
> > > >       +- Exchange hashpartitioning(country_code#0, 5), ENSURE_REQUIREMENTS, [plan_id=668]
> > > >          +- VeloxColumnarToRow
> > > >             +- ^(1) FlushableHashAggregateTransformer(keys=[country_code#0], functions=[partial_sum(latest_trade_data#29L), partial_avg(latest_industrial_data#28L)], isStreamingAgg=false, output=[country_code#0, sum#107L, sum#108, count#109L])
> > > >                +- ^(1) ProjectExecTransformer [country_code#0, latest_industrial_data#28L, latest_trade_data#29L]
> > > >                   +- ^(1) FilterExecTransformer (trim(short_name#1, None) = Low income)
> > > >                      +- ^(1) InputIteratorTransformer[columns...]
> > > >                         +- RowToVeloxColumnar
> > > >                            +- *(1) ColumnarToRow
> > > >                               +- BatchScan country_summary[columns...] Reading table [bigquery-public-data.world_bank_intl_debt.country_summary]
> > > >
> > > > I see two plan nodes next to each other:
> > > > 1. ColumnarToRow
> > > > 2. RowToVeloxColumnar
> > > >
> > > > Isn't this a little redundant? Can someone help me understand why
> > > > these transitions are being added?
> > > > I found out that ColumnarToRow is being added because the Shuffle does
> > > > not produce columnar output, but I also saw that the Gluten code removes
> > > > it at an intermediate step while adding transitions.
> > > >
> > > > Any hints would help.
> > > > Thanks
> > > > Abbas
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
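Hongze's description of the rule pipeline is simple to state: strip every transition first, run the columnar rules on the cleaner plan, then re-insert all needed transitions in one pass. It can be sketched with a toy model that assumes a linear, leaf-to-root chain of operators. Each operator is tagged with the batch type it produces and the batch type it expects from its child. `Node`, `remove_transitions`, `insert_transitions`, and the batch-type strings are hypothetical names, not Gluten's RemoveTransitions/InsertTransitions implementation.

```python
from dataclasses import dataclass

# Toy model only: these classes and functions are hypothetical and only
# mimic the "remove all transitions, then re-insert them" idea.


@dataclass(frozen=True)
class Node:
    name: str
    output: str    # batch type this operator produces
    requires: str  # batch type it expects from its child ("" for a leaf)


# Conversions available in this model. There is deliberately no direct
# "VanillaColumnar" -> "VeloxColumnar" (C2C) entry.
CONVERSIONS = {
    ("VanillaColumnar", "Row"): Node("ColumnarToRow", "Row", "VanillaColumnar"),
    ("Row", "VeloxColumnar"): Node("RowToVeloxColumnar", "VeloxColumnar", "Row"),
    ("VeloxColumnar", "Row"): Node("VeloxColumnarToRow", "Row", "VeloxColumnar"),
}
TRANSITION_NAMES = {node.name for node in CONVERSIONS.values()}


def remove_transitions(chain):
    """Drop every transition node so later rules only see real operators."""
    return [node for node in chain if node.name not in TRANSITION_NAMES]


def insert_transitions(chain):
    """Walk a leaf-to-root chain and add conversions where batch types differ."""
    result = [chain[0]]
    for node in chain[1:]:
        produced = result[-1].output
        if produced != node.requires:
            direct = CONVERSIONS.get((produced, node.requires))
            if direct is not None:
                result.append(direct)
            else:
                # No direct conversion: go through rows (C2R, then R2C).
                result.append(CONVERSIONS[(produced, "Row")])
                result.append(CONVERSIONS[("Row", node.requires)])
        result.append(node)
    return result


scan = Node("BatchScan", "VanillaColumnar", "")
spark_c2r = Node("ColumnarToRow", "Row", "VanillaColumnar")  # added earlier by vanilla Spark
velox_filter = Node("FilterExecTransformer", "VeloxColumnar", "VeloxColumnar")
exchange = Node("Exchange", "Row", "Row")  # vanilla (row-based) shuffle

with_spark_c2r = [scan, spark_c2r, velox_filter, exchange]
without_spark_c2r = [scan, velox_filter, exchange]

final = insert_transitions(remove_transitions(with_spark_c2r))
print([node.name for node in final])
# ['BatchScan', 'ColumnarToRow', 'RowToVeloxColumnar', 'FilterExecTransformer', 'VeloxColumnarToRow', 'Exchange']

# The final plan does not depend on whether Spark had already added a ColumnarToRow.
assert final == insert_transitions(remove_transitions(without_spark_c2r))
```

In this model the transitions are derived only from the batch types of the remaining operators. A ColumnarToRow added earlier by Spark's ApplyColumnarRulesAndInsertTransitions therefore has no effect on the result. Read bottom-up, the resulting chain has the same shape as the plan in the original question, with a few operators left out.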
