[GitHub] [arrow] westonpace commented on pull request #34711: GH-15280: [C++][Python][GLib] add libarrow_acero containing everything previously in compute/exec

via GitHub Fri, 31 Mar 2023 11:44:30 -0700


westonpace commented on PR #34711:
URL: https://github.com/apache/arrow/pull/34711#issuecomment-1492437414


   > What is the relationship between arrow_substrait and arrow_acero? Part of 
me wonders why substrait makes sense as a separate library rather than optional 
feature of acero? For example, if Gandiva got a substrait consumer (taking 
expressions), would that go in arrow_substrait or gandiva?
   
   I think @icexelloss 's answer is good (it causes a circular dependency where 
today we have substrait->dataset, dataset->acero, substrait->acero and merged 
we would have acero->dataset and dataset->acero) and I agree we definitely want 
to leave things here for a separate PR.  However, I will also make (at least 
some) argument for leaving things separate in general.
   
   > if Gandiva got a substrait consumer (taking expressions), would that go in 
arrow_substrait or gandiva?
   
   I think it could be possible that gandiva reuses some of what we have done 
here.  However, probably not the most likely route.  Moving common code into 
substrait-cpp would be more likely I think.
   
   The original motivation was actually because there were two different IR 
proposals at the time.  So it wasn't to have one IR and two engines but rather 
to have one engine and two IRs.  Now that Substrait is further along I think 
this is probably unlikely (e.g. if someone wanted to add SQL->Acero then I 
would probably suggest SQL->Substrait->Acero).
   
   So these are not great motivations for splitting something into two modules 
but they could perhaps be motivation to keep them separate.  Finally, from a 
code perspective, it is cleaner to have these hard layers and prevents us from 
intermingling in the future.
   
   I am very much in favor of ENGINE->SUBSTRAIT though (again, in a follow-up 
PR).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] westonpace commented on pull request #34711: GH-15280: [C++][Python][GLib] add libarrow_acero containing everything previously in compute/exec

Reply via email to