westonpace commented on PR #34711: URL: https://github.com/apache/arrow/pull/34711#issuecomment-1492437414
> What is the relationship between arrow_substrait and arrow_acero? Part of me wonders why substrait makes sense as a separate library rather than optional feature of acero? For example, if Gandiva got a substrait consumer (taking expressions), would that go in arrow_substrait or gandiva? I think @icexelloss 's answer is good (it causes a circular dependency where today we have substrait->dataset, dataset->acero, substrait->acero and merged we would have acero->dataset and dataset->acero) and I agree we definitely want to leave things here for a separate PR. However, I will also make (at least some) argument for leaving things separate in general. > if Gandiva got a substrait consumer (taking expressions), would that go in arrow_substrait or gandiva? I think it could be possible that gandiva reuses some of what we have done here. However, probably not the most likely route. Moving common code into substrait-cpp would be more likely I think. The original motivation was actually because there were two different IR proposals at the time. So it wasn't to have one IR and two engines but rather to have one engine and two IRs. Now that Substrait is further along I think this is probably unlikely (e.g. if someone wanted to add SQL->Acero then I would probably suggest SQL->Substrait->Acero). So these are not great motivations for splitting something into two modules but they could perhaps be motivation to keep them separate. Finally, from a code perspective, it is cleaner to have these hard layers and prevents us from intermingling in the future. I am very much in favor of ENGINE->SUBSTRAIT though (again, in a follow-up PR). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
