My advice: block the transformation to a particular convention. Then, if you get a "cannot plan" error, examine the plan to determine whether it contains specific problematic patterns. If it does, make a best guess at the particular reason and return that to the user. This covers additional situations that wouldn't work with syntax scraping, such as when a user writes this query:
select * from a,b where a.id = b.id

In this case, with the correct rules, this will get planned. However, a SQL scrape would potentially have flagged it as an invalid cartesian join.

On Wed, Jun 20, 2018 at 1:10 PM, Kenneth Knowles <[email protected]> wrote:

> Hi all,
>
> Bumping this again because I'd like to be quite sure the answer is "Calcite
> doesn't support this". For example, I'd like to reject full cartesian
> joins. Currently, all joins can be converted to Beam convention and then
> there's some logic later to complain about cross joins. I would prefer to
> do this in the rule set, making a cross join just not convertible to Beam
> convention, to incentivize finding other plans, but still give a user a
> good error message.
>
> What do people actually do in this situation? Possibilities: (a) scrape the
> syntax before planning, missing opportunities where a transformation might
> end up with a viable plan (b) make an "ErrorRel" with impossibly high cost
> so it will only be chosen as the last resort, somewhat like yacc error
> productions, could be hard to get a decent error message. I don't like
> these options, particularly.
>
> Kenn
>
> On Wed, May 30, 2018 at 6:10 AM Michael Mior <[email protected]> wrote:
>
> > Unfortunately, I'm not sure of the best way how to proceed from here, but
> > it seems like you're making progress :)
> > --
> > Michael Mior
> > [email protected]
> >
> > On Tue, May 29, 2018 at 18:29, Kenneth Knowles <[email protected]> wrote:
> >
> > > Thanks Michael,
> > >
> > > I don't think that applies in our case - we aren't doing a table scan
> > > and having Calcite implement the rest, but are translating the whole
> > > plan to a Beam pipeline to run on e.g. Flink, Spark, Dataflow.
> > >
> > > Here's an example:
> > >
> > > SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
> > >
> > > With logical plan:
> > >
> > > LogicalProject(EXPR$0=[$0])
> > >   Uncollect
> > >     LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
> > >       LogicalValues(tuples=[[{ 0 }]])
> > >
> > > And the planner dumps "could not be implemented" when going for Beam's
> > > calling convention. So I implement a rel & a rule.
> > >
> > > Then there's the correlated version exploding an array field from a
> > > table:
> > >
> > > SELECT f_int, arrElems.f_string FROM main CROSS JOIN UNNEST
> > > (main.f_stringArr) AS arrElems(f_string)
> > >
> > > With logical plan:
> > >
> > > LogicalProject(f_int=[$0], f_string=[$2])
> > >   LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> > >     BeamIOSourceRel(table=[[beam, main]])
> > >     Uncollect
> > >       LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
> > >         LogicalValues(tuples=[[{ 0 }]])
> > >
> > > I hacked something together to support this, too. I did not fully
> > > implement Correlate; I would love to reject unsupported things in a
> > > meaningful way. I would like to have confidence that there are not
> > > other permutations of logical plans that we missed. For example for
> > > joins we match all joins and translate them, then throw an error at a
> > > later stage.
> > >
> > > Incidentally, when I ran the decorrelation [1] it appeared to have no
> > > effect. We probably want to implement it directly in Beam anyhow in
> > > this case.
> > >
> > > Kenn
> > >
> > > [1]
> > > https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
> > >
> > > On Tue, May 22, 2018 at 6:39 PM Michael Mior <[email protected]> wrote:
> > >
> > > > For most queries, the only thing you should need to implement is a
> > > > scan and the rest can usually be implemented by Calcite.
> > > > It would be good if you have a specific example of a query that
> > > > fails.
> > > >
> > > > --
> > > > Michael Mior
> > > > [email protected]
> > > >
> > > > On Tue, May 22, 2018 at 19:01, Kenneth Knowles <[email protected]> wrote:
> > > >
> > > > > Bumping this, as it ended up in spam for some people.
> > > > >
> > > > > Kenn
> > > > >
> > > > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently
> > > > > > it is pretty easy to write a SQL query that parses and causes a
> > > > > > "could not plan" dump when we ask the planner to convert things to
> > > > > > the Beam calling convention. One current example is using UNNEST
> > > > > > on a column to yield a LogicalCorrelate + Uncollect.
> > > > > >
> > > > > > There may obviously always be bits we don't support, but we'd like
> > > > > > to ensure that the user encounters a careful error message rather
> > > > > > than a planner dump. Is there a best practice for ensuring that we
> > > > > > have covered all the cases? Is it just "everything named Logical*"
> > > > > > or is there something more clever?
> > > > > >
> > > > > > And if this question demonstrates that we are using Calcite
> > > > > > entirely wrong, let us know :-)
> > > > > >
> > > > > > Kenn
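
P.S. To make the advice at the top of this message a bit more concrete, here is a minimal sketch of the "examine the plan and make a best guess" step. It is only an illustration under stated assumptions: PlanNode and PlanDiagnostics are hypothetical toy types, not Calcite's RelNode or any Beam class, and the two patterns checked (a join with no real condition, and a correlate) are just the examples from this thread. In a real setup the walk would run over whatever plan is available after catching the planner's "cannot plan" failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy plan node; NOT Calcite's RelNode. Illustration only.
final class PlanNode {
    final String kind;       // e.g. "Join", "Correlate", "Uncollect", "Scan"
    final String condition;  // join condition, or null when not applicable
    final List<PlanNode> inputs;

    PlanNode(String kind, String condition, PlanNode... inputs) {
        this.kind = kind;
        this.condition = condition;
        this.inputs = Arrays.asList(inputs);
    }
}

// Hypothetical helper: walks a plan that could not be converted and
// collects best-guess, user-facing reasons for the failure.
final class PlanDiagnostics {
    static List<String> diagnose(PlanNode root) {
        List<String> reasons = new ArrayList<>();
        collect(root, reasons);
        return reasons;
    }

    private static void collect(PlanNode node, List<String> reasons) {
        boolean noCondition = node.condition == null || node.condition.equals("true");
        if (node.kind.equals("Join") && noCondition) {
            // A join that still has no condition after planning is a cross join.
            reasons.add("Cross joins (joins without a condition) are not supported.");
        } else if (node.kind.equals("Correlate")) {
            reasons.add("Correlated UNNEST / correlated subqueries are not supported.");
        }
        // Recurse so that patterns nested deeper in the plan are also reported.
        for (PlanNode input : node.inputs) {
            collect(input, reasons);
        }
    }
}
```

The point of checking the plan rather than the SQL text is that "select * from a,b where a.id = b.id" reaches this step, if it reaches it at all, as a join that already carries the condition a.id = b.id, so it is not reported; only a join that still has no condition after the rules have run is.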
