Hi Jacques,

Thanks for the tip. Your example is exactly what I am dealing with right now. Blocking the conversion works and lets more queries run, but for the queries that still don't work, it changes the failure mode from "Beam doesn't support cartesian join" to "CannotPlanException... <not user friendly dump of planner state>".
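A minimal sketch of the plan-examination step Jacques suggests (quoted below), assuming the planner failure surfaces as Calcite's `RelOptPlanner.CannotPlanException`: in a `catch` block around the conversion to the Beam convention, walk the logical plan, look for known-problematic patterns, and report a best-guess reason. To keep it self-contained, `PlanNode` is a hypothetical stand-in for Calcite's `RelNode`, the join condition is modeled as plain text (`"true"` for a cross join), and the messages are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Sketch of "examine the plan after a cannot-plan failure": walk the logical plan,
 * look for known-unsupported patterns, and turn the first match into a user-facing
 * message. PlanNode is a hypothetical stand-in for Calcite's RelNode, not a real API.
 */
public class CannotPlanDiagnosis {

    /** Hypothetical, simplified logical plan node. */
    static final class PlanNode {
        final String kind;          // e.g. "LogicalJoin", "LogicalCorrelate"
        final String condition;     // join condition as text; "true" means a cross join; null if not a join
        final List<PlanNode> inputs;

        PlanNode(String kind, String condition, PlanNode... inputs) {
            this.kind = kind;
            this.condition = condition;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Depth-first search for a known-problematic pattern; returns a best-guess reason. */
    static Optional<String> diagnose(PlanNode node) {
        if (node.kind.equals("LogicalJoin") && "true".equals(node.condition)) {
            return Optional.of("Cartesian (cross) joins are not supported; add a join condition.");
        }
        if (node.kind.equals("LogicalCorrelate")) {
            return Optional.of("This form of correlated join (LogicalCorrelate) is not supported.");
        }
        for (PlanNode input : node.inputs) {
            Optional<String> reason = diagnose(input);
            if (reason.isPresent()) {
                return reason;
            }
        }
        return Optional.empty();
    }

    /** Meant to be called only after the planner has already failed. */
    static String explainPlanningFailure(PlanNode logicalPlan) {
        return diagnose(logicalPlan)
                .orElse("Query could not be planned for the Beam calling convention.");
    }

    public static void main(String[] args) {
        // Shape of SELECT * FROM a, b with no join condition.
        PlanNode crossJoin = new PlanNode("LogicalProject", null,
                new PlanNode("LogicalJoin", "true",
                        new PlanNode("TableScan", null),
                        new PlanNode("TableScan", null)));
        System.out.println(explainPlanningFailure(crossJoin));
    }
}
```

Because the diagnosis runs only after the planner has already failed, a query like `select * from a,b where a.id = b.id`, which the right rules can plan as an equi-join, is never rejected by it; the walk only picks the message for queries that were going to fail anyway. The guess can still be wrong, for example when a plan fails for an unrelated reason but also contains a cross join under a filter.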
I will try your suggestion of examining the plan in such a situation. Do you have any example code that does this?

Kenn

On Mon, Jun 25, 2018 at 4:45 PM Jacques Nadeau <[email protected]> wrote:

> My advice: block the transformation to a particular convention. Then, if you get a cannot-plan error, examine the plan to determine whether there are specific problematic patterns. If there are, make a best guess at the particular reason and return it to the user. This also covers additional situations that wouldn't work with syntax scraping, such as when a user writes this query:
>
> select * from a,b
> where a.id = b.id
>
> In this case, with the correct rules, this will get planned. However, a SQL scrape might have flagged it as an invalid cartesian join.
>
> On Wed, Jun 20, 2018 at 1:10 PM, Kenneth Knowles <[email protected]> wrote:
>
> > Hi all,
> >
> > Bumping this again because I'd like to be quite sure the answer is "Calcite doesn't support this". For example, I'd like to reject full cartesian joins. Currently, all joins can be converted to Beam convention and then there's some logic later to complain about cross joins. I would prefer to do this in the rule set, making a cross join just not convertible to Beam convention, to incentivize finding other plans, but still give the user a good error message.
> >
> > What do people actually do in this situation? Possibilities: (a) scrape the syntax before planning, missing opportunities where a transformation might end up with a viable plan; (b) make an "ErrorRel" with impossibly high cost so it will only be chosen as the last resort, somewhat like yacc error productions, though it could be hard to get a decent error message out of it. I don't particularly like either of these options.
> >
> > Kenn
> >
> > On Wed, May 30, 2018 at 6:10 AM Michael Mior <[email protected]> wrote:
> >
> > > Unfortunately, I'm not sure of the best way to proceed from here, but it seems like you're making progress :)
> > > --
> > > Michael Mior
> > > [email protected]
> > >
> > > On Tue, May 29, 2018 at 18:29, Kenneth Knowles <[email protected]> wrote:
> > >
> > > > Thanks Michael,
> > > >
> > > > I don't think that applies in our case: we aren't doing a table scan and having Calcite implement the rest, but are translating the whole plan to a Beam pipeline to run on e.g. Flink, Spark, or Dataflow.
> > > >
> > > > Here's an example:
> > > >
> > > > SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
> > > >
> > > > With logical plan:
> > > >
> > > > LogicalProject(EXPR$0=[$0])
> > > >   Uncollect
> > > >     LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
> > > >       LogicalValues(tuples=[[{ 0 }]])
> > > >
> > > > And the planner dumps "could not be implemented" when going for Beam's calling convention. So I implement a rel and a rule.
> > > >
> > > > Then there's the correlated version, exploding an array field from a table:
> > > >
> > > > SELECT f_int, arrElems.f_string FROM main CROSS JOIN UNNEST (main.f_stringArr) AS arrElems(f_string)
> > > >
> > > > With logical plan:
> > > >
> > > > LogicalProject(f_int=[$0], f_string=[$2])
> > > >   LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> > > >     BeamIOSourceRel(table=[[beam, main]])
> > > >     Uncollect
> > > >       LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
> > > >         LogicalValues(tuples=[[{ 0 }]])
> > > >
> > > > I hacked something together to support this, too. I did not fully implement Correlate; I would love to reject unsupported things in a meaningful way. I would like to have confidence that there are no other permutations of logical plans that we missed. For example, for joins we match all joins and translate them, then throw an error at a later stage.
> > > >
> > > > Incidentally, when I ran the decorrelation [1] it appeared to have no effect. We probably want to implement it directly in Beam anyhow in this case.
> > > >
> > > > Kenn
> > > >
> > > > [1] https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
> > > >
> > > > On Tue, May 22, 2018 at 6:39 PM Michael Mior <[email protected]> wrote:
> > > >
> > > > > For most queries, the only thing you should need to implement is a scan, and the rest can usually be implemented by Calcite. It would help if you have a specific example of a query that fails.
> > > > >
> > > > > --
> > > > > Michael Mior
> > > > > [email protected]
> > > > >
> > > > > On Tue, May 22, 2018 at 19:01, Kenneth Knowles <[email protected]> wrote:
> > > > >
> > > > > > Bumping this, as it ended up in spam for some people.
> > > > > >
> > > > > > Kenn
> > > > > >
> > > > > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <[email protected]> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently it is pretty easy to write a SQL query that parses but causes a "could not plan" dump when we ask the planner to convert things to the Beam calling convention. One current example is using UNNEST on a column, which yields a LogicalCorrelate + Uncollect.
> > > > > > >
> > > > > > > There will obviously always be bits we don't support, but we'd like to ensure that the user encounters a carefully written error message rather than a planner dump. Is there a best practice for ensuring that we have covered all the cases? Is it just "everything named Logical*", or is there something more clever?
> > > > > > >
> > > > > > > And if this question demonstrates that we are using Calcite entirely wrong, let us know :-)
> > > > > > >
> > > > > > > Kenn
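On the question of covering every case: one option, sketched here under stated assumptions, is an explicit allowlist check on the logical plan before asking the planner for the Beam convention, reporting any node kind that has no conversion rule. `PlanNode` and the contents of `SUPPORTED` are hypothetical placeholders, not Calcite or Beam APIs; against real Calcite the walk would recurse over `RelNode.getInputs()`. Because it runs before planning, it shares the drawback of option (a) in the thread: it can reject a plan that rules such as decorrelation might have rewritten into a supported shape.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of a pre-planning coverage check: every node kind in the logical plan must be
 * on an explicit allowlist of kinds that have a conversion rule to the Beam convention.
 * PlanNode and the allowlist contents are hypothetical, for illustration only.
 */
public class SupportedNodeCheck {

    /** Hypothetical, simplified logical plan node. */
    static final class PlanNode {
        final String kind;
        final List<PlanNode> inputs;

        PlanNode(String kind, PlanNode... inputs) {
            this.kind = kind;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Hypothetical allowlist; anything not listed is treated as unsupported (fail closed). */
    static final Set<String> SUPPORTED = new LinkedHashSet<>(Arrays.asList(
            "LogicalProject", "LogicalFilter", "LogicalJoin", "LogicalAggregate",
            "LogicalValues", "Uncollect", "BeamIOSourceRel"));

    /** Returns the distinct unsupported node kinds, in depth-first (pre-order) order. */
    static Set<String> unsupportedKinds(PlanNode root) {
        Set<String> found = new LinkedHashSet<>();
        collect(root, found);
        return found;
    }

    private static void collect(PlanNode node, Set<String> found) {
        if (!SUPPORTED.contains(node.kind)) {
            found.add(node.kind);
        }
        for (PlanNode input : node.inputs) {
            collect(input, found);
        }
    }

    public static void main(String[] args) {
        // Shape of the correlated UNNEST plan quoted in the thread.
        PlanNode plan = new PlanNode("LogicalProject",
                new PlanNode("LogicalCorrelate",
                        new PlanNode("BeamIOSourceRel"),
                        new PlanNode("Uncollect",
                                new PlanNode("LogicalProject",
                                        new PlanNode("LogicalValues")))));
        Set<String> missing = unsupportedKinds(plan);
        if (missing.isEmpty()) {
            System.out.println("All node kinds are convertible.");
        } else {
            System.out.println("Unsupported plan node(s): " + missing);
        }
    }
}
```

Checking against an allowlist rather than a list of known-bad kinds fails closed: a node kind nobody anticipated is reported by name instead of surfacing as a planner dump. It only looks at node kinds, so restrictions that depend on a node's attributes, such as rejecting joins whose condition is always true, would still need a per-node check.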
