Hi, thank you so much for the tip! It is quite complicated to implement. The following join expression
SELECT * FROM emps JOIN depts ON emps.deptno = CASE WHEN ABS(depts.deptno) < 2 OR ABS(depts.deptno) >= 10 THEN ABS(depts.deptno) ELSE 1 END gets built into nested SqlBasicCall objects similar to this: SqlBasicCall ├── operator: "=" └── operandList: ├── SqlIdentifier │ └── value: emps.deptno └── SqlCase ├── whenList: │ └── SqlBasicCall (OR) │ ├── operator: OR (SqlBinaryOperator) │ └── operandList: │ ├── SqlBasicCall │ │ ├── operator: "<" (SqlBinaryOperator) │ │ └── operandList: │ │ ├── SqlBasicCall │ │ │ ├── operator: ABS │ │ │ └── operandList: │ │ │ └── SqlIdentifier │ │ │ └── value: depts.deptno │ │ └── SqlNumericLiteral │ │ └── value: 2 │ └── SqlBasicCall │ ├── operator: ">=" (SqlBinaryOperator) │ └── operandList: │ ├── SqlBasicCall │ │ ├── operator: ABS │ │ └── operandList: │ │ └── SqlIdentifier │ │ └── value: depts.deptno │ └── SqlNumericLiteral │ └── value: 10 ├── thenList: │ └── SqlBasicCall │ ├── operator: ABS │ └── operandList: │ └── SqlIdentifier │ └── value: depts.deptno └── elseExpr: └── SqlNumericLiteral └── value: 1 Writing a visitor looks like it would need to be coded to handle every scenario/edge case that a modeller might want to define in the join expression. ```java public static class SqlNodeShuttleImpl extends SqlShuttle { @Override public SqlNode visit(SqlCall sqlNode) { System.out.println("Visiting SqlCall: " + sqlNode.getOperator()); return super.visit(sqlNode); } @Override public SqlNode visit(SqlIdentifier sqlNode) { System.out.println("Visiting SqlIdentifier: " + sqlNode); return super.visit(sqlNode); } @Override public SqlNode visit(SqlDynamicParam sqlNode) { System.out.println("Visiting SqlDynamicParam: " + sqlNode); return super.visit(sqlNode); } @Override public SqlNode visit(SqlLiteral sqlNode) { System.out.println("Visiting SqlLiteral: " + sqlNode); return super.visit(sqlNode); } @Override public SqlNode visit(SqlNodeList nodeList) { // ignore this visit and delegate to super //System.out.println("Visiting SqlNodeList: " + nodeList); return super.visit(nodeList); } } ``` It outputs as follows: Visiting SqlCall: = Visiting SqlIdentifier: emps.deptno Visiting SqlCall: CASE Visiting SqlCall: OR Visiting SqlCall: < Visiting SqlCall: ABS Visiting SqlIdentifier: depts.deptno Visiting SqlLiteral: 2 Visiting SqlCall: >= Visiting SqlCall: ABS Visiting SqlIdentifier: depts.deptno Visiting SqlLiteral: 10 Visiting SqlCall: ABS Visiting SqlIdentifier: depts.deptno Visiting SqlLiteral: 1 From, Hugh Pearse On Thu, 26 Jun 2025 at 08:43, Stamatis Zampetakis <zabe...@gmail.com> wrote: > If your model decomposes your join expression in let's say three parts > (columnLeft, operator, columnRight) then you could directly call the > appropriate RelBuilder API and if the user provided valid inputs then > everything succeeds otherwise you get an informative error (e.g., > column does not exist in the table). > > Another alternative, would be to use only the SqlParser for the join > expression that will give you a small SqlNode tree and then have a > basic visitor (mini SqlToRelConverter) with the RelBuilder (that has > all the context and tables) where you call the appropriate methods > (builder.field, builder.field, builder.call, etc.). > > Best, > Stamatis > > On Thu, Jun 26, 2025 at 9:07 AM Hugh Pearse <hughpea...@gmail.com> wrote: > > > > Hi > > Our data analysts are using data models for data exploration and can be > > interested in a subgraph of the snowflake. So they specify the dimensions > > there are interested in selecting, and we build the graph of joins on the > > fly. The analysts have requested me to provide support for non-equi > joins. > > > > The process works as follows > > 1. The tool pushes the fact table to the relBuilder > > 2. The analyst selects a dimension on another table > > 3. The tool scans the new table and pushes it to the relBuilder > > 4. The join type is coded in the model (left, right, inner) > > 5. The join expression is coded in the model (eg: on x=y) > > > > We have been supporting equi-joins in the yaml of the model definition > and > > telling the relBuilder to use fieldA and fieldB, but this is only basic. > > > > Summary: > > Not all tables in the snowflake are used in every analysis. > > > > From, > > Hugh Pearse > > > > > > > > > > On Thu, 26 Jun 2025, 00:22 Mihai Budiu, <mbu...@gmail.com> wrote: > > > > > Why not create a full SQL query? > > > > > > You need to know all the tables involved anyway, without their schemas > you > > > cannot expect to validate anything. So you have enough information to > > > create a full query, and then pick just the pieces that you need. > > > > > > Mihai > > > ________________________________ > > > From: Hugh Pearse <hughpea...@gmail.com> > > > Sent: Wednesday, June 25, 2025 4:16 PM > > > To: dev@calcite.apache.org <dev@calcite.apache.org> > > > Subject: Re: SqlParser.parseExpression() > > > > > > Hi, > > > To answer your question: > > > "what do you want to do with the resulting data structure?" > > > > > > Answer: > > > I am writing an application which allows data modellers to specify a > > > snowflake schema using yaml files. The center of the snowflake is the > > > "from" clause in SQL. This fact table is pushed on to the relBuilder. > > > Potential table join paths are defined in the yaml file. When building > a > > > with scans are incrementally pushed on top on the stack in the > relBuilder. > > > Then a join expression is input from the yaml file (including the "on" > > > condition) and pushed onto the top of the stack in the relBuilder. This > > > process repeats incrementally to build a complete snowflake. > > > > > > But to simplify the problem statement, I would simply say: > > > > > > I want to parse: > > > JOIN depts ON emps.deptno = depts.deptno > > > > > > > > > From, > > > Hugh Pearse > > > > > > > > > > > > > > > On Wed, 25 Jun 2025, 22:36 Mihai Budiu, <mbu...@gmail.com> wrote: > > > > > > > I don't know if relBuilder has an API to add a complete expression > to a > > > > partial tree under construction, but if it does, I don't see why this > > > > wouldn't be possible. > > > > > > > > The next question is "what do you want to do with the resulting data > > > > structure?" > > > > > > > > Calcite will be able to use the existing data structure if it > conforms to > > > > its own invariants. > > > > > > > > To give you a concrete example, in our compiler we have implemented > > > > support for user-defined functions written in SQL. You can write: > > > > > > > > CREATE FUNCTION F(x INTEGER) RETURNS INTEGER AS x % 2; > > > > > > > > This is not something that Calcite understands. But we modified the > > > parser > > > > to accept this SQL; that's easy. > > > > > > > > After parsing, we convert this to the following SQL: > > > > > > > > CREATE TABLE T(x INTEGER); > > > > SELECT x % 2 FROM T; > > > > > > > > we use calcite to compile this using a regular flow, including > validation > > > > and conversion to Rel. > > > > Then from the produced Rel data structure we extract the > implementation > > > of > > > > the query and use it to implement the body of the function, using a > > > > standard Rel visitor we wrote (almost exactly the same visitor we > used to > > > > convert Rel to our own IR). > > > > > > > > This sounds similar to the stuff you want to do. > > > > > > > > So, in summary, these are in the end just data structures. Calcite > will > > > > manipulate successfully data structures that respect some > invariants. If > > > > the operations you create will build data structures that conform to > > > these > > > > invariants, then probably you can use tools from Calcite to > manipulate > > > them. > > > > > > > > Mihai > > > > > > > > ________________________________ > > > > From: Hugh Pearse <hughpea...@gmail.com> > > > > Sent: Wednesday, June 25, 2025 2:05 PM > > > > To: dev@calcite.apache.org <dev@calcite.apache.org> > > > > Subject: Re: SqlParser.parseExpression() > > > > > > > > Hi, thank you so much for reviewing my question 🙇♂️ > > > > > > > > I thought maybe I was ignorant, and maybe there was a secret way to > > > inject > > > > context using Contexts.of(Object...). Judging from your response > this is > > > > not possible. > > > > > > > > I am determined to solve this puzzle. As an alternative do you think > it > > > > would be possible to parse a statement like this: > > > > > > > > SELECT * FROM emps > > > > JOIN depts ON emps.deptno = depts.deptno > > > > > > > > And then extract this portion > > > > JOIN depts ON emps.deptno = depts.deptno > > > > > > > > And add it on top of an existing relBuilder? > > > > > > > > Thank you so much for your help > > > > > > > > Kindest regards > > > > From, > > > > Hugh Pearse > > > > > > > > > > > > > > > > > > > > On Wed, 25 Jun 2025, 19:46 Mihai Budiu, <mbu...@gmail.com> wrote: > > > > > > > > > In general you pass to the validator a SQL query; the validator > will > > > make > > > > > sure the query is fine, and it will rewrite it to make it more > > > explicit. > > > > > This entails for example resolving identifiers: what is "deptno" - > a > > > > table, > > > > > a column, and if it's a column, what table does it belong to? > > > > > > > > > > For your expression with "free" variables the validator does not > have > > > > > enough information to understand what these identifiers refer to. > > > > > > > > > > Even if the validator was enhanced to "understand" such > expressions, > > > the > > > > > validator cannot give them a "meaning". When you say "SELECT deptno > > > FROM > > > > > emps", deptno implicitly iterates over all rows in the emps table, > and > > > > the > > > > > meaning of this statement is to produce a table from all results > > > produced > > > > > by iterating. > > > > > > > > > > What is the meaning of your expression? What would you expect the > > > > > validator to produce if it worked? > > > > > > > > > > Mihai > > > > > > > > > > ________________________________ > > > > > From: Hugh Pearse <hughpea...@gmail.com> > > > > > Sent: Wednesday, June 25, 2025 11:30 AM > > > > > To: dev@calcite.apache.org <dev@calcite.apache.org> > > > > > Subject: SqlParser.parseExpression() > > > > > > > > > > Hi team, > > > > > I am trying to join 2 tables, but want to provide the join > condition > > > > (join > > > > > ON x=y) as a text input which is parsed. > > > > > > > > > > Example scenario: > > > > > > > > > > 1. user provides input: > > > > > emps.deptno = depts.deptno > > > > > 2. calcite parses join expression and creates join > > > > > > > > > > Expected result > > > > > SELECT * FROM emps JOIN depts ON emps.deptno = depts.deptno > > > > > > > > > > Actual result: > > > > > I am facing error: > > > > > Table 'emps' not found > > > > > at > > > > > > > > > > > > > > > > > > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > > > > > Method) > > > > > at > > > > > > > > > > > > > > > > > > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) > > > > > at > > > > > > > > > > > > > > > > > > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > > > > > at > > > > > > > > > > > > > > > > > > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) > > > > > at > > > > > > > > > java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481) > > > > > at > > > > > > > > > > > > > org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507) > > > > > at > org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:948) > > > > > at > org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:933) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5643) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:364) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorImpl.validateIdentifier(SqlValidatorImpl.java:3348) > > > > > at > > > > > > > > > org.apache.calcite.sql.SqlIdentifier.validateExpr(SqlIdentifier.java:307) > > > > > at > > > org.apache.calcite.sql.SqlOperator.validateCall(SqlOperator.java:475) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorImpl.validateCall(SqlValidatorImpl.java:6371) > > > > > at org.apache.calcite.sql.SqlCall.validate(SqlCall.java:143) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1101) > > > > > at > > > > > > > > > > > > > > > > > > org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:807) > > > > > > > > > > > > > > > Code: > > > > > > > > > > ```java > > > > > import com.google.common.collect.ImmutableList; > > > > > import org.apache.calcite.adapter.java.ReflectiveSchema; > > > > > import org.apache.calcite.config.Lex; > > > > > import org.apache.calcite.jdbc.CalciteConnection; > > > > > import org.apache.calcite.jdbc.CalciteSchema; > > > > > import org.apache.calcite.plan.RelOptCluster; > > > > > import org.apache.calcite.plan.ViewExpanders; > > > > > import org.apache.calcite.prepare.CalciteCatalogReader; > > > > > import org.apache.calcite.rel.type.RelDataTypeFactory; > > > > > import org.apache.calcite.rex.RexBuilder; > > > > > import org.apache.calcite.rex.RexNode; > > > > > import org.apache.calcite.schema.SchemaPlus; > > > > > import org.apache.calcite.sql.SqlNode; > > > > > import org.apache.calcite.sql.fun.SqlStdOperatorTable; > > > > > import org.apache.calcite.sql.parser.SqlParser; > > > > > import org.apache.calcite.sql.validate.SqlValidator; > > > > > import org.apache.calcite.sql.validate.SqlValidatorUtil; > > > > > import org.apache.calcite.sql2rel.SqlToRelConverter; > > > > > import org.apache.calcite.tools.FrameworkConfig; > > > > > import org.apache.calcite.tools.Frameworks; > > > > > import org.apache.calcite.tools.RelBuilder; > > > > > import org.junit.jupiter.api.Test; > > > > > > > > > > import java.sql.Connection; > > > > > import java.sql.DriverManager; > > > > > import java.util.Properties; > > > > > > > > > > public class JoinTest { > > > > > > > > > > public static class HrSchema { > > > > > public final Employee[] emps = { > > > > > new Employee(100, "Alice", 10), > > > > > new Employee(200, "Bob", 20) > > > > > }; > > > > > public final Department[] depts = { > > > > > new Department(10, "Sales"), > > > > > new Department(20, "Engineering") > > > > > }; > > > > > } > > > > > > > > > > public static class Employee { > > > > > public final int empid; > > > > > public final String name; > > > > > public final int deptno; > > > > > > > > > > public Employee(int empid, String name, int deptno) { > > > > > this.empid = empid; > > > > > this.name = name; > > > > > this.deptno = deptno; > > > > > } > > > > > } > > > > > > > > > > public static class Department { > > > > > public final int deptno; > > > > > public final String name; > > > > > > > > > > public Department(int deptno, String name) { > > > > > this.deptno = deptno; > > > > > this.name = name; > > > > > } > > > > > } > > > > > > > > > > @Test > > > > > public void testJoinToRexNode() throws Exception { > > > > > Properties props = new Properties(); > > > > > Connection connection = > > > > > DriverManager.getConnection("jdbc:calcite:", props); > > > > > CalciteConnection calciteConn = > > > > > connection.unwrap(CalciteConnection.class); > > > > > SchemaPlus rootSchema = calciteConn.getRootSchema(); > > > > > rootSchema.add("hr", new ReflectiveSchema(new HrSchema())); > > > > > calciteConn.setSchema("hr"); > > > > > > > > > > FrameworkConfig config = Frameworks.newConfigBuilder() > > > > > > .parserConfig(SqlParser.config().withLex(Lex.MYSQL)) > > > > > .defaultSchema(rootSchema.getSubSchema("hr")) > > > > > .build(); > > > > > > > > > > String sql = "SELECT * FROM emps JOIN depts ON emps.deptno > = > > > > > depts.deptno"; > > > > > > > > > > SqlValidator.Config sqlValidatorConfig = > > > > > config.getSqlValidatorConfig(); > > > > > SqlParser.Config parserConfig = config.getParserConfig(); > > > > > > > > > > ImmutableList<String> defaultSchemaPath = > ImmutableList.of(); > > > > > RelBuilder relBuilder = RelBuilder.create(config); > > > > > RelOptCluster cluster = relBuilder.getCluster(); > > > > > RexBuilder rexBuilder = relBuilder.getRexBuilder(); > > > > > RelDataTypeFactory typeFactory = > rexBuilder.getTypeFactory(); > > > > > CalciteSchema calciteSchema = > > > > > CalciteSchema.from(rootSchema.getSubSchema("hr")); > > > > > CalciteCatalogReader calciteCatalogReader = new > > > > > CalciteCatalogReader( > > > > > calciteSchema, > > > > > defaultSchemaPath, > > > > > typeFactory, > > > > > calciteConn.config() > > > > > ); > > > > > > > > > > SqlValidator sqlValidator = SqlValidatorUtil.newValidator( > > > > > SqlStdOperatorTable.instance(), > > > > > calciteCatalogReader, > > > > > typeFactory, > > > > > sqlValidatorConfig > > > > > ); > > > > > > > > > > SqlToRelConverter sqlToRelConverter = new > SqlToRelConverter( > > > > > ViewExpanders.simpleContext(cluster), > > > > > sqlValidator, > > > > > calciteCatalogReader, > > > > > cluster, > > > > > config.getConvertletTable(), > > > > > config.getSqlToRelConverterConfig() > > > > > ); > > > > > > > > > > // First test (works) > > > > > SqlParser firstParser = SqlParser.create(sql, > parserConfig); > > > > > SqlNode firstSqlNode = firstParser.parseQuery(); > > > > > SqlNode firstValidatedSqlNode = > > > > > sqlValidator.validate(firstSqlNode); > > > > > RexNode firstRexNode = > > > > > sqlToRelConverter.convertExpression(firstValidatedSqlNode); > > > > > System.out.println(firstRexNode); > > > > > > > > > > // Second test (fails) > > > > > sql = "emps.deptno = depts.deptno"; > > > > > SqlParser secondParser = SqlParser.create(sql, > parserConfig); > > > > > SqlNode secondSqlNode = secondParser.parseExpression(); > > > > > SqlNode secondValidatedSqlNode = > > > > > sqlValidator.validate(secondSqlNode); > > > > > RexNode secondRexNode = > > > > > sqlToRelConverter.convertExpression(secondValidatedSqlNode); > > > > > } > > > > > } > > > > > ``` > > > > > > > > > > error is raised at this line: > > > > > SqlNode secondValidatedSqlNode = > sqlValidator.validate(secondSqlNode); > > > > > > > > > > my questions are, > > > > > 1. is this type of expression even supported? > > > > > 2. its a bug with calcite or a bug with my code? > > > > > > > > > > From, > > > > > Hugh Pearse > > > > > > > > > > > > >