Hi
Our data analysts are using data models for data exploration and can be
interested in a subgraph of the snowflake. So they specify the dimensions
there are interested in selecting, and we build the graph of joins on the
fly. The analysts have requested me to provide support for non-equi joins.

The process works as follows
1. The tool pushes the fact table to the relBuilder
2. The analyst selects a dimension on another table
3. The tool scans the new table and pushes it to the relBuilder
4. The join type is coded in the model (left, right, inner)
5. The join expression is coded in the model (eg: on x=y)

We have been supporting equi-joins in the yaml of the model definition and
telling the relBuilder to use fieldA and fieldB, but this is only basic.

Summary:
Not all tables in the snowflake are used in every analysis.

From,
Hugh Pearse




On Thu, 26 Jun 2025, 00:22 Mihai Budiu, <mbu...@gmail.com> wrote:

> Why not create a full SQL query?
>
> You need to know all the tables involved anyway, without their schemas you
> cannot expect to validate anything. So you have enough information to
> create a full query, and then pick just the pieces that you need.
>
> Mihai
> ________________________________
> From: Hugh Pearse <hughpea...@gmail.com>
> Sent: Wednesday, June 25, 2025 4:16 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: SqlParser.parseExpression()
>
> Hi,
> To answer your question:
> "what do you want to do with the resulting data structure?"
>
> Answer:
> I am writing an application which allows data modellers to specify a
> snowflake schema using yaml files. The center of the snowflake is the
> "from" clause in SQL. This fact table is pushed on to the relBuilder.
> Potential table join paths are defined in the yaml file. When building a
> with scans are incrementally pushed on top on the stack in the relBuilder.
> Then a join expression is input from the yaml file (including the "on"
> condition) and pushed onto the top of the stack in the relBuilder. This
> process repeats incrementally to build a complete snowflake.
>
> But to simplify the problem statement, I would simply say:
>
> I want to parse:
>    JOIN depts ON emps.deptno = depts.deptno
>
>
> From,
> Hugh Pearse
>
>
>
>
> On Wed, 25 Jun 2025, 22:36 Mihai Budiu, <mbu...@gmail.com> wrote:
>
> > I don't know if relBuilder has an API to add a complete expression to a
> > partial tree under construction, but if it does, I don't see why this
> > wouldn't be possible.
> >
> > The next question is "what do you want to do with the resulting data
> > structure?"
> >
> > Calcite will be able to use the existing data structure if it conforms to
> > its own invariants.
> >
> > To give you a concrete example, in our compiler we have implemented
> > support for user-defined functions written in SQL. You can write:
> >
> > CREATE FUNCTION F(x INTEGER) RETURNS INTEGER AS x % 2;
> >
> > This is not something that Calcite understands. But we modified the
> parser
> > to accept this SQL; that's easy.
> >
> > After parsing, we convert this to the following SQL:
> >
> > CREATE TABLE T(x INTEGER);
> > SELECT x % 2 FROM T;
> >
> > we use calcite to compile this using a regular flow, including validation
> > and conversion to Rel.
> > Then from the produced Rel data structure we extract the implementation
> of
> > the query and use it to implement the body of the function, using a
> > standard Rel visitor we wrote (almost exactly the same visitor we used to
> > convert Rel to our own IR).
> >
> > This sounds similar to the stuff you want to do.
> >
> > So, in summary, these are in the end just data structures. Calcite will
> > manipulate successfully data structures that respect some invariants. If
> > the operations you create will build data structures that conform to
> these
> > invariants, then probably you can use tools from Calcite to manipulate
> them.
> >
> > Mihai
> >
> > ________________________________
> > From: Hugh Pearse <hughpea...@gmail.com>
> > Sent: Wednesday, June 25, 2025 2:05 PM
> > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > Subject: Re: SqlParser.parseExpression()
> >
> > Hi, thank you so much for reviewing my question 🙇‍♂️
> >
> > I thought maybe I was ignorant, and maybe there was a secret way to
> inject
> > context using Contexts.of(Object...). Judging from your response this is
> > not possible.
> >
> > I am determined to solve this puzzle. As an alternative do you think it
> > would be possible to parse a statement like this:
> >
> >     SELECT * FROM emps
> >     JOIN depts ON emps.deptno = depts.deptno
> >
> > And then extract this portion
> >     JOIN depts ON emps.deptno = depts.deptno
> >
> > And add it on top of an existing relBuilder?
> >
> > Thank you so much for your help
> >
> > Kindest regards
> > From,
> > Hugh Pearse
> >
> >
> >
> >
> > On Wed, 25 Jun 2025, 19:46 Mihai Budiu, <mbu...@gmail.com> wrote:
> >
> > > In general you pass to the validator a SQL query; the validator will
> make
> > > sure the query is fine, and it will rewrite it to make it more
> explicit.
> > > This entails for example resolving identifiers: what is "deptno" - a
> > table,
> > > a column, and if it's a column, what table does it belong to?
> > >
> > > For your expression with "free" variables the validator does not have
> > > enough information to understand what these identifiers refer to.
> > >
> > > Even if the validator was enhanced to "understand" such expressions,
> the
> > > validator cannot give them a "meaning". When you say "SELECT deptno
> FROM
> > > emps", deptno implicitly iterates over all rows in the emps table, and
> > the
> > > meaning of this statement is to produce a table from all results
> produced
> > > by iterating.
> > >
> > > What is the meaning of your expression? What would you expect the
> > > validator to produce if it worked?
> > >
> > > Mihai
> > >
> > > ________________________________
> > > From: Hugh Pearse <hughpea...@gmail.com>
> > > Sent: Wednesday, June 25, 2025 11:30 AM
> > > To: dev@calcite.apache.org <dev@calcite.apache.org>
> > > Subject: SqlParser.parseExpression()
> > >
> > > Hi team,
> > > I am trying to join 2 tables, but want to provide the join condition
> > (join
> > > ON x=y) as a text input which is parsed.
> > >
> > > Example scenario:
> > >
> > > 1. user provides input:
> > >     emps.deptno = depts.deptno
> > > 2. calcite parses join expression and creates join
> > >
> > > Expected result
> > >     SELECT * FROM emps JOIN depts ON emps.deptno = depts.deptno
> > >
> > > Actual result:
> > > I am facing error:
> > > Table 'emps' not found
> > > at
> > >
> > >
> >
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > > Method)
> > > at
> > >
> > >
> >
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
> > > at
> > >
> > >
> >
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > > at
> > >
> > >
> >
> java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
> > > at
> > >
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
> > > at
> > >
> >
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:507)
> > > at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:948)
> > > at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:933)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5643)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:364)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateIdentifier(SqlValidatorImpl.java:3348)
> > > at
> > >
> org.apache.calcite.sql.SqlIdentifier.validateExpr(SqlIdentifier.java:307)
> > > at
> org.apache.calcite.sql.SqlOperator.validateCall(SqlOperator.java:475)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateCall(SqlValidatorImpl.java:6371)
> > > at org.apache.calcite.sql.SqlCall.validate(SqlCall.java:143)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1101)
> > > at
> > >
> > >
> >
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:807)
> > >
> > >
> > > Code:
> > >
> > > ```java
> > > import com.google.common.collect.ImmutableList;
> > > import org.apache.calcite.adapter.java.ReflectiveSchema;
> > > import org.apache.calcite.config.Lex;
> > > import org.apache.calcite.jdbc.CalciteConnection;
> > > import org.apache.calcite.jdbc.CalciteSchema;
> > > import org.apache.calcite.plan.RelOptCluster;
> > > import org.apache.calcite.plan.ViewExpanders;
> > > import org.apache.calcite.prepare.CalciteCatalogReader;
> > > import org.apache.calcite.rel.type.RelDataTypeFactory;
> > > import org.apache.calcite.rex.RexBuilder;
> > > import org.apache.calcite.rex.RexNode;
> > > import org.apache.calcite.schema.SchemaPlus;
> > > import org.apache.calcite.sql.SqlNode;
> > > import org.apache.calcite.sql.fun.SqlStdOperatorTable;
> > > import org.apache.calcite.sql.parser.SqlParser;
> > > import org.apache.calcite.sql.validate.SqlValidator;
> > > import org.apache.calcite.sql.validate.SqlValidatorUtil;
> > > import org.apache.calcite.sql2rel.SqlToRelConverter;
> > > import org.apache.calcite.tools.FrameworkConfig;
> > > import org.apache.calcite.tools.Frameworks;
> > > import org.apache.calcite.tools.RelBuilder;
> > > import org.junit.jupiter.api.Test;
> > >
> > > import java.sql.Connection;
> > > import java.sql.DriverManager;
> > > import java.util.Properties;
> > >
> > > public class JoinTest {
> > >
> > >     public static class HrSchema {
> > >         public final Employee[] emps = {
> > >                 new Employee(100, "Alice", 10),
> > >                 new Employee(200, "Bob", 20)
> > >         };
> > >         public final Department[] depts = {
> > >                 new Department(10, "Sales"),
> > >                 new Department(20, "Engineering")
> > >         };
> > >     }
> > >
> > >     public static class Employee {
> > >         public final int empid;
> > >         public final String name;
> > >         public final int deptno;
> > >
> > >         public Employee(int empid, String name, int deptno) {
> > >             this.empid = empid;
> > >             this.name = name;
> > >             this.deptno = deptno;
> > >         }
> > >     }
> > >
> > >     public static class Department {
> > >         public final int deptno;
> > >         public final String name;
> > >
> > >         public Department(int deptno, String name) {
> > >             this.deptno = deptno;
> > >             this.name = name;
> > >         }
> > >     }
> > >
> > >     @Test
> > >     public void testJoinToRexNode() throws Exception {
> > >         Properties props = new Properties();
> > >         Connection connection =
> > > DriverManager.getConnection("jdbc:calcite:", props);
> > >         CalciteConnection calciteConn =
> > > connection.unwrap(CalciteConnection.class);
> > >         SchemaPlus rootSchema = calciteConn.getRootSchema();
> > >         rootSchema.add("hr", new ReflectiveSchema(new HrSchema()));
> > >         calciteConn.setSchema("hr");
> > >
> > >         FrameworkConfig config = Frameworks.newConfigBuilder()
> > >                 .parserConfig(SqlParser.config().withLex(Lex.MYSQL))
> > >                 .defaultSchema(rootSchema.getSubSchema("hr"))
> > >                 .build();
> > >
> > >         String sql = "SELECT * FROM emps JOIN depts ON emps.deptno =
> > > depts.deptno";
> > >
> > >         SqlValidator.Config sqlValidatorConfig =
> > > config.getSqlValidatorConfig();
> > >         SqlParser.Config parserConfig = config.getParserConfig();
> > >
> > >         ImmutableList<String> defaultSchemaPath = ImmutableList.of();
> > >         RelBuilder relBuilder = RelBuilder.create(config);
> > >         RelOptCluster cluster = relBuilder.getCluster();
> > >         RexBuilder rexBuilder = relBuilder.getRexBuilder();
> > >         RelDataTypeFactory typeFactory = rexBuilder.getTypeFactory();
> > >         CalciteSchema calciteSchema =
> > > CalciteSchema.from(rootSchema.getSubSchema("hr"));
> > >         CalciteCatalogReader calciteCatalogReader = new
> > > CalciteCatalogReader(
> > >                 calciteSchema,
> > >                 defaultSchemaPath,
> > >                 typeFactory,
> > >                 calciteConn.config()
> > >         );
> > >
> > >         SqlValidator sqlValidator = SqlValidatorUtil.newValidator(
> > >                 SqlStdOperatorTable.instance(),
> > >                 calciteCatalogReader,
> > >                 typeFactory,
> > >                 sqlValidatorConfig
> > >         );
> > >
> > >         SqlToRelConverter sqlToRelConverter = new SqlToRelConverter(
> > >                 ViewExpanders.simpleContext(cluster),
> > >                 sqlValidator,
> > >                 calciteCatalogReader,
> > >                 cluster,
> > >                 config.getConvertletTable(),
> > >                 config.getSqlToRelConverterConfig()
> > >         );
> > >
> > >         // First test (works)
> > >         SqlParser firstParser = SqlParser.create(sql, parserConfig);
> > >         SqlNode firstSqlNode = firstParser.parseQuery();
> > >         SqlNode firstValidatedSqlNode =
> > > sqlValidator.validate(firstSqlNode);
> > >         RexNode firstRexNode =
> > > sqlToRelConverter.convertExpression(firstValidatedSqlNode);
> > >         System.out.println(firstRexNode);
> > >
> > >         // Second test (fails)
> > >         sql = "emps.deptno = depts.deptno";
> > >         SqlParser secondParser = SqlParser.create(sql, parserConfig);
> > >         SqlNode secondSqlNode = secondParser.parseExpression();
> > >         SqlNode secondValidatedSqlNode =
> > > sqlValidator.validate(secondSqlNode);
> > >         RexNode secondRexNode =
> > > sqlToRelConverter.convertExpression(secondValidatedSqlNode);
> > >     }
> > > }
> > > ```
> > >
> > > error is raised at this line:
> > > SqlNode secondValidatedSqlNode = sqlValidator.validate(secondSqlNode);
> > >
> > > my questions are,
> > > 1. is this type of expression even supported?
> > > 2. its a bug with calcite or a bug with my code?
> > >
> > > From,
> > > Hugh Pearse
> > >
> >
>

Reply via email to