Re: Extracting field identifiers (SqlScopedShuttle / SqlValidator)

Julian Hyde Thu, 11 Nov 2021 16:06:38 -0800

You shouldn't need to solve this without dealing with
SqlScopedShuttle. If the validator does its job (and yes, it uses
SqlScopedShuttle to do its job) then all identifiers will be resolved
to the objects they reference.


If I were you I would start by running
SqlValidatorTest.testFieldOrigin in a debugger. You will see that
after validation has succeeded it calls validator.getFieldOrigins().

Once you have that working, you can consider variations on a theme
(e.g. using Babel parser rather than the Core parser, or getting
different kinds of lineage information). Maybe you'll run into bugs,
and if so, you can add those as test cases in SqlValidatorTest or
Babel test, and then maybe someone (maybe you) will end up fixing
SqlScopedShuttle, but there are several maybes before we get there.

Julian



On Thu, Nov 11, 2021 at 10:42 AM Mark Grey
<mgthesec...@spotify.com.invalid> wrote:
>
> In my efforts to continue learning about the internals of Calcite, I've
> been experimenting with applying Calcite's Babel parser to the task of
> extracting fully qualified field references from a query.  The use case
> here would be to enumerate all the fields of a schema that are "touched" by
> a given query, for the purpose of informing table owners about the usage by
> consumers of particular fields.
>
> It seems there has been some prior interest in this use case, but the few
> of the stackoverflow posts I could find on the topic are likely outdated.
> One post that has proved helpful was this one from Julian
> <https://stackoverflow.com/a/37554628/215608>, which explains the roles of
> namespaces and scopes in the validation process.
>
> The comments in that thread mention SqlScopedShuttle only maintaining a
> stack of SqlNode, but it seems it has changed to keep a stack of
> SqlValidatorScopes?  For the most part, I seem to be able to use that Deque
> to match identifiers to their tables, following on an example drawn from
> the getFieldOrigins method of SqlValidatorImpl
> <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/validate/SqlValidator.html#getFieldOrigins(org.apache.calcite.sql.SqlNode)>.
> My shuttle just uses the visit method for SqlIdentifier to try to qualify
> the identifier against the namespace associated with the current scope,
> discarding all that fail to be found.
>
> Here's my really naaive approach with some test cases expressed as yaml
> <https://gist.github.com/DeaconDesperado/416f11bf91d0ca60b3c30b22626f4178>.
> Obviously there's some cases that fail badly with this approach, one of
> which I've included wherein an UNNEST is used on an array field containing
> subrecords.
>
> A few questions I have:
>
>    - Does this approach make sense for this use case, or would it make more
>    sense to use an implementation of SqlValidator, which seems to maintain
>    more state as to the scopes + namespaces?
>    - Would it be easier to approach this particular use case from
>    navigating the generated relational algebra rather than the SQL AST?
>    - Does the static factory method for the shuttle make sense
>    idiomatically with the intended usage for SqlScopedShuttle?  How typically
>    should one obtain the intialScope?  I've tried to enumerate both the top
>    level SqlSelect as well as the potential for a list of CTEs using WITH
>
> Many thanks!

Re: Extracting field identifiers (SqlScopedShuttle / SqlValidator)

Reply via email to