The use case is "complex metrics", also called "abstract columns". Let's say we have a sketch column in Druid called "customer_id_sketch", which is a sketch of an input column "customer_id". The "customer_id" dimension doesn't exist, but we'd like Calcite to know about an abstract column called "customer_id", so we add it to our schema.
The validator needs to know how this "customer_id" column can be used. Can it be used as a filter? Can it be used in an aggregate? Which aggregate? At the moment the validator will think that a query of the form "select col from table where customer_id > 123" is valid, but we (and by we I mean the DruidTable/DruidQuery) know that this abstract column cannot be used in this way. Essentially, the validator will produce false positives.

On the adapter side, when we see a query using an abstract column in the correct way, we change the column name to reference the real Druid metric name. For example, "select count(distinct customer_id) from table" will produce a DruidQuery equivalent to "select count(distinct customer_id_sketch) from table".

Zain.

On Wednesday, June 28, 2017, 11:21:25 AM PDT, Julian Hyde <[email protected]> wrote:

Imagine if your C++ compiler accepted any C++ program (even an invalid one), compiled the code to assembly language, then spat out errors if the assembly language was invalid. You would find it hard to tie the error message back to the source code (e.g. mis-typed variable name, or wrong type for a method argument). And the C++ compiler might crash before it even generates assembly language.

Validating the RelNode tree would be analogous to this. There are many concepts in SQL (like field names, table aliases, and sub-queries) that don’t exist in relational algebra.

That said, there are a few cases where the query is valid but the validator disallows it because the system cannot compute it. (One example of this is a query on a stream that would have to wait forever to see all the data it needs. Another is a query on Druid that tries to access data below the lowest retained level of aggregation.)

Can you describe the use cases where SQL validation is not working for you? Let’s see if we can extend SQL validation to cover them.
Julian

> On Jun 28, 2017, at 10:22 AM, Zain Humayun <[email protected]> wrote:
>
> I'm adding this validation logic for CALCITE-1787. Originally I was looking
> for a way to allow adapters to help out during the validation of the
> SqlNodes. The validation logic would have had to be able to communicate
> with a DruidTable (or an AbstractTable in general).
>
> Furthermore, my validation logic is checking for illegal uses of specific
> kinds of columns, so it seemed like that would be easier to do on a RelNode
> tree rather than the SqlNode tree. For example, the use of column "A" might
> be illegal in a join's condition, but valid in an Aggregate's COUNT aggregate
> call. Since the column names can get renamed and mixed around, I found that
> looking at the RelNode tree was better for this use case. Although since I am
> not very familiar with the core Calcite code, you might know of a better
> solution.
>
> With small modifications to the VolcanoPlanner, an adapter can register a
> RelValidationRule, and then check the final tree is valid, and throw an
> SqlValidatorException if it is not. One hurdle I've run into is that I have
> to validate the RelNodes inside the DruidQuery, and the RelNodes outside the
> DruidQuery, which can be done with the same code if abstracted correctly, but
> ideally I'd like to run my rules before anything is pushed into DruidQuery,
> and after all Bindable rules that can be applied have been. I don't think the
> volcano planner differentiates between the two kinds of rules, so I don't
> think I can do that (but maybe you know something I don't).
>
> Zain.
>
> On Wednesday, June 28, 2017, 9:41:15 AM PDT, JD Zheng <[email protected]> wrote:
>
> What is the original purpose of RelBuilder? Is it for supporting building
> languages other than SQL as front end? If it is, I would vote for adding
> the ability to do the validation at the RelNode level instead of SqlNode.
>
> -JD
>
>> On Jun 27, 2017, at 9:06 PM, Julian Hyde <[email protected]> wrote:
>>
>> I'm curious why you want to validate relational expressions (RelNode).
>> Most projects that use Calcite use SQL as the front end, and they
>> validate the SQL parse tree (SqlNode). If the parse tree is valid then
>> the relational algebra will also be valid.
>>
>> But anyway, I could see a use for RelValidationRule if, say, people
>> were creating queries using RelBuilder, or they wanted extra checks on
>> what their RelOptRules were doing.
>>
>> As for including these in an adapter: I would see an adapter as a
>> bundle of classes implementing a variety of plug-in interfaces,
>> perhaps packaged into a JAR with a manifest containing the list of
>> classes that implement each interface (maybe something similar to
>> META-INF/services/java.sql.Driver; see ServiceLoader [1]). We could
>> add new plug-in types as we go.
>>
>> Julian
>>
>> [1] https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
>>
>> On Mon, Jun 26, 2017 at 5:49 PM, Zain Humayun
>> <[email protected]> wrote:
>>>
>>> Thanks for the info about the overridable schema methods. I'll take a look
>>> at those and see if they're a fit for what I'm trying to achieve. You
>>> mentioned that we should allow adapters to include more kinds of plug-ins,
>>> and I definitely agree. I've created something called a "RelValidationRule"
>>> that lets adapters validate relational expressions (as opposed to
>>> transforming them), so that adapters have a chance to participate in
>>> validation.
>>>
>>> I was wondering if there was already a "plug-in" similar to this in Calcite
>>> (that is not already exposed to adapters). If not, do you think adding this
>>> new kind of rule is the proper way of letting adapters partake in the
>>> validation?
>>>
>>> Zain.
>>> On Monday, June 26, 2017, 5:16:29 PM PDT, Julian Hyde <[email protected]>
>>> wrote:
>>>
>>> It sounds as if you need to define new SqlOperator instances and get them
>>> into the SqlOperatorTable used by the validator. If you override the
>>> Schema.getFunctionNames() and Schema.getFunctions(String name) methods you
>>> should be able to declare additional user-defined functions (regular
>>> functions, table functions, and even table macros, which are a
>>> generalization of views).
>>>
>>> I don’t recall whether these have to be user-defined functions based on a
>>> simple Java class, or whether we allow SqlOperator with more complex rules
>>> for validation and code-generation.
>>>
>>> That said, there are many kinds of plug-ins that you can implement to
>>> change the behavior of the validator (and other parts of the query
>>> preparation process such as planning). But only a few of them (mainly
>>> schema, table, and planning rules) are currently part of the loose bundling
>>> concept we call an “adapter”. We should allow adapters to include more
>>> kinds of plug-ins.
>>>
>>> Julian
>>>
>>>> On Jun 20, 2017, at 2:02 PM, Zain Humayun <[email protected]>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm looking into adding additional functionality to the Druid adapter, and
>>>> will need to somehow change the behaviour of Calcite's validation logic
>>>> (either at the adapter level, or the core level). I was wondering if there
>>>> was an existing way for an adapter to participate in SQL validation
>>>> (specifically function type checking). As far as I can tell, the only part
>>>> of an adapter that Calcite is aware of during validation is the table
>>>> factory classes, which end up returning a table that is stored inside of
>>>> a schema.
>>>>
>>>> If such functionality does not currently exist, what would be the best way
>>>> to approach this? Essentially I'm trying to do type checking on aggregate
>>>> functions with the help of information stored inside
>>>> DruidQuery/DruidTable. At the moment I'm thinking the best way might be to
>>>> add another JDBC string parameter specifying a class that can assist
>>>> Calcite in validation.
>>>>
>>>> Another approach might be to push the logic into the adapter entirely (as
>>>> a rule for example) and then throw an exception when something fails
>>>> validation. This approach is more hacky, but it might also be feasible.
>>>>
>>>> If anyone has experience in implementing something similar to this and/or
>>>> Calcite's validation code in general I'd love to hear your thoughts.
>>>>
>>>> Thanks,
>>>> Zain.
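
As a rough illustration of the abstract-column idea discussed in this thread, the self-contained Java sketch below models which usages of "customer_id" are legal and how a legal reference is rewritten to "customer_id_sketch". It does not use any Calcite API: AbstractColumnSketch, Usage, AbstractColumn, and resolve are hypothetical names invented for this example, the set of allowed usages is an assumption, and IllegalArgumentException merely stands in for the SqlValidatorException mentioned in the thread.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of abstract-column checking; not part of Calcite. */
public class AbstractColumnSketch {

  /** Places in a query where a column reference can appear (assumed list). */
  public enum Usage { FILTER, JOIN_CONDITION, PROJECT, COUNT_DISTINCT }

  /** An abstract column, the real Druid metric behind it, and its legal usages. */
  static final class AbstractColumn {
    final String druidMetric;
    final Set<Usage> allowed;

    AbstractColumn(String druidMetric, Set<Usage> allowed) {
      this.druidMetric = druidMetric;
      this.allowed = allowed;
    }
  }

  /** Abstract columns known to the (hypothetical) adapter, keyed by name. */
  static final Map<String, AbstractColumn> COLUMNS = new HashMap<>();

  static {
    // Assumption: the sketch-backed column may only appear in COUNT(DISTINCT ...).
    COLUMNS.put("customer_id",
        new AbstractColumn("customer_id_sketch", EnumSet.of(Usage.COUNT_DISTINCT)));
  }

  /**
   * Returns the column name to send to Druid for the given usage.
   * Ordinary columns pass through unchanged; an abstract column is either
   * rewritten to its real metric name or rejected with an exception.
   */
  public static String resolve(String column, Usage usage) {
    AbstractColumn abstractColumn = COLUMNS.get(column);
    if (abstractColumn == null) {
      return column; // not an abstract column, nothing to check
    }
    if (!abstractColumn.allowed.contains(usage)) {
      throw new IllegalArgumentException(
          "Column '" + column + "' cannot be used as " + usage);
    }
    return abstractColumn.druidMetric;
  }

  public static void main(String[] args) {
    // select count(distinct customer_id) from table: legal, rewritten.
    System.out.println(resolve("customer_id", Usage.COUNT_DISTINCT));

    // select col from table where customer_id > 123: rejected.
    try {
      resolve("customer_id", Usage.FILTER);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Whether a check like this belongs in the SqlNode validator or in a rule over the RelNode tree is exactly the open question of the thread; the sketch only shows the bookkeeping that either approach would need.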
