Re: Embedding Calcite, adjusting convertlets

Julian Hyde Fri, 13 Jan 2017 09:21:44 -0800

Ah, thanks for letting me know. RelRoot is tricky; because it is not a
RelNode, you are working outside the usual mechanism for sorts and
projects, hence it is error-prone, as you have discovered.


On Fri, Jan 13, 2017 at 9:02 AM, Gian Merlino <[email protected]> wrote:
> This was totally user error, I was ignoring "fields" and "collation" of the
> RelRoot. Passing "collation" to planner.transform stopped it from removing
> the sort.
>
> Gian
>
> On Wed, Dec 14, 2016 at 3:21 PM, Gian Merlino <[email protected]> wrote:
>
>> I don't think so, since my test query (SELECT dim1 FROM s.foo GROUP BY
>> dim1 ORDER BY dim1 DESC) is sorting on a column that is also projected.
>>
>> Gian
>>
>> On Wed, Dec 14, 2016 at 11:58 AM, Julian Hyde <[email protected]> wrote:
>>
>>> Are you running into some variant of the problems that inspired
>>> https://issues.apache.org/jira/browse/CALCITE-819: <
>>> https://issues.apache.org/jira/browse/CALCITE-819:> at the root of the
>>> tree, columns that are not projected are removed, and if the desired sort
>>> order involves non-projected columns, the desired sort order is forgotten).
>>>
>>> > On Dec 14, 2016, at 11:19 AM, Gian Merlino <[email protected]> wrote:
>>> >
>>> > Ah, thanks. So if that sort of thing is not a smoking gun, do you have
>>> an
>>> > idea about where I should look next? If not I'll keep poking around.
>>> >
>>> > Gian
>>> >
>>> > On Wed, Dec 14, 2016 at 11:06 AM, Julian Hyde <[email protected]> wrote:
>>> >
>>> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>>> >>> _any_ collation traits.
>>> >>
>>> >> That’s OK. A “subset” (RelSubset) is a collection of RelNodes that are
>>> >> logically and physically equivalent (same results, same physical
>>> >> properties) whereas a “set” (RelSet) is a collection of RelNodes that
>>> are
>>> >> logically equivalent.
>>> >>
>>> >> A set can therefore be considered to be a collection of subsets, each
>>> of
>>> >> which contains RelNodes. And it used to be implemented that way, but in
>>> >> https://issues.apache.org/jira/browse/CALCITE-88 <
>>> >> https://issues.apache.org/jira/browse/CALCITE-88> we introduced
>>> collation
>>> >> as a trait, and that made subsets non-disjoint (a RelNode can be
>>> sorted on
>>> >> (x, y), and also on (x), and also on (), and also on (z)) so we made
>>> >> RelSubset just a view onto a RelSet, filtering the list of RelNodes
>>> >> according to the ones that have (“subsume”) the desired traits.
>>> >>
>>> >> Julian
>>> >>
>>> >>
>>> >>> On Dec 14, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
>>> >>>
>>> >>> I spent some more time looking into (3) and found that when I had
>>> things
>>> >>> going through the Planner rather than the JDBC driver, SortRemoveRule
>>> was
>>> >>> removing sorts when it shouldn't have been. This happens even for
>>> simple
>>> >>> queries like "SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1
>>> >>> DESC". Removing SortRemoveRule from the planner fixed the broken
>>> tests on
>>> >>> my end.
>>> >>>
>>> >>> I dug into that a bit and saw that the call to
>>> "convert(sort.getInput(),
>>> >>> traits)" in SortRemoveRule was returning a RelSubset that looked a bit
>>> >>> funny in the debugger:
>>> >>>
>>> >>> - The RelSubset's "traitSet" _does_ have the proper collation trait.
>>> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>>> >>> _any_ collation traits.
>>> >>>
>>> >>> From what I understand that causes Calcite to treat the unsorted and
>>> >> sorted
>>> >>> rels as equivalent when they in fact aren't. I'm still not sure if
>>> this
>>> >> is
>>> >>> a Calcite bug or user error on my part… I'll keep looking into it
>>> unless
>>> >>> someone has any bright ideas.
>>> >>>
>>> >>> fwiw, my Planner construction looks like this:
>>> >>>
>>> >>>   final FrameworkConfig frameworkConfig = Frameworks
>>> >>>       .newConfigBuilder()
>>> >>>       .parserConfig(
>>> >>>           SqlParser.configBuilder()
>>> >>>                    .setCaseSensitive(true)
>>> >>>                    .setUnquotedCasing(Casing.UNCHANGED)
>>> >>>                    .build()
>>> >>>       )
>>> >>>       .defaultSchema(rootSchema)
>>> >>>       .traitDefs(ConventionTraitDef.INSTANCE,
>>> >>> RelCollationTraitDef.INSTANCE)
>>> >>>       .programs(Programs.ofRules(myRules))
>>> >>>       .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
>>> >>>       .context(Contexts.EMPTY_CONTEXT)
>>> >>>       .build();
>>> >>>
>>> >>>   return Frameworks.getPlanner(frameworkConfig);
>>> >>>
>>> >>> Gian
>>> >>>
>>> >>> On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <[email protected]> wrote:
>>> >>>
>>> >>>> Sure, I added those first two to the ticket.
>>> >>>>
>>> >>>> I don't think those are happening with (3) but I'll double check next
>>> >> time
>>> >>>> I take a look at using the Planner.
>>> >>>>
>>> >>>> Gian
>>> >>>>
>>> >>>> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <[email protected]>
>>> wrote:
>>> >>>>
>>> >>>>> Can you please add (1) and (2) to https://issues.apache.org/jira
>>> >>>>> /browse/CALCITE-1525 <https://issues.apache.org/jir
>>> >> a/browse/CALCITE-1525>,
>>> >>>>> which deals with the whole issue of using “Planner” within the JDBC
>>> >> driver,
>>> >>>>> so we can be consistent.
>>> >>>>>
>>> >>>>> (3) doesn’t look likely to be related. Do your queries have UNION or
>>> >>>>> other set-ops? Are you sorting on columns that do not appear in the
>>> >> final
>>> >>>>> result?
>>> >>>>>
>>> >>>>> Julian
>>> >>>>>
>>> >>>>>
>>> >>>>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
>>> >>>>>>
>>> >>>>>> I traveled a bit down the Frameworks/Planner road and got most of
>>> my
>>> >>>>> tests
>>> >>>>>> passing, but ran into some problems getting them all to work:
>>> >>>>>>
>>> >>>>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
>>> >>>>> Planner.validate.
>>> >>>>>> It looks like CalcitePrepareImpl has some special code to handle
>>> >>>>> validation
>>> >>>>>> of EXPLAIN, but PlannerImpl doesn't. I'm not sure if this is
>>> >> something I
>>> >>>>>> should be doing on my end, or if it's a bug in PlannerImpl.
>>> >>>>>> (2) I don't see a way to do ?-style prepared statements with bound
>>> >>>>>> variables, which _is_ possible with the JDBC driver route.
>>> >>>>>> (3) Not sure why this is happening, but for some reason ORDER BY /
>>> >> LIMIT
>>> >>>>>> clauses are getting ignored sometimes, even when they work with the
>>> >> JDBC
>>> >>>>>> driver route. This may be something messed up with my rules though
>>> and
>>> >>>>> may
>>> >>>>>> not be Calcite's fault.
>>> >>>>>>
>>> >>>>>> Julian, do any of these look like bugs that should be raised in
>>> jira,
>>> >> or
>>> >>>>>> are they just stuff I should be dealing with on my side?
>>> >>>>>>
>>> >>>>>> Btw, I do like that the Frameworks/Planner route gives me back the
>>> >>>>> RelNode
>>> >>>>>> itself, since that means I can make the Druid queries directly
>>> without
>>> >>>>>> needing to go through the extra layers of the JDBC driver. That
>>> part
>>> >> is
>>> >>>>>> nice.
>>> >>>>>>
>>> >>>>>> Gian
>>> >>>>>>
>>> >>>>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <[email protected]>
>>> >> wrote:
>>> >>>>>>
>>> >>>>>>> I don’t know how it’s used outside Calcite. Maybe some others can
>>> >>>>> chime in.
>>> >>>>>>>
>>> >>>>>>> Thanks for the PR. I logged https://issues.apache.org/jira
>>> >>>>>>> /browse/CALCITE-1509 <https://issues.apache.org/jir
>>> >>>>> a/browse/CALCITE-1509>
>>> >>>>>>> for it, and will commit shortly.
>>> >>>>>>>
>>> >>>>>>> Julian
>>> >>>>>>>
>>> >>>>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <[email protected]>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Do you know examples of projects that use Planner or PlannerImpl
>>> >>>>>>> currently
>>> >>>>>>>> (from "outside")? As far as I can tell, within Calcite itself
>>> it's
>>> >>>>> only
>>> >>>>>>>> used in test code. Maybe that'd be a better entry point.
>>> >>>>>>>>
>>> >>>>>>>> In the meantime I raised a PR here for allowing a convertlet
>>> table
>>> >>>>>>> override
>>> >>>>>>>> in a CalcitePrepareImpl: https://github.com/apache/calc
>>> ite/pull/330
>>> >> .
>>> >>>>>>> That
>>> >>>>>>>> was enough to get the JDBC driver on my end to behave how I want
>>> it
>>> >>>>> to.
>>> >>>>>>>>
>>> >>>>>>>> Gian
>>> >>>>>>>>
>>> >>>>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <[email protected]>
>>> >>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> I was wrong earlier… FrameworkConfig already has a
>>> >> getConvertletTable
>>> >>>>>>>>> method. But regarding using FrameworkConfig from within the JDBC
>>> >>>>> driver,
>>> >>>>>>>>> It’s complicated. FrameworkConfig only works if you are
>>> “outside”
>>> >>>>>>> Calcite,
>>> >>>>>>>>> whereas CalcitePrepare is when you are customizing from the
>>> inside,
>>> >>>>> and
>>> >>>>>>>>> sadly CalcitePrepare does not use a FrameworkConfig.
>>> >>>>>>>>>
>>> >>>>>>>>> Compare and contrast:
>>> >>>>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
>>> >>>>> https://github.com/apache/
>>> >>>>>>>>> calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/
>>> >>>>>>> src/main/java/
>>> >>>>>>>>> org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 <
>>> >>>>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b82
>>> >>>>> 8d22d7a75
>>> >>>>>>>>> 3e0a2899cc/core/src/main/java/org/apache/calcite/prepare/
>>> >>>>>>>>> CalcitePrepareImpl.java#L1114> ]
>>> >>>>>>>>> * PlannerImpl.rel [ https://github.com/apache/calcite/blob/
>>> >>>>>>>>> 105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/
>>> >>>>>>>>> org/apache/calcite/prepare/PlannerImpl.java#L225 <
>>> >>>>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e12
>>> >>>>> 11d262e48
>>> >>>>>>>>> 86a4a863b7/core/src/main/java/org/apache/calcite/prepare/
>>> >>>>>>>>> PlannerImpl.java#L225> ]
>>> >>>>>>>>>
>>> >>>>>>>>> The latter uses a convertletTable sourced from a
>>> FrameworkConfig.
>>> >>>>>>>>>
>>> >>>>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
>>> >>>>> PlannerImpl
>>> >>>>>>> to
>>> >>>>>>>>> do its dirty work. Then “inside” and “outside” would work the
>>> same.
>>> >>>>>>> Would
>>> >>>>>>>>> definitely appreciate that as a patch.
>>> >>>>>>>>>
>>> >>>>>>>>> If you choose to go the JDBC driver route, you could override
>>> >>>>>>>>> Driver.createPrepareFactory to produce a sub-class of
>>> >> CalcitePrepare
>>> >>>>>>> that
>>> >>>>>>>>> works for your environment, one with an explicit convertletTable
>>> >>>>> rather
>>> >>>>>>>>> than just using the default.
>>> >>>>>>>>>
>>> >>>>>>>>> Julian
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <[email protected]>
>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hey Julian,
>>> >>>>>>>>>>
>>> >>>>>>>>>> If the convertlets were customizable with a FrameworkConfig,
>>> how
>>> >>>>> would
>>> >>>>>>> I
>>> >>>>>>>>>> use that configure the JDBC driver (given that I'm doing it
>>> with
>>> >> the
>>> >>>>>>> code
>>> >>>>>>>>>> upthread)? Or would that suggest using a different approach to
>>> >>>>>>> embedding
>>> >>>>>>>>>> Calcite?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Gian
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <[email protected]
>>> >
>>> >>>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Convertlets have a similar effect to planner rules (albeit
>>> they
>>> >>>>> act on
>>> >>>>>>>>>>> scalar expressions, not relational expressions) so people
>>> should
>>> >> be
>>> >>>>>>>>> able to
>>> >>>>>>>>>>> change the set of active convertlets.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Would you like to propose a change that makes the convertlet
>>> >> table
>>> >>>>>>>>>>> pluggable? Maybe as part of FrameworkConfig? Regardless,
>>> please
>>> >>>>> log a
>>> >>>>>>>>> JIRA
>>> >>>>>>>>>>> to track this.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> And by the way, RexImpTable, which defines how operators are
>>> >>>>>>> implemented
>>> >>>>>>>>>>> by generating java code, should also be pluggable. It’s been
>>> on
>>> >> my
>>> >>>>>>> mind
>>> >>>>>>>>> for
>>> >>>>>>>>>>> a long time to allow the “engine” — related to the data
>>> format,
>>> >> and
>>> >>>>>>> how
>>> >>>>>>>>>>> code is generated to access fields and evaluate expressions
>>> and
>>> >>>>>>>>> operators —
>>> >>>>>>>>>>> to be pluggable.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Regarding whether the JDBC driver is the right way to embed
>>> >>>>> Calcite.
>>> >>>>>>>>>>> There’s no easy answer. You might want to embed Calcite as a
>>> >>>>> library
>>> >>>>>>> in
>>> >>>>>>>>>>> your own server (as Drill and Hive do). Or you might want to
>>> make
>>> >>>>>>>>> yourself
>>> >>>>>>>>>>> just an adapter that runs inside a Calcite JDBC server (as the
>>> >> CSV
>>> >>>>>>>>> adapter
>>> >>>>>>>>>>> does). Or something in the middle, like what Phoenix does:
>>> using
>>> >>>>>>> Calcite
>>> >>>>>>>>>>> for JDBC, SQL, planning, but with your own metadata and
>>> runtime
>>> >>>>>>> engine.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> As long as you build the valuable stuff into planner rules,
>>> new
>>> >>>>>>>>> relational
>>> >>>>>>>>>>> operators (if necessary) and use the schema SPI, you should be
>>> >>>>> able to
>>> >>>>>>>>>>> change packaging in the future.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Julian
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <[email protected]>
>>> >> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Hey Calcites,
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I'm working on embedding Calcite into Druid (
>>> http://druid.io/,
>>> >>>>>>>>>>>> https://github.com/druid-io/druid/pull/3682) and am running
>>> >> into
>>> >>>>> a
>>> >>>>>>>>>>> problem
>>> >>>>>>>>>>>> that is making me wonder if the approach I'm using makes
>>> sense.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite
>>> has a
>>> >>>>>>>>> standard
>>> >>>>>>>>>>>> convertlet rule "convertExtract" that changes this into some
>>> >>>>>>> arithmetic
>>> >>>>>>>>>>> on
>>> >>>>>>>>>>>> __time casted to an int type. But Druid has some builtin
>>> >>>>> functions to
>>> >>>>>>>>> do
>>> >>>>>>>>>>>> this, and I'd rather use those than arithmetic (for a bunch
>>> of
>>> >>>>>>>>> reasons).
>>> >>>>>>>>>>>> Ideally, in my RelOptRules that convert Calcite rels to Druid
>>> >>>>>>> queries,
>>> >>>>>>>>>>> I'd
>>> >>>>>>>>>>>> see the EXTRACT as a normal RexCall with the time flag and an
>>> >>>>>>>>> expression
>>> >>>>>>>>>>> to
>>> >>>>>>>>>>>> apply it to. That's a lot easier to translate than the
>>> >> arithmetic
>>> >>>>>>>>> stuff,
>>> >>>>>>>>>>>> which I'd have to pattern match and undo first before
>>> >> translating.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> So the problem I have is that I want to disable
>>> convertExtract,
>>> >>>>> but I
>>> >>>>>>>>>>> don't
>>> >>>>>>>>>>>> see a way to do that or to swap out the convertlet table.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> The code I'm using to set up a connection is:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> public CalciteConnection createCalciteConnection(
>>> >>>>>>>>>>>> final DruidSchema druidSchema
>>> >>>>>>>>>>>> ) throws SQLException
>>> >>>>>>>>>>>> {
>>> >>>>>>>>>>>> final Properties props = new Properties();
>>> >>>>>>>>>>>> props.setProperty("caseSensitive", "true");
>>> >>>>>>>>>>>> props.setProperty("unquotedCasing", "UNCHANGED");
>>> >>>>>>>>>>>> final Connection connection =
>>> >>>>>>>>>>>> DriverManager.getConnection("jdbc:calcite:", props);
>>> >>>>>>>>>>>> final CalciteConnection calciteConnection =
>>> >>>>>>>>>>>> connection.unwrap(CalciteConnection.class);
>>> >>>>>>>>>>>> calciteConnection.getRootSchema().setCacheEnabled(false);
>>> >>>>>>>>>>>> calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME,
>>> >>>>>>>>>>> druidSchema);
>>> >>>>>>>>>>>> return calciteConnection;
>>> >>>>>>>>>>>> }
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> This CalciteConnection is then used by the Druid HTTP server
>>> to
>>> >>>>>>> offer a
>>> >>>>>>>>>>> SQL
>>> >>>>>>>>>>>> API.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Is there some way to swap out the convertlet table that I'm
>>> >>>>> missing?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Also, just in general, am I going about this the right way?
>>> Is
>>> >>>>> using
>>> >>>>>>>>> the
>>> >>>>>>>>>>>> JDBC driver the right way to embed Calcite? Or should I be
>>> >> calling
>>> >>>>>>> into
>>> >>>>>>>>>>> it
>>> >>>>>>>>>>>> at some lower level?
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Thanks!
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Gian
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>
>>> >>
>>>
>>>
>>

Re: Embedding Calcite, adjusting convertlets

Reply via email to