Re: Embedding Calcite, adjusting convertlets

Julian Hyde Wed, 14 Dec 2016 11:59:00 -0800
Are you running into some variant of the problems that inspired 
https://issues.apache.org/jira/browse/CALCITE-819: 
<https://issues.apache.org/jira/browse/CALCITE-819:> at the root of the tree, 
columns that are not projected are removed, and if the desired sort order 
involves non-projected columns, the desired sort order is forgotten).
 
> On Dec 14, 2016, at 11:19 AM, Gian Merlino <[email protected]> wrote:
> 
> Ah, thanks. So if that sort of thing is not a smoking gun, do you have an
> idea about where I should look next? If not I'll keep poking around.
> 
> Gian
> 
> On Wed, Dec 14, 2016 at 11:06 AM, Julian Hyde <[email protected]> wrote:
> 
>>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>>> _any_ collation traits.
>> 
>> That’s OK. A “subset” (RelSubset) is a collection of RelNodes that are
>> logically and physically equivalent (same results, same physical
>> properties) whereas a “set” (RelSet) is a collection of RelNodes that are
>> logically equivalent.
>> 
>> A set can therefore be considered to be a collection of subsets, each of
>> which contains RelNodes. And it used to be implemented that way, but in
>> https://issues.apache.org/jira/browse/CALCITE-88 <
>> https://issues.apache.org/jira/browse/CALCITE-88> we introduced collation
>> as a trait, and that made subsets non-disjoint (a RelNode can be sorted on
>> (x, y), and also on (x), and also on (), and also on (z)) so we made
>> RelSubset just a view onto a RelSet, filtering the list of RelNodes
>> according to the ones that have (“subsume”) the desired traits.
>> 
>> Julian
>> 
>> 
>>> On Dec 14, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
>>> 
>>> I spent some more time looking into (3) and found that when I had things
>>> going through the Planner rather than the JDBC driver, SortRemoveRule was
>>> removing sorts when it shouldn't have been. This happens even for simple
>>> queries like "SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1
>>> DESC". Removing SortRemoveRule from the planner fixed the broken tests on
>>> my end.
>>> 
>>> I dug into that a bit and saw that the call to "convert(sort.getInput(),
>>> traits)" in SortRemoveRule was returning a RelSubset that looked a bit
>>> funny in the debugger:
>>> 
>>> - The RelSubset's "traitSet" _does_ have the proper collation trait.
>>> - But its "set" field points to a RelSet with "rels" that _don't_ have
>>> _any_ collation traits.
>>> 
>>> From what I understand that causes Calcite to treat the unsorted and
>> sorted
>>> rels as equivalent when they in fact aren't. I'm still not sure if this
>> is
>>> a Calcite bug or user error on my part… I'll keep looking into it unless
>>> someone has any bright ideas.
>>> 
>>> fwiw, my Planner construction looks like this:
>>> 
>>>   final FrameworkConfig frameworkConfig = Frameworks
>>>       .newConfigBuilder()
>>>       .parserConfig(
>>>           SqlParser.configBuilder()
>>>                    .setCaseSensitive(true)
>>>                    .setUnquotedCasing(Casing.UNCHANGED)
>>>                    .build()
>>>       )
>>>       .defaultSchema(rootSchema)
>>>       .traitDefs(ConventionTraitDef.INSTANCE,
>>> RelCollationTraitDef.INSTANCE)
>>>       .programs(Programs.ofRules(myRules))
>>>       .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
>>>       .context(Contexts.EMPTY_CONTEXT)
>>>       .build();
>>> 
>>>   return Frameworks.getPlanner(frameworkConfig);
>>> 
>>> Gian
>>> 
>>> On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <[email protected]> wrote:
>>> 
>>>> Sure, I added those first two to the ticket.
>>>> 
>>>> I don't think those are happening with (3) but I'll double check next
>> time
>>>> I take a look at using the Planner.
>>>> 
>>>> Gian
>>>> 
>>>> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <[email protected]> wrote:
>>>> 
>>>>> Can you please add (1) and (2) to https://issues.apache.org/jira
>>>>> /browse/CALCITE-1525 <https://issues.apache.org/jir
>> a/browse/CALCITE-1525>,
>>>>> which deals with the whole issue of using “Planner” within the JDBC
>> driver,
>>>>> so we can be consistent.
>>>>> 
>>>>> (3) doesn’t look likely to be related. Do your queries have UNION or
>>>>> other set-ops? Are you sorting on columns that do not appear in the
>> final
>>>>> result?
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
>>>>>> 
>>>>>> I traveled a bit down the Frameworks/Planner road and got most of my
>>>>> tests
>>>>>> passing, but ran into some problems getting them all to work:
>>>>>> 
>>>>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
>>>>> Planner.validate.
>>>>>> It looks like CalcitePrepareImpl has some special code to handle
>>>>> validation
>>>>>> of EXPLAIN, but PlannerImpl doesn't. I'm not sure if this is
>> something I
>>>>>> should be doing on my end, or if it's a bug in PlannerImpl.
>>>>>> (2) I don't see a way to do ?-style prepared statements with bound
>>>>>> variables, which _is_ possible with the JDBC driver route.
>>>>>> (3) Not sure why this is happening, but for some reason ORDER BY /
>> LIMIT
>>>>>> clauses are getting ignored sometimes, even when they work with the
>> JDBC
>>>>>> driver route. This may be something messed up with my rules though and
>>>>> may
>>>>>> not be Calcite's fault.
>>>>>> 
>>>>>> Julian, do any of these look like bugs that should be raised in jira,
>> or
>>>>>> are they just stuff I should be dealing with on my side?
>>>>>> 
>>>>>> Btw, I do like that the Frameworks/Planner route gives me back the
>>>>> RelNode
>>>>>> itself, since that means I can make the Druid queries directly without
>>>>>> needing to go through the extra layers of the JDBC driver. That part
>> is
>>>>>> nice.
>>>>>> 
>>>>>> Gian
>>>>>> 
>>>>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <[email protected]>
>> wrote:
>>>>>> 
>>>>>>> I don’t know how it’s used outside Calcite. Maybe some others can
>>>>> chime in.
>>>>>>> 
>>>>>>> Thanks for the PR. I logged https://issues.apache.org/jira
>>>>>>> /browse/CALCITE-1509 <https://issues.apache.org/jir
>>>>> a/browse/CALCITE-1509>
>>>>>>> for it, and will commit shortly.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Do you know examples of projects that use Planner or PlannerImpl
>>>>>>> currently
>>>>>>>> (from "outside")? As far as I can tell, within Calcite itself it's
>>>>> only
>>>>>>>> used in test code. Maybe that'd be a better entry point.
>>>>>>>> 
>>>>>>>> In the meantime I raised a PR here for allowing a convertlet table
>>>>>>> override
>>>>>>>> in a CalcitePrepareImpl: https://github.com/apache/calcite/pull/330
>> .
>>>>>>> That
>>>>>>>> was enough to get the JDBC driver on my end to behave how I want it
>>>>> to.
>>>>>>>> 
>>>>>>>> Gian
>>>>>>>> 
>>>>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <[email protected]>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I was wrong earlier… FrameworkConfig already has a
>> getConvertletTable
>>>>>>>>> method. But regarding using FrameworkConfig from within the JDBC
>>>>> driver,
>>>>>>>>> It’s complicated. FrameworkConfig only works if you are “outside”
>>>>>>> Calcite,
>>>>>>>>> whereas CalcitePrepare is when you are customizing from the inside,
>>>>> and
>>>>>>>>> sadly CalcitePrepare does not use a FrameworkConfig.
>>>>>>>>> 
>>>>>>>>> Compare and contrast:
>>>>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
>>>>> https://github.com/apache/
>>>>>>>>> calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/
>>>>>>> src/main/java/
>>>>>>>>> org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 <
>>>>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b82
>>>>> 8d22d7a75
>>>>>>>>> 3e0a2899cc/core/src/main/java/org/apache/calcite/prepare/
>>>>>>>>> CalcitePrepareImpl.java#L1114> ]
>>>>>>>>> * PlannerImpl.rel [ https://github.com/apache/calcite/blob/
>>>>>>>>> 105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/
>>>>>>>>> org/apache/calcite/prepare/PlannerImpl.java#L225 <
>>>>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e12
>>>>> 11d262e48
>>>>>>>>> 86a4a863b7/core/src/main/java/org/apache/calcite/prepare/
>>>>>>>>> PlannerImpl.java#L225> ]
>>>>>>>>> 
>>>>>>>>> The latter uses a convertletTable sourced from a FrameworkConfig.
>>>>>>>>> 
>>>>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
>>>>> PlannerImpl
>>>>>>> to
>>>>>>>>> do its dirty work. Then “inside” and “outside” would work the same.
>>>>>>> Would
>>>>>>>>> definitely appreciate that as a patch.
>>>>>>>>> 
>>>>>>>>> If you choose to go the JDBC driver route, you could override
>>>>>>>>> Driver.createPrepareFactory to produce a sub-class of
>> CalcitePrepare
>>>>>>> that
>>>>>>>>> works for your environment, one with an explicit convertletTable
>>>>> rather
>>>>>>>>> than just using the default.
>>>>>>>>> 
>>>>>>>>> Julian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hey Julian,
>>>>>>>>>> 
>>>>>>>>>> If the convertlets were customizable with a FrameworkConfig, how
>>>>> would
>>>>>>> I
>>>>>>>>>> use that configure the JDBC driver (given that I'm doing it with
>> the
>>>>>>> code
>>>>>>>>>> upthread)? Or would that suggest using a different approach to
>>>>>>> embedding
>>>>>>>>>> Calcite?
>>>>>>>>>> 
>>>>>>>>>> Gian
>>>>>>>>>> 
>>>>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <[email protected]>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Convertlets have a similar effect to planner rules (albeit they
>>>>> act on
>>>>>>>>>>> scalar expressions, not relational expressions) so people should
>> be
>>>>>>>>> able to
>>>>>>>>>>> change the set of active convertlets.
>>>>>>>>>>> 
>>>>>>>>>>> Would you like to propose a change that makes the convertlet
>> table
>>>>>>>>>>> pluggable? Maybe as part of FrameworkConfig? Regardless, please
>>>>> log a
>>>>>>>>> JIRA
>>>>>>>>>>> to track this.
>>>>>>>>>>> 
>>>>>>>>>>> And by the way, RexImpTable, which defines how operators are
>>>>>>> implemented
>>>>>>>>>>> by generating java code, should also be pluggable. It’s been on
>> my
>>>>>>> mind
>>>>>>>>> for
>>>>>>>>>>> a long time to allow the “engine” — related to the data format,
>> and
>>>>>>> how
>>>>>>>>>>> code is generated to access fields and evaluate expressions and
>>>>>>>>> operators —
>>>>>>>>>>> to be pluggable.
>>>>>>>>>>> 
>>>>>>>>>>> Regarding whether the JDBC driver is the right way to embed
>>>>> Calcite.
>>>>>>>>>>> There’s no easy answer. You might want to embed Calcite as a
>>>>> library
>>>>>>> in
>>>>>>>>>>> your own server (as Drill and Hive do). Or you might want to make
>>>>>>>>> yourself
>>>>>>>>>>> just an adapter that runs inside a Calcite JDBC server (as the
>> CSV
>>>>>>>>> adapter
>>>>>>>>>>> does). Or something in the middle, like what Phoenix does: using
>>>>>>> Calcite
>>>>>>>>>>> for JDBC, SQL, planning, but with your own metadata and runtime
>>>>>>> engine.
>>>>>>>>>>> 
>>>>>>>>>>> As long as you build the valuable stuff into planner rules, new
>>>>>>>>> relational
>>>>>>>>>>> operators (if necessary) and use the schema SPI, you should be
>>>>> able to
>>>>>>>>>>> change packaging in the future.
>>>>>>>>>>> 
>>>>>>>>>>> Julian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <[email protected]>
>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hey Calcites,
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm working on embedding Calcite into Druid (http://druid.io/,
>>>>>>>>>>>> https://github.com/druid-io/druid/pull/3682) and am running
>> into
>>>>> a
>>>>>>>>>>> problem
>>>>>>>>>>>> that is making me wonder if the approach I'm using makes sense.
>>>>>>>>>>>> 
>>>>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite has a
>>>>>>>>> standard
>>>>>>>>>>>> convertlet rule "convertExtract" that changes this into some
>>>>>>> arithmetic
>>>>>>>>>>> on
>>>>>>>>>>>> __time casted to an int type. But Druid has some builtin
>>>>> functions to
>>>>>>>>> do
>>>>>>>>>>>> this, and I'd rather use those than arithmetic (for a bunch of
>>>>>>>>> reasons).
>>>>>>>>>>>> Ideally, in my RelOptRules that convert Calcite rels to Druid
>>>>>>> queries,
>>>>>>>>>>> I'd
>>>>>>>>>>>> see the EXTRACT as a normal RexCall with the time flag and an
>>>>>>>>> expression
>>>>>>>>>>> to
>>>>>>>>>>>> apply it to. That's a lot easier to translate than the
>> arithmetic
>>>>>>>>> stuff,
>>>>>>>>>>>> which I'd have to pattern match and undo first before
>> translating.
>>>>>>>>>>>> 
>>>>>>>>>>>> So the problem I have is that I want to disable convertExtract,
>>>>> but I
>>>>>>>>>>> don't
>>>>>>>>>>>> see a way to do that or to swap out the convertlet table.
>>>>>>>>>>>> 
>>>>>>>>>>>> The code I'm using to set up a connection is:
>>>>>>>>>>>> 
>>>>>>>>>>>> public CalciteConnection createCalciteConnection(
>>>>>>>>>>>> final DruidSchema druidSchema
>>>>>>>>>>>> ) throws SQLException
>>>>>>>>>>>> {
>>>>>>>>>>>> final Properties props = new Properties();
>>>>>>>>>>>> props.setProperty("caseSensitive", "true");
>>>>>>>>>>>> props.setProperty("unquotedCasing", "UNCHANGED");
>>>>>>>>>>>> final Connection connection =
>>>>>>>>>>>> DriverManager.getConnection("jdbc:calcite:", props);
>>>>>>>>>>>> final CalciteConnection calciteConnection =
>>>>>>>>>>>> connection.unwrap(CalciteConnection.class);
>>>>>>>>>>>> calciteConnection.getRootSchema().setCacheEnabled(false);
>>>>>>>>>>>> calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME,
>>>>>>>>>>> druidSchema);
>>>>>>>>>>>> return calciteConnection;
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> This CalciteConnection is then used by the Druid HTTP server to
>>>>>>> offer a
>>>>>>>>>>> SQL
>>>>>>>>>>>> API.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is there some way to swap out the convertlet table that I'm
>>>>> missing?
>>>>>>>>>>>> 
>>>>>>>>>>>> Also, just in general, am I going about this the right way? Is
>>>>> using
>>>>>>>>> the
>>>>>>>>>>>> JDBC driver the right way to embed Calcite? Or should I be
>> calling
>>>>>>> into
>>>>>>>>>>> it
>>>>>>>>>>>> at some lower level?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> 
>>>>>>>>>>>> Gian
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>>
Re: Embedding Calcite, adjusting convertlets

Reply via email to