I don't think so, since my test query (SELECT dim1 FROM s.foo GROUP BY dim1 ORDER BY dim1 DESC) is sorting on a column that is also projected.
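As an aside, the RelSet/RelSubset trait-subsumption behavior Julian describes downthread can be sketched as a toy model. To be clear, these classes and trait labels are illustrative only (they are not Calcite's real RelSet/RelSubset API); the point is just that a "subset" is a filtered view of a set, keeping the nodes whose traits include the required ones:

```java
import java.util.*;
import java.util.stream.*;

// Toy model: a "set" holds logically-equivalent nodes; a "subset" is not
// stored separately, but computed by filtering the set down to the nodes
// whose traits contain ("subsume") the subset's required traits.
public class SubsetSketch {
  static class Node {
    final String name;
    final Set<String> traits;  // trait labels, e.g. "sorted(dim1 DESC)"
    Node(String name, String... traits) {
      this.name = name;
      this.traits = new HashSet<>(Arrays.asList(traits));
    }
  }

  // Filter a set's nodes to those satisfying the required traits.
  static List<String> subset(List<Node> relSet, Set<String> required) {
    return relSet.stream()
        .filter(n -> n.traits.containsAll(required))
        .map(n -> n.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Node> set = Arrays.asList(
        new Node("scan", "convention(druid)"),
        new Node("sortedScan", "convention(druid)", "sorted(dim1 DESC)"));
    // The subset requiring the collation contains only the sorted node,
    // while the empty-trait subset contains both: subsets overlap.
    System.out.println(subset(set, Set.of("sorted(dim1 DESC)")));
    System.out.println(subset(set, Set.of()));
  }
}
```

This overlap is why subsets became non-disjoint once collation was made a trait: one node can belong to the subset sorted on (x, y), the one sorted on (x), and the unsorted one at the same time.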
Gian

On Wed, Dec 14, 2016 at 11:58 AM, Julian Hyde <[email protected]> wrote:

> Are you running into some variant of the problems that inspired
> https://issues.apache.org/jira/browse/CALCITE-819 (at the root of the
> tree, columns that are not projected are removed, and if the desired sort
> order involves non-projected columns, the desired sort order is
> forgotten)?
>
> > On Dec 14, 2016, at 11:19 AM, Gian Merlino <[email protected]> wrote:
> >
> > Ah, thanks. So if that sort of thing is not a smoking gun, do you have
> > an idea about where I should look next? If not I'll keep poking around.
> >
> > Gian
> >
> > On Wed, Dec 14, 2016 at 11:06 AM, Julian Hyde <[email protected]> wrote:
> >
> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
> >>> _any_ collation traits.
> >>
> >> That’s OK. A “subset” (RelSubset) is a collection of RelNodes that are
> >> logically and physically equivalent (same results, same physical
> >> properties) whereas a “set” (RelSet) is a collection of RelNodes that
> >> are logically equivalent.
> >>
> >> A set can therefore be considered to be a collection of subsets, each
> >> of which contains RelNodes. And it used to be implemented that way, but
> >> in https://issues.apache.org/jira/browse/CALCITE-88 we introduced
> >> collation as a trait, and that made subsets non-disjoint (a RelNode can
> >> be sorted on (x, y), and also on (x), and also on (), and also on (z))
> >> so we made RelSubset just a view onto a RelSet, filtering the list of
> >> RelNodes according to the ones that have (“subsume”) the desired
> >> traits.
> >>
> >> Julian
> >>
> >>
> >>> On Dec 14, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
> >>>
> >>> I spent some more time looking into (3) and found that when I had
> >>> things going through the Planner rather than the JDBC driver,
> >>> SortRemoveRule was removing sorts when it shouldn't have been. This
> >>> happens even for simple queries like "SELECT dim1 FROM s.foo GROUP BY
> >>> dim1 ORDER BY dim1 DESC". Removing SortRemoveRule from the planner
> >>> fixed the broken tests on my end.
> >>>
> >>> I dug into that a bit and saw that the call to
> >>> "convert(sort.getInput(), traits)" in SortRemoveRule was returning a
> >>> RelSubset that looked a bit funny in the debugger:
> >>>
> >>> - The RelSubset's "traitSet" _does_ have the proper collation trait.
> >>> - But its "set" field points to a RelSet with "rels" that _don't_ have
> >>> _any_ collation traits.
> >>>
> >>> From what I understand that causes Calcite to treat the unsorted and
> >>> sorted rels as equivalent when they in fact aren't. I'm still not sure
> >>> if this is a Calcite bug or user error on my part… I'll keep looking
> >>> into it unless someone has any bright ideas.
> >>>
> >>> fwiw, my Planner construction looks like this:
> >>>
> >>>   final FrameworkConfig frameworkConfig = Frameworks
> >>>       .newConfigBuilder()
> >>>       .parserConfig(
> >>>           SqlParser.configBuilder()
> >>>               .setCaseSensitive(true)
> >>>               .setUnquotedCasing(Casing.UNCHANGED)
> >>>               .build()
> >>>       )
> >>>       .defaultSchema(rootSchema)
> >>>       .traitDefs(ConventionTraitDef.INSTANCE, RelCollationTraitDef.INSTANCE)
> >>>       .programs(Programs.ofRules(myRules))
> >>>       .executor(new RexExecutorImpl(Schemas.createDataContext(null)))
> >>>       .context(Contexts.EMPTY_CONTEXT)
> >>>       .build();
> >>>
> >>>   return Frameworks.getPlanner(frameworkConfig);
> >>>
> >>> Gian
> >>>
> >>> On Sat, Dec 3, 2016 at 5:53 PM, Gian Merlino <[email protected]> wrote:
> >>>
> >>>> Sure, I added those first two to the ticket.
> >>>>
> >>>> I don't think those are happening with (3) but I'll double check next
> >>>> time I take a look at using the Planner.
> >>>>
> >>>> Gian
> >>>>
> >>>> On Fri, Dec 2, 2016 at 12:20 PM, Julian Hyde <[email protected]> wrote:
> >>>>
> >>>>> Can you please add (1) and (2) to
> >>>>> https://issues.apache.org/jira/browse/CALCITE-1525, which deals with
> >>>>> the whole issue of using “Planner” within the JDBC driver, so we can
> >>>>> be consistent.
> >>>>>
> >>>>> (3) doesn’t look likely to be related. Do your queries have UNION or
> >>>>> other set-ops? Are you sorting on columns that do not appear in the
> >>>>> final result?
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>
> >>>>>> On Nov 28, 2016, at 10:45 AM, Gian Merlino <[email protected]> wrote:
> >>>>>>
> >>>>>> I traveled a bit down the Frameworks/Planner road and got most of
> >>>>>> my tests passing, but ran into some problems getting them all to
> >>>>>> work:
> >>>>>>
> >>>>>> (1) "EXPLAIN PLAN FOR" throws NullPointerException during
> >>>>>> Planner.validate. It looks like CalcitePrepareImpl has some special
> >>>>>> code to handle validation of EXPLAIN, but PlannerImpl doesn't. I'm
> >>>>>> not sure if this is something I should be doing on my end, or if
> >>>>>> it's a bug in PlannerImpl.
> >>>>>> (2) I don't see a way to do ?-style prepared statements with bound
> >>>>>> variables, which _is_ possible with the JDBC driver route.
> >>>>>> (3) Not sure why this is happening, but for some reason ORDER BY /
> >>>>>> LIMIT clauses are getting ignored sometimes, even when they work
> >>>>>> with the JDBC driver route. This may be something messed up with my
> >>>>>> rules though and may not be Calcite's fault.
> >>>>>>
> >>>>>> Julian, do any of these look like bugs that should be raised in
> >>>>>> jira, or are they just stuff I should be dealing with on my side?
> >>>>>>
> >>>>>> Btw, I do like that the Frameworks/Planner route gives me back the
> >>>>>> RelNode itself, since that means I can make the Druid queries
> >>>>>> directly without needing to go through the extra layers of the JDBC
> >>>>>> driver. That part is nice.
> >>>>>>
> >>>>>> Gian
> >>>>>>
> >>>>>> On Wed, Nov 23, 2016 at 10:11 PM, Julian Hyde <[email protected]> wrote:
> >>>>>>
> >>>>>>> I don’t know how it’s used outside Calcite. Maybe some others can
> >>>>>>> chime in.
> >>>>>>>
> >>>>>>> Thanks for the PR. I logged
> >>>>>>> https://issues.apache.org/jira/browse/CALCITE-1509 for it, and
> >>>>>>> will commit shortly.
> >>>>>>>
> >>>>>>> Julian
> >>>>>>>
> >>>>>>>> On Nov 23, 2016, at 12:32 PM, Gian Merlino <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Do you know examples of projects that use Planner or PlannerImpl
> >>>>>>>> currently (from "outside")? As far as I can tell, within Calcite
> >>>>>>>> itself it's only used in test code. Maybe that'd be a better
> >>>>>>>> entry point.
> >>>>>>>>
> >>>>>>>> In the meantime I raised a PR here for allowing a convertlet
> >>>>>>>> table override in a CalcitePrepareImpl:
> >>>>>>>> https://github.com/apache/calcite/pull/330. That was enough to
> >>>>>>>> get the JDBC driver on my end to behave how I want it to.
> >>>>>>>>
> >>>>>>>> Gian
> >>>>>>>>
> >>>>>>>> On Thu, Nov 17, 2016 at 5:23 PM, Julian Hyde <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> I was wrong earlier… FrameworkConfig already has a
> >>>>>>>>> getConvertletTable method. But regarding using FrameworkConfig
> >>>>>>>>> from within the JDBC driver, it’s complicated. FrameworkConfig
> >>>>>>>>> only works if you are “outside” Calcite, whereas CalcitePrepare
> >>>>>>>>> is when you are customizing from the inside, and sadly
> >>>>>>>>> CalcitePrepare does not use a FrameworkConfig.
> >>>>>>>>>
> >>>>>>>>> Compare and contrast:
> >>>>>>>>> * CalcitePrepareImpl.getSqlToRelConverter [
> >>>>>>>>> https://github.com/apache/calcite/blob/3f92157d5742dd10f3b828d22d7a753e0a2899cc/core/src/main/java/org/apache/calcite/prepare/CalcitePrepareImpl.java#L1114 ]
> >>>>>>>>> * PlannerImpl.rel [
> >>>>>>>>> https://github.com/apache/calcite/blob/105bba1f83cd9631e8e1211d262e4886a4a863b7/core/src/main/java/org/apache/calcite/prepare/PlannerImpl.java#L225 ]
> >>>>>>>>>
> >>>>>>>>> The latter uses a convertletTable sourced from a FrameworkConfig.
> >>>>>>>>>
> >>>>>>>>> The ideal thing would be to get CalcitePrepareImpl to use a
> >>>>>>>>> PlannerImpl to do its dirty work. Then “inside” and “outside”
> >>>>>>>>> would work the same. Would definitely appreciate that as a patch.
> >>>>>>>>>
> >>>>>>>>> If you choose to go the JDBC driver route, you could override
> >>>>>>>>> Driver.createPrepareFactory to produce a sub-class of
> >>>>>>>>> CalcitePrepare that works for your environment, one with an
> >>>>>>>>> explicit convertletTable rather than just using the default.
> >>>>>>>>>
> >>>>>>>>> Julian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Nov 17, 2016, at 5:01 PM, Gian Merlino <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey Julian,
> >>>>>>>>>>
> >>>>>>>>>> If the convertlets were customizable with a FrameworkConfig,
> >>>>>>>>>> how would I use that to configure the JDBC driver (given that
> >>>>>>>>>> I'm doing it with the code upthread)? Or would that suggest
> >>>>>>>>>> using a different approach to embedding Calcite?
> >>>>>>>>>>
> >>>>>>>>>> Gian
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Nov 17, 2016 at 4:02 PM, Julian Hyde <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Convertlets have a similar effect to planner rules (albeit
> >>>>>>>>>>> they act on scalar expressions, not relational expressions) so
> >>>>>>>>>>> people should be able to change the set of active convertlets.
> >>>>>>>>>>>
> >>>>>>>>>>> Would you like to propose a change that makes the convertlet
> >>>>>>>>>>> table pluggable? Maybe as part of FrameworkConfig? Regardless,
> >>>>>>>>>>> please log a JIRA to track this.
> >>>>>>>>>>>
> >>>>>>>>>>> And by the way, RexImpTable, which defines how operators are
> >>>>>>>>>>> implemented by generating java code, should also be pluggable.
> >>>>>>>>>>> It’s been on my mind for a long time to allow the “engine” —
> >>>>>>>>>>> related to the data format, and how code is generated to
> >>>>>>>>>>> access fields and evaluate expressions and operators — to be
> >>>>>>>>>>> pluggable.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding whether the JDBC driver is the right way to embed
> >>>>>>>>>>> Calcite. There’s no easy answer. You might want to embed
> >>>>>>>>>>> Calcite as a library in your own server (as Drill and Hive do).
> >>>>>>>>>>> Or you might want to make yourself just an adapter that runs
> >>>>>>>>>>> inside a Calcite JDBC server (as the CSV adapter does). Or
> >>>>>>>>>>> something in the middle, like what Phoenix does: using Calcite
> >>>>>>>>>>> for JDBC, SQL, planning, but with your own metadata and
> >>>>>>>>>>> runtime engine.
> >>>>>>>>>>>
> >>>>>>>>>>> As long as you build the valuable stuff into planner rules,
> >>>>>>>>>>> new relational operators (if necessary) and use the schema
> >>>>>>>>>>> SPI, you should be able to change packaging in the future.
> >>>>>>>>>>>
> >>>>>>>>>>> Julian
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Nov 17, 2016, at 1:59 PM, Gian Merlino <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hey Calcites,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm working on embedding Calcite into Druid (http://druid.io/,
> >>>>>>>>>>>> https://github.com/druid-io/druid/pull/3682) and am running
> >>>>>>>>>>>> into a problem that is making me wonder if the approach I'm
> >>>>>>>>>>>> using makes sense.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Consider the expression EXTRACT(YEAR FROM __time). Calcite
> >>>>>>>>>>>> has a standard convertlet rule "convertExtract" that changes
> >>>>>>>>>>>> this into some arithmetic on __time casted to an int type.
> >>>>>>>>>>>> But Druid has some builtin functions to do this, and I'd
> >>>>>>>>>>>> rather use those than arithmetic (for a bunch of reasons).
> >>>>>>>>>>>> Ideally, in my RelOptRules that convert Calcite rels to Druid
> >>>>>>>>>>>> queries, I'd see the EXTRACT as a normal RexCall with the
> >>>>>>>>>>>> time flag and an expression to apply it to. That's a lot
> >>>>>>>>>>>> easier to translate than the arithmetic stuff, which I'd have
> >>>>>>>>>>>> to pattern match and undo first before translating.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So the problem I have is that I want to disable
> >>>>>>>>>>>> convertExtract, but I don't see a way to do that or to swap
> >>>>>>>>>>>> out the convertlet table.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The code I'm using to set up a connection is:
> >>>>>>>>>>>>
> >>>>>>>>>>>>   public CalciteConnection createCalciteConnection(
> >>>>>>>>>>>>       final DruidSchema druidSchema
> >>>>>>>>>>>>   ) throws SQLException
> >>>>>>>>>>>>   {
> >>>>>>>>>>>>     final Properties props = new Properties();
> >>>>>>>>>>>>     props.setProperty("caseSensitive", "true");
> >>>>>>>>>>>>     props.setProperty("unquotedCasing", "UNCHANGED");
> >>>>>>>>>>>>     final Connection connection =
> >>>>>>>>>>>>         DriverManager.getConnection("jdbc:calcite:", props);
> >>>>>>>>>>>>     final CalciteConnection calciteConnection =
> >>>>>>>>>>>>         connection.unwrap(CalciteConnection.class);
> >>>>>>>>>>>>     calciteConnection.getRootSchema().setCacheEnabled(false);
> >>>>>>>>>>>>     calciteConnection.getRootSchema().add(DRUID_SCHEMA_NAME, druidSchema);
> >>>>>>>>>>>>     return calciteConnection;
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>
> >>>>>>>>>>>> This CalciteConnection is then used by the Druid HTTP server
> >>>>>>>>>>>> to offer a SQL API.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is there some way to swap out the convertlet table that I'm
> >>>>>>>>>>>> missing?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also, just in general, am I going about this the right way?
> >>>>>>>>>>>> Is using the JDBC driver the right way to embed Calcite? Or
> >>>>>>>>>>>> should I be calling into it at some lower level?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Gian
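For reference, the pluggable convertlet table discussed above can be sketched as a toy model. The names and rewrite strings below are illustrative (this is not Calcite's actual convertlet API); the point is that each table entry rewrites one operator, so swapping the table lets EXTRACT pass through as a plain call instead of being expanded into arithmetic:

```java
import java.util.*;
import java.util.function.*;

// Toy sketch: a "convertlet table" maps an operator name to a rewrite of
// its call. A custom table can register an identity convertlet for
// EXTRACT, leaving the call intact for downstream rules to translate.
public class ConvertletSketch {
  // Stand-in for the standard table: expands EXTRACT into arithmetic
  // (the expanded form here is a mock string, not Calcite's real output).
  static final Map<String, UnaryOperator<String>> STANDARD = Map.of(
      "EXTRACT", call -> "CAST(FLOOR(...arithmetic on __time...) AS INT)");

  // Stand-in for a Druid-friendly table: leave the call alone.
  static final Map<String, UnaryOperator<String>> DRUID = Map.of(
      "EXTRACT", call -> call);

  static String convert(Map<String, UnaryOperator<String>> table,
                        String op, String call) {
    // Operators without an entry pass through unchanged.
    return table.getOrDefault(op, c -> c).apply(call);
  }

  public static void main(String[] args) {
    String call = "EXTRACT(YEAR FROM __time)";
    System.out.println(convert(STANDARD, "EXTRACT", call)); // expanded
    System.out.println(convert(DRUID, "EXTRACT", call));    // untouched
  }
}
```

With a setup like this, choosing which table to hand the planner is the whole customization, which is why making the real convertlet table part of FrameworkConfig (and usable from the JDBC driver) matters.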
