Re: Using materialized view rewrites to route queries to different backend engines

Anupam Aggarwal Sat, 05 Jan 2019 18:46:14 -0800

Thanks Michael !

I had looked at the notebook earlier - the starting point in the
notebook for the conversion to an optimized plan was from a relNode (I
was having a hard time to figure out how to convert sql to relNode
without going through the jdbc/prepare classes)


I was able to get it to work by replicating the processing happening
in the test cases although in a really hacky way (did so by modifying
calcite internals -making some methods public/changing code in
CalcitePrepareImpl etc )


If there's a cleaner way that would be super helpful :)

Here's the code

https://gist.github.com/anupamaggarwal/de09d94325ebd8e069151b1aa1bdbc23


In my case this query

 "select \"empid\",\"name\" from \"HR\".\"emps\" where \"deptno\" = 10";

gets rewritten to


SELECT *
FROM postgres.HRCLONE

Thanks so much!

Anupam







Would this example of query optimization using Calcite help?
https://github.com/michaelmior/calcite-notebooks/blob/master/query-optimization.ipynb

If not, sharing the code you have so far and specifically pointing out the
problem you're having would be helpful.

--
Michael [email protected]


Le ven. 4 janv. 2019 à 14:05, Anupam Aggarwal <[email protected]> a
écrit :

> Hi,
>
> I am trying to use calcite as a library/abstraction layer to front several
> DB backends and route queries  to the backend best suited for the query.
> For example for a table in hive if a subset of columns referenced in a
> query have been replicated in postgres we will reroute  the query to
> postgres (and having it execute there)
> (Note this is different than having calcite directly connect to postgres
> and push down / do stuff in memory) - In our case complete execution is
> delegated to the backend engine itself.
>
>
>
> My schema is roughly similar to the following
>
> {
>   "version": "1.0",
>   "defaultSchema": "hive",
>   "schemas": [
>     {
>       "type": "jdbc",
>       "name": "hive",
>       *//hive stuff is here*
>     },
>     {
>       "name":"postgres",
>       "materializations":[
>         {
>           "table":"bar",
>           "sql":""
>         }
>       ]
>     }
>   ]
>
> This helps us route queries of the form select * from hive.table to
> --> select * from postgres.bar.
>
> I was reading the code in MaterializationTest.java (unit test), The way
> this is currently done is through a jdbc:calcite connection (at connection
> time the materializations are defined) and at query execution time the
> materialized view gets used and substituted in the relnode tree. However
> the code seems tightly coupled to the jdbc calcite call.
>
> Is there any other way to get a relnode from a sql directly , call optimize
> on it to do the MV substitution and return the rewritten query (and avoid
> the jdbc call altogether)
> (I plugged some code using RelToSqlConverter in the main codepath, and I am
> able to log the rewritten query but can't think of a good way to return the
> rewritten query directly bypassing the jdbc call. In my case the result of
> execution of the query is going to be the rewrite of the query itself.
>
> I went through the dev list but wasn't able to locate a way (also checked
> out Qubole's Quark which does something similar although in a different
> way). Any pointers would be really helpful. Apologies if this is a really
> basic question
>
>
> Thanks
>
> Anupam
>






On Fri, Jan 4, 2019 at 11:53 PM Anupam Aggarwal <[email protected]>
wrote:

> Hi,
>
> I am trying to use calcite as a library/abstraction layer to front several
> DB backends and route queries  to the backend best suited for the query.
> For example for a table in hive if a subset of columns referenced in a
> query have been replicated in postgres we will reroute  the query to
> postgres (and having it execute there)
> (Note this is different than having calcite directly connect to postgres
> and push down / do stuff in memory) - In our case complete execution is
> delegated to the backend engine itself.
>
>
>
> My schema is roughly similar to the following
>
> {
>   "version": "1.0",
>   "defaultSchema": "hive",
>   "schemas": [
>     {
>       "type": "jdbc",
>       "name": "hive",
>       *//hive stuff is here*
>     },
>     {
>       "name":"postgres",
>       "materializations":[
>         {
>           "table":"bar",
>           "sql":""
>         }
>       ]
>     }
>   ]
>
> This helps us route queries of the form select * from hive.table to --> 
> select * from postgres.bar.
>
> I was reading the code in MaterializationTest.java (unit test), The way
> this is currently done is through a jdbc:calcite connection (at connection
> time the materializations are defined) and at query execution time the
> materialized view gets used and substituted in the relnode tree. However
> the code seems tightly coupled to the jdbc calcite call.
>
> Is there any other way to get a relnode from a sql directly , call
> optimize on it to do the MV substitution and return the rewritten query
> (and avoid the jdbc call altogether)
> (I plugged some code using RelToSqlConverter in the main codepath, and I
> am able to log the rewritten query but can't think of a good way to return
> the rewritten query directly bypassing the jdbc call. In my case the
> result of execution of the query is going to be the rewrite of the query
> itself.
>
> I went through the dev list but wasn't able to locate a way (also checked
> out Qubole's Quark which does something similar although in a different
> way). Any pointers would be really helpful. Apologies if this is a really
> basic question
>
>
> Thanks
>
> Anupam
>
>
>

Re: Using materialized view rewrites to route queries to different backend engines

Reply via email to