Any further ideas of what might be going on or suggestions on how to debug this? I can put together a sample of the code I have so far if anyone is willing to spend a bit of time looking through it (~300 LOC). Thanks!
-- Michael Mior [email protected] 2016-06-13 14:49 GMT-04:00 Michael Mior <[email protected]>: > Julian, > > I understand that MultiJoin cannot be implemented directly. I'm only using > it to collect information on multiple joins into a single node since it > makes further steps of the algorithm much easier. Ultimate the rewrite rule > is attempting to match a LogicalProject on top of a LogicalJoin of two > TableScans and convert it into a LogicalProject on top of a TableScan of > the materialized view. The MultiJoin is only used internally within the > rule and I never actually perform any transformation using the MultiJoin. > The only transformation I am attempting is to replace a Join with a > TableScan. This is what is failing. > > -- > Michael Mior > [email protected] > > 2016-06-13 14:44 GMT-04:00 Julian Hyde <[email protected]>: > >> Using MultiJoin is a good idea. But remember that it was designed with >> only one, very specific purpose: to gather joins to be optimized by a >> greedy algorithm. >> >> MultiJoin cannot be implemented directly. In fact the only rule that can >> digest it and convert it into regular Join nodes is LoptOptimizeJoinRule. >> That might explain why you cannot find a plan. >> >> Julian >> >> >> > On Jun 12, 2016, at 11:33 AM, Michael Mior <[email protected]> wrote: >> > >> > I'm close to getting something working for the case of two tables. I >> don't >> > think it should be too challenging to extend to multiple tables. The >> gist >> > of the approach is to match a Join over two TableScans and then use >> > JoinToMultiJoin rule to get a MultiJoin instance. This is useful because >> > it's much easier to deal with with join conditions across more than two >> > tables with a MultiJoin than several join instances. From there, the >> plan >> > is to find a materialized view with matching conditions and rewrite the >> > original to join to a LogicalProject over a TableScan over the >> materialized >> > view. >> > >> > To test this out, I created two instances of a new class called >> DummyTable >> > with infinite cost and created a materialized view over these tables. I >> > hoped this would be helpful for debugging because it forces Calcite to >> use >> > the materialized view to execute the query (since actually performing >> scans >> > over these tables has infinite cost). >> > >> > The matching and substitution part seems to be working great, but >> Calcite >> > still fails to find a plan for the query. Any help would be appreciated. >> > Below are some relevant components of different parts of the plan. >> (Sorry, >> > they're quite long). Thanks! >> > >> > >> > >> > >> > >> > >> > Original query matched by the new rule >> > >> > LogicalProject(following=[$5], time=[$1], tweet_id=[$0]): rowcount = >> > 1.7976931348623157E308, cumulative cost = {1.7976931348623157E308 >> > rows, Infinity cpu, 0.0 io}, id = 641 >> > LogicalJoin(subset=[rel#636:Subset#22.NONE.[]], condition=[=($5, $3)], >> > joinType=[inner]): rowcount = 1.7976931348623157E308, cumulat >> > ive cost = {1.7976931348623157E308 rows, 0.0 cpu, 0.0 io}, id = 634 >> > DummyScan(subset=[rel#33:Subset#0.ENUMERABLE.[]], table=[[twissandra, >> > tweet]]): rowcount = 1.7976931348623157E308, cu >> > mulative cost = {inf}, id = 0 >> > DummyScan(subset=[rel#98:Subset#8.ENUMERABLE.[]], table=[[twissandra, >> > followers]]): rowcount = 1.7976931348623157E308 >> > , cumulative cost = {inf}, id = 18 >> > >> > Original query rewritten to use a MultiJoin >> > >> > LogicalProject(following=[$3], time=[$1], tweet_id=[$0]): rowcount = >> 1.0, >> > cumulative cost = {inf}, id = 667 >> > MultiJoin(joinFilter=[=($3, $2)], isFullOuterJoin=[false], >> > joinTypes=[[INNER, INNER]], outerJoinConditions=[[NULL, NULL]], >> projField >> > s=[[{0, 1}, {0}]]): rowcount = 1.0, cumulative cost = {inf}, id = 664 >> > LogicalProject(tweet_id=[$0], time=[$1], username=[$3]): rowcount = >> > 1.7976931348623157E308, cumulative cost = {inf}, id = 651 >> > DummyScan(table=[[twissandra, tweet]]): rowcount = >> > 1.7976931348623157E308, cumulative cost = {inf}, id = 0 >> > LogicalProject(following=[$1]): rowcount = 1.7976931348623157E308, >> > cumulative cost = {inf}, id = 652 >> > DummyScan(table=[[twissandra, followers]]): rowcount = >> > 1.7976931348623157E308, cumulative cost = {inf}, id = 18 >> > >> > Replacement scan over the materialized view (this is what is >> substituted in >> > the original query) >> > >> > LogicalProject(following=[$1], time=[$2], tweet_id=[$0]): rowcount = >> 100.0, >> > cumulative cost = {300.0 rows, 701.0 cpu, 0.0 io}, id = 69 >> > 3 >> > LogicalProject(username=[$0], time=[CAST($1):TIMESTAMP(0)], >> > tweet_id=[CAST($2):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-88 >> > 59-1$en_US$primary"]): rowcount = 100.0, cumulative cost = {200.0 rows, >> > 401.0 cpu, 0.0 io}, id = 44 >> > CassandraTableScan(table=[[twissandra, timeline]]): rowcount = 100.0, >> > cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 23 >> > >> > -- >> > Michael Mior >> > [email protected] <mailto:[email protected]> >> > >> > 2016-06-07 1:39 GMT-04:00 Amogh Margoor <[email protected] <mailto: >> [email protected]>>: >> > >> >> Sorry, I missed Lattices as I didn't know about Lattices without >> >> aggregations. Btw there is this rule where we try to replace star table >> >> scan by Tiles of Lattices with filters: >> >> >> >> >> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java >> < >> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/com/qubole/quark/planner/FilterAggStarRule.java >> > >> >> I thought it might be useful reference if we are trying to optimise for >> >> materialised views defined on joins and filters( e.g., MV defined by >> >> "select * from A join B on A.id=B.aid where A.x >10 and B.y < >> '2015-01-10'" >> >> ) >> >> >> >> Regards, >> >> Amogh >> >> >> >> On Tue, Jun 7, 2016 at 7:06 AM, Julian Hyde <[email protected] <mailto: >> [email protected]>> wrote: >> >> >> >>> Yes, I think you could make Lattices work without aggregation, as >> long as >> >>> you keep the key assumption, which is that dropping joined tables does >> >> not >> >>> change the number of rows (so, primary key and foreign key integrity, >> >>> relationships are all many-to-one from the root (fact) table, and you >> >> have >> >>> to include the root table in your query). >> >>> >> >>> I might end up using lattices with and without aggregation in the >> Druid >> >>> adapter. Because Druid cannot do joins, I plan to map each Druid table >> >> to a >> >>> lattice, so that you can satisfy a star query using a Druid table. >> Druid >> >> is >> >>> *mainly* for aggregation, but Druid just introduced a “select” >> query[1] >> >>> that returns raw rows, and the Druid adapter can execute >> non-aggregation >> >>> queries. >> >>> >> >>> Once you’ve translated a query "A join B join C” onto a virtual table >> ABC >> >>> I suppose you could then apply other materialized view techniques. >> >>> >> >>> If you were able to implement the Goldstein & Larson algorithm it >> would >> >> be >> >>> interesting to compose it with other materialized view substitutions >> in >> >>> Calcite. After all, each materialized view substitution just generates >> >> more >> >>> alternative queries to optimize, so I suspect that we could compose >> >>> algorithms. >> >>> >> >>> Julian >> >>> >> >>> [1] http://druid.io/docs/latest/querying/select-query.html < >> >>> http://druid.io/docs/latest/querying/select-query.html < >> http://druid.io/docs/latest/querying/select-query.html>> >> >>> >> >>> >> >>>> On Jun 6, 2016, at 6:06 PM, Michael Mior <[email protected]> wrote: >> >>>> >> >>>> Thanks for the pointer. Any chance there's a good example somewhere >> >> using >> >>>> lattices that would help me get something working? Also, is it >> possible >> >>> to >> >>>> have lattices with no aggregation? Unfortunately either way it >> doesn't >> >>>> completely suit my use case since I'm going to have to do rewriting >> on >> >>>> joins aside from star queries. That said if I can get basic equijoins >> >>>> working that would be enough for me. >> >>>> >> >>>> I started trying to write a unification rule but decided it would be >> >>> easier >> >>>> to do something like MaterializedViewFilterScanRule. The first >> problem >> >>> that >> >>>> I've run into is trying to get a Join instance in a rule where the >> left >> >>> and >> >>>> right inputs are TableScan instances. Even if I have a >> >> RelOptRuleOperand >> >>>> that looks like the following I end up with a join whose left and >> right >> >>>> inputs are RelSubsets. >> >>>> >> >>>> operand(LogicalJoin.class, operand(TableScan.class, any()), >> >>>> operand(TableScan.class, any())) >> >>>> >> >>>> Anyway, I've been looking into implementing the approach from the >> >>> following >> >>>> paper. It's very nicely written and easy to follow. It also doesn't >> >>> require >> >>>> trying different join permutations. I'm starting with several >> >> additional >> >>>> restrictions (only equijoins, no aggregations, etc.) >> >>>> >> >>>> >> >>> >> >> >> ftp://ftp.cse.buffalo.edu/users/azhang/disc/SIGMOD/pdf-files/331/202-optimizing.pdf >> >>>> >> >>>> -- >> >>>> Michael Mior >> >>>> [email protected] >> >>>> >> >>>> 2016-06-06 19:34 GMT-04:00 Julian Hyde <[email protected]>: >> >>>> >> >>>>> Short answer: yes, if you use lattices. >> >>>>> >> >>>>> Remember, Calcite has two mechanisms for matching materialized >> views: >> >>> the >> >>>>> "general purpose" mechanism, and lattices. >> >>>>> >> >>>>> Using general purpose mechanism you can declare a materialized view >> >>> based >> >>>>> on any query, but there needs to be a unification rule to rewrite >> your >> >>>>> query onto that view. (That is, rewrite your query using >> >>>>> semantics-preserving algebraic transformations such that one branch >> of >> >>> the >> >>>>> rewritten query is identical to the query that defines the >> >> materialized >> >>>>> view. See >> http://www.slideshare.net/julianhyde/calcite-algebraedw2015 >> >> < >> >>>>> http://www.slideshare.net/julianhyde/calcite-algebraedw2015> slides >> >>> 15+.) >> >>>>> There are unification rules for scan, filter, project and some kinds >> >> of >> >>>>> aggregation, but not join. >> >>>>> >> >>>>> Knowing how difficult it was to write unification rules, and knowing >> >> how >> >>>>> common was the use case of star queries (scan - filter - join - >> >>> aggregate), >> >>>>> I invented lattices. First you define a lattice (a collection of >> >> tables >> >>>>> joined using many-to-one relationships), then you (or the system) >> >>> defines >> >>>>> materialized views on top of that lattice. If an incoming query uses >> >>> tables >> >>>>> from that lattice, joined using the same join keys, then Calcite >> will >> >>>>> consider substituting any materialized views in that lattice. >> >>>>> >> >>>>> Matching queries to lattice materialized views is much more >> efficient >> >>> than >> >>>>> matching to general purpose materialized views, two reasons. (a) The >> >>> joins >> >>>>> are removed, the materialized view and the query become just >> projects >> >> of >> >>>>> the lattice’s columns, and we avoid the combinatorial explosion that >> >>> occurs >> >>>>> when joins are permuted. (b) A lattice can contain dozens of >> >>> materialized >> >>>>> views but it is clear, because they are partially ordered, which is >> >> the >> >>>>> best for a given query, and they are all considered at the same >> time. >> >>>>> >> >>>>> Julian >> >>>>> >> >>>>>> On Jun 5, 2016, at 18:58, [email protected] wrote: >> >>>>>> >> >>>>>> AFAIK it doesn't. Not sure if some work has been done towards it. >> >>>>>> Sent from my iPhone >> >>>>>> >> >>>>>>> On 06-Jun-2016, at 7:08 AM, Michael Mior <[email protected]> >> >> wrote: >> >>>>>>> >> >>>>>>> Am I correct in understanding that Calcite doesn't currently >> rewrite >> >>>>>>> queries to use materialized views if the query which defines the >> >> view >> >>>>>>> includes a join? If this is the case, has there been any work >> >> towards >> >>>>>>> making this happen? >> >>>>>>> >> >>>>>>> -- >> >>>>>>> Michael Mior >> >>>>>>> [email protected] >> >> >
