I think you’re going about it the right way. Index lookups are traditionally optimized by treating them as a join between two tables. You transform a lateral join (which is correlated) into a CorrelatorRel.
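As a rough illustration of the correlated (lateral) join semantics above, here is a minimal Java sketch. It uses a hypothetical `correlate` helper, which is not a Calcite API. For every left row, the right input is re-started with the correlating variable bound to a value from that row. In this sketch the right input is a point lookup in an in-memory `Map` standing in for a primary-key index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

public class CorrelateSketch {

    /**
     * Hypothetical correlate operator (not a Calcite API): for each left row,
     * re-start the right input with the correlating variable bound to that row.
     */
    static <L, R, O> List<O> correlate(
            Iterable<L> left,
            Function<L, Iterable<R>> rightForRow,
            BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            // Re-start: the right input is evaluated again for every left row.
            for (R r : rightForRow.apply(l)) {
                out.add(combine.apply(l, r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An in-memory map standing in for a primary-key index.
        Map<Long, String> index = Map.of(10L, "a", 20L, "b");
        List<Long> keys = List.of(10L, 20L, 30L);

        // Inner side: point lookup by the correlating variable; a missing key yields no rows.
        Function<Long, Iterable<String>> lookup =
                key -> index.containsKey(key) ? List.of(index.get(key)) : List.of();

        List<String> joined = correlate(keys, lookup, (key, value) -> key + "=" + value);
        System.out.println(joined); // [10=a, 20=b]
    }
}
```

The re-start is only useful because `lookup` actually uses its argument. A right side that ignored the bound key would return the same rows for every left row.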
There isn’t an EnumerableCorrelatorRel, and we didn’t include NestedLoopsJoinRule in the default rule set, because correlations require re-starts, and re-starts are no good for analytic queries, especially distributed ones. But what you suggest is totally valid.

Also, there isn’t a version of EnumerableScan that takes correlating variables (in sargs), and without that there’s no point in doing a re-start: you’d get the same results every time.

What you are doing with Eclipse MAT is a valid and very cool use of Calcite, but it needs different rules. How about creating your set of core rules as a data structure in Programs? Then we can write some tests for them and ensure that they continue to work together.

Julian

On Nov 9, 2014, at 7:03 AM, Vladimir Sitnikov wrote:

> Hi,
>
> I am having trouble implementing indexed access via Calcite.
> Can you please guide me?
>
> Here's the problem statement:
> 1) I have "table full scans" working.
> 2) I want Calcite to transform joins into nested loops with a "lookup by
> id" inner loop.
>
> Here's a sample query: https://github.com/vlsi/mat-calcite-plugin#join-sample
> explain plan for
> select u."@ID", s."@RETAINED"
> from "java.lang.String" s
> join "java.net.URL" u
> on (s."@ID" = get_id(u.path))
>
> The "@ID" column is a primary key, so I want Calcite to generate the
> following plan: Filter(NestedLoops(Scan("java.net.URL" u),
> FetchObjectBy(get_id(u.path))), get_class(s)=="java.lang.String")
>
> The current plan is just a join of two "full scans" :(
>
> My "storage engine" is a Java library (Eclipse Memory Analyzer, in
> fact), so the ideal generated code would be as follows:
> for (IObject url : snapshot.getObjectsByClass("java.net.URL")) {
>   IObject path = (IObject) url.resolveValue("path");
>   pipe row(url.getObjectId(), path.getRetainedHeapSize()); // return results
> }
>
> Here's what I did:
> 1) I found NestedLoopsJoinRule, which seems to generate the required
> kind of plan. I have no idea why the rule is disabled by default.
> 2) However, I find no "EnumerableCorrelatorRel", so it looks like I
> would get the "cannotplan" exception even if I create my
> CorrelatorRel("@ID"=get_id) rule.
>
> 3) Another idea of mine is to match JoinRel(MyRel, MyRel) and replace the
> second argument with a TableFunction, so the final plan would be
> Join(Scan("java.net.URL" u), TableFunction("getObject", get_id(u.path)))
> Using table function machinery to retrieve a single row looks like
> overkill.
>
> This leads to the following questions:
> 1) What is the suggested way to implement this kind of optimization?
> 2) Why is there no such thing as EnumerableCorrelatorRel?
>
> --
> Regards,
> Vladimir Sitnikov
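The hand-written loop in the quoted message, together with the class filter from the desired plan, can be sketched as runnable Java. This is a minimal sketch under stated assumptions. `Snapshot`, `HeapObject`, and `urlPathRetained` are hypothetical stand-ins, not the Eclipse MAT `ISnapshot`/`IObject` API. Object ids act as the primary key. The steps map onto Scan("java.net.URL" u), then FetchObjectBy(get_id(u.path)), then the get_class(s) filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexLookupJoinSketch {

    /** Hypothetical heap object (not the Eclipse MAT IObject API). */
    record HeapObject(long id, String className, long retainedHeapSize,
                      Map<String, Long> refs) {}

    /** Hypothetical snapshot: objects indexed by their primary key, the object id. */
    static final class Snapshot {
        private final Map<Long, HeapObject> byId = new LinkedHashMap<>();

        void add(HeapObject o) {
            byId.put(o.id(), o);
        }

        /** Point lookup by primary key: the FetchObjectBy step. */
        HeapObject getObject(long id) {
            return byId.get(id);
        }

        /** Scan of all objects of one class: the Scan step. */
        List<HeapObject> getObjectsByClass(String className) {
            List<HeapObject> out = new ArrayList<>();
            for (HeapObject o : byId.values()) {
                if (o.className().equals(className)) {
                    out.add(o);
                }
            }
            return out;
        }
    }

    /** Returns rows of (u."@ID", s."@RETAINED"). */
    static List<long[]> urlPathRetained(Snapshot snapshot) {
        List<long[]> rows = new ArrayList<>();
        // Outer loop: Scan("java.net.URL" u)
        for (HeapObject url : snapshot.getObjectsByClass("java.net.URL")) {
            Long pathId = url.refs().get("path"); // get_id(u.path)
            if (pathId == null) {
                continue; // a null path cannot satisfy the join condition
            }
            // Inner side: FetchObjectBy(id) instead of scanning every String
            HeapObject s = snapshot.getObject(pathId);
            // Filter: get_class(s) == "java.lang.String"
            if (s != null && s.className().equals("java.lang.String")) {
                rows.add(new long[] {url.id(), s.retainedHeapSize()});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Snapshot snapshot = new Snapshot();
        snapshot.add(new HeapObject(1, "java.lang.String", 48, Map.of()));
        snapshot.add(new HeapObject(2, "java.net.URL", 120, Map.of("path", 1L)));
        snapshot.add(new HeapObject(3, "java.net.URL", 96, Map.of()));
        for (long[] row : urlPathRetained(snapshot)) {
            System.out.println(row[0] + " " + row[1]); // prints "2 48"
        }
    }
}
```

The inner side costs one map lookup per URL object. It never has to read every `java.lang.String`, which is the point of the "lookup by id" plan.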
