Re: Calcite fullscan vs indexscan

Julian Hyde Tue, 11 Nov 2014 10:29:42 -0800

I think you’re going about it the right way. Index lookups are traditionally 
optimized by treating them as a join between two tables. You transform a 
lateral join (which is correlated) into a CorrelatorRel.


There isn’t an EnumerableCorrelatorRel, and we didn’t include 
NestedLoopsJoinRule in the default rule set, because correlations require 
re-starts, and re-starts are a poor fit for analytic queries, especially 
distributed ones. But what you suggest is totally valid. Also, there isn’t a 
version of EnumerableScan that takes correlating variables (in sargs), and 
without that there’s no point in doing a re-start: you’d get the same results 
every time.
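
To make the re-start point concrete, here is a minimal sketch in plain Java. 
It is not Calcite code: the Inner interface, fullScan, lookupById and 
nestedLoops are hypothetical names made up for illustration, and the "table" 
is just a map keyed by id. It only shows why re-starting the inner side pays 
off when the scan actually consumes the correlating variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CorrelatedLookupSketch {
    /** Hypothetical inner side of a nested-loops join; not a Calcite interface. */
    interface Inner {
        /** Re-starts the inner side for one value of the correlating variable. */
        List<String> restart(int correlatedId);
    }

    /** A scan that ignores the correlating variable and returns every row. */
    static Inner fullScan(Map<Integer, String> table) {
        return id -> new ArrayList<>(table.values());
    }

    /** A scan that uses the correlating variable as a primary-key lookup. */
    static Inner lookupById(Map<Integer, String> table) {
        return id -> table.containsKey(id)
            ? Collections.singletonList(table.get(id))
            : Collections.<String>emptyList();
    }

    /** Nested loops: the inner side is re-started once per outer row. */
    static List<String> nestedLoops(List<Integer> outer, Inner inner) {
        List<String> out = new ArrayList<>();
        for (int id : outer) {
            for (String row : inner.restart(id)) {
                out.add(id + ":" + row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new TreeMap<>();
        table.put(1, "a");
        table.put(2, "b");
        table.put(3, "c");
        List<Integer> outer = Arrays.asList(1, 3);

        // Every re-start returns the same three rows; a join condition
        // would still have to be applied on top of this.
        System.out.println(nestedLoops(outer, fullScan(table)));
        // prints [1:a, 1:b, 1:c, 3:a, 3:b, 3:c]

        // Each re-start fetches only the row matching the outer value.
        System.out.println(nestedLoops(outer, lookupById(table)));
        // prints [1:a, 3:c]
    }
}
```

With fullScan the inner side does the same work on every re-start, which is 
the "same results every time" case; lookupById is the shape a scan that 
accepts correlating variables would need to have.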

What you are doing with Eclipse MAT is a valid and very cool use of Calcite, 
but it needs different rules. How about creating your set of core rules as a 
data structure in Programs? Then we can write some tests for them and ensure 
that they continue to work together.

Julian


On Nov 9, 2014, at 7:03 AM, Vladimir Sitnikov <[email protected]> 
wrote:

> Hi,
> 
> I am having trouble implementing indexed accesses via Calcite.
> Can you please guide me?
> 
> Here's the problem statement:
> 1) I have "table full scans" working.
> 2) I want Calcite to transform joins into nested-loops with "lookup by
> id" inner loop.
> 
> Here's a sample query: https://github.com/vlsi/mat-calcite-plugin#join-sample
> explain plan for
> select u."@ID", s."@RETAINED"
>   from "java.lang.String" s
>   join "java.net.URL" u
>     on (s."@ID" = get_id(u.path))
> 
> The "@ID" column is a primary key, so I want Calcite to generate the
> following plan: Filter(NestedLoops(Scan("java.net.URL" u),
> FetchObjectBy(get_id(u.path))), get_class(s)=="java.lang.String")
> 
> Current plan is just a join of two "full scans" :(
> 
> My "storage engine" is a java library (Eclipse Memory Analyzer in
> fact), thus the perfect generated code would be as follows:
> for(IObject url: snapshot.getObjectsByClass("java.net.URL")){
>  IObject path = (IObject) url.resolveValue("path");
>  pipe row(url.getObjectId(), path.getRetainedHeapSize()); // return results
> }
> 
> Here's what I did:
> 1) I found NestedLoopsJoinRule that seems to generate the required
> kind of plan. I have no idea why the rule is disabled by default.
> 2) However, I find no "EnumerableCorrelatorRel", thus it looks like I
> would get that "cannotplan" exception even if I create my
> CorrelatorRel("@ID"=get_id) rule.
> 
> 3) Another idea of mine is to match JoinRel(MyRel, MyRel) and replace the
> second argument with a TableFunction, so the final plan would be
> Join(Scan("java.net.URL" u), TableFunction("getObject", get_id(u.path)))
> Using table function machinery for retrieving a single row looks like
> overkill.
> 
> This ends up in the following questions:
> 1) What is the suggested way to implement this kind of optimization?
> 2) Why is there no such thing as EnumerableCorrelatorRel?
> 
> -- 
> Regards,
> Vladimir Sitnikov
