Re: RelFieldTrimmer not optimally trimming after filters under joins?

Julian Hyde Tue, 11 Mar 2025 12:45:34 -0700

Probably this should be done by RelFieldTrimmer, but it should be a mode 
(boolean flag). If it started adding a bunch of new Project operators it might 
break existing clients.


Maybe even several modes. Some people might not want to add a Project under an 
Aggregate (because Aggregate implicitly projects). Some people might not want 
to add a Project under a Filter under a Project (because the Project and Filter 
are destined to become a Calc). And so forth.

Adding a Project to the inputs of a distinct (not-all) Union is valid only if 
you know about functional dependencies or keys.
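
To illustrate the distinct Union caveat, here is a minimal sketch in plain Java. 
It does not use any Calcite classes; UnionProjectDemo, row, project and 
unionDistinct are hypothetical names invented for this illustration, and rows 
are modelled simply as lists of values. It compares a Project placed above a 
distinct Union with the same Project pushed into both inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionProjectDemo {
    /** Builds one row from its column values (hypothetical helper). */
    public static List<Object> row(Object... values) {
        return new ArrayList<>(Arrays.asList(values));
    }

    /** Distinct (not-all) UNION: concatenate both inputs, then drop duplicate rows. */
    public static List<List<Object>> unionDistinct(
            List<List<Object>> left, List<List<Object>> right) {
        Set<List<Object>> rows = new LinkedHashSet<>(left);
        rows.addAll(right);
        return new ArrayList<>(rows);
    }

    /** Project: keep only the given column ordinals of each row; duplicates are preserved. */
    public static List<List<Object>> project(List<List<Object>> input, int... ordinals) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> r : input) {
            List<Object> projected = new ArrayList<>();
            for (int i : ordinals) {
                projected.add(r.get(i));
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        // The two inputs agree on column 0 but differ on column 1.
        List<List<Object>> left = List.of(row(1, "a"));
        List<List<Object>> right = List.of(row(1, "b"));

        // Project above the Union: both rows survive the distinct step.
        List<List<Object>> above = project(unionDistinct(left, right), 0);

        // Project pushed below the Union: the rows collapse into one.
        List<List<Object>> below = unionDistinct(project(left, 0), project(right, 0));

        System.out.println(above); // [[1], [1]]
        System.out.println(below); // [[1]]
    }
}
```

The two plans differ whenever rows that are distinct on the full width become 
equal on the kept columns. They agree only when that cannot happen, which is 
the kind of guarantee that functional dependencies or keys would provide.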

> On Mar 7, 2025, at 10:36 AM, Mihai Budiu <mbu...@gmail.com> wrote:
> 
> This sounds like a very useful transformation.
> Are you considering contributing this in some way as a utility function?
> 
> Mihai
> 
> ________________________________
> From: Ian Bertolacci <ian.bertola...@workday.com.INVALID>
> Sent: Tuesday, March 4, 2025 2:50 PM
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
> 
> I just hacked together an override where it will build a redundant project on 
> each side if necessary.
> That should eliminate any overhead of invoking any planners or rules.
> (For our needs, additional projects have no performance implications)
> -Ian
> 
> From: Ian Bertolacci <ian.bertola...@workday.com.INVALID>
> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> Date: Tuesday, March 4, 2025 at 14:25
> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
> 
>> I think you could work around this by always inserting trivial projects over 
>> every node in the tree before trimming, and then clean up with 
>> ProjectRemoveRule.
> 
> 
> 
> This is pretty much exactly what I was doing.
> 
> Good to know that I’m not wildly off-track
> 
> Thanks!
> 
> -Ian
> 
> From: Steven Phillips <ste...@dremio.com.INVALID>
> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> Date: Tuesday, March 4, 2025 at 13:55
> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
> 
> I think this is a current limitation of FieldTrimmer. The Join and Filter
> nodes can't drop columns (since they don't carry column selection
> information), and the trimmer doesn't add Project nodes (currently). I have
> worked around this limitation by using HepPlanner with various
> ProjectTranspose rules.
> 
> I think you could work around this by always inserting trivial projects
> over every node in the tree before trimming, and then clean up with
> ProjectRemoveRule.
> 
> On Tue, Mar 4, 2025 at 1:33 PM Ian Bertolacci
> <ian.bertola...@workday.com.invalid> wrote:
> 
>> I’m looking at using RelFieldTrimmer, and I’m noticing that if a side of a
>> join has unnecessary fields after a filter, there is no trim-fields project
>> on that side to reduce the width of the row.
>> Is this expected, or is there a configuration or pre-processing step that
>> I am missing?
>> 
>> For example, starting with this tree (these all look better in monospace,
>> hopefully the formatting comes through)
>> 4:Project(C5633_14509=[$4], C5633_486=[$8])
>> └── 3:Join(condition=[=($1, $6)], joinType=[inner])
>> ....├── 1:Filter(condition=[<($2, 10)])
>> ....│...└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>> ....└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>> 
>> The result of RelFieldTrimmer is this:
>> 9:Project(C5633_14509=[$2], C5633_486=[$4])
>> └── 8:Join(condition=[=($0, $3)], joinType=[inner])
>> ....├── 6:Filter(condition=[<($1, 10)])
>> ....│...└── 5:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>> ....└── 7:Project(ID=[$0], C5633_486=[$2])
>> ........└── 2:TableScan(table=[T895], Schema=[...64 fields...])
>> 
>> Notice: $1 on the LHS of the node is not used *after* the filter, so a
>> projection of only the $0 and $2 fields would reduce the width of the
>> row before the join.
>> 
>> However, I can force the insertion of a projection which is simply the
>> identity (i.e., projecting all fields of the input row with no additions or
>> subtractions):
>> 5:Project(C5633_14509=[$4], C5633_486=[$8])
>> └── 4:Join(condition=[=($1, $6)], joinType=[inner])
>> ....├── 2:Project(...Identity mapping, 6 fields...)
>> ....│...└── 1:Filter(condition=[<($2, 10)])
>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>> ....└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>> 
>> And the result is a projection which only has the 2 fields necessary after
>> the filter.
>> 11:Project(C5633_14509=[$1], C5633_486=[$3])
>> └── 10:Join(condition=[=($0, $2)], joinType=[inner])
>> ....├── 8:Project(C5633_14505=[$0], C5633_14509=[$2]) <- trimmed
>> ....│...└── 7:Filter(condition=[<($1, 10)])
>> ....│.......└── 6:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
>> ....│...........└── 0:TableScan(table=[T902], Schema=[...6 fields...])
>> ....└── 9:Project(ID=[$0], C5633_486=[$2])
>> ........└── 3:TableScan(table=[T895], Schema=[...64 fields...])
>> 
>> Thanks!
>> -Ian
