Re: RelFieldTrimmer not optimally trimming after filters under joins?

Steven Phillips Tue, 04 Mar 2025 16:47:46 -0800

Yes, whether it's more efficient or not depends on the engine. For columnar
engines, dropping the column is trivial and would be unambiguously better.
Another reason it's better to use rules and let the optimizer figure out
what's best.


On Tue, Mar 4, 2025 at 4:42 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> I see. RelFieldTrimmer could insert a Project right after the Filter, but
> in most calling conventions that plan would probably be less efficient. The
> field will be removed next time there is a Project or Aggregate.
>
> > On Mar 4, 2025, at 4:33 PM, Steven Phillips <ste...@dremio.com.invalid> wrote:
> >
> > Julian,
> > The input column $1 is needed for the filter condition on the node below
> > the join, and not needed for anything else above that. The join tells
> > the filter below that it doesn't need that field. But the filter itself
> > does need the field. And filters don't have the ability to remove fields
> > (i.e. the rowtype of a filter is always the same as its input), so it
> > returns a rowtype and mapping to the join above it that includes the field
> > that's not needed. Joins also don't have the ability to trim fields, so the
> > join returns a rowtype and mapping to the node above that includes the
> > field. So no one is "wrongly" telling its input 'I need all of your fields'.
> >
> > Contrast the situation where there is a Project on top of the Filter. Join
> > passes down to the Project that it doesn't need that column. Project passes
> > down to the Filter that it doesn't need that column. Filter does need it,
> > so it keeps it, and returns a rowtype/mapping that includes the column.
> > Project doesn't need it, so it drops that field from the project
> > expression, and returns a rowtype/mapping that doesn't include the field.
> > And so on.
> >
> > On Tue, Mar 4, 2025 at 4:00 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >
> >> I don’t think I understand this conversation. RelFieldTrimmer is intended
> >> to be invoked on the whole tree. Each node, when invoking the trimmer on
> >> its input (child), tells the trimmer which of the fields of that input it
> >> actually uses. Now ‘which fields it actually uses’ is based on the fields
> >> that its consumer (parent) said that it was using.
> >>
> >> If fields are not being trimmed as expected, look for one node that is
> >> wrongly telling its input ‘I need all of your fields’.
> >>
> >> Julian
> >>
> >>
> >>> On Mar 4, 2025, at 2:50 PM, Ian Bertolacci <ian.bertola...@workday.com.invalid> wrote:
> >>>
> >>> I just hacked together an override where it will build a redundant
> >>> project on each side if necessary.
> >>> That should eliminate any overhead of invoking any planners or rules.
> >>> (For our needs, additional projects have no performance implications)
> >>> -Ian
> >>>
> >>> From: Ian Bertolacci <ian.bertola...@workday.com.INVALID>
> >>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> >>> Date: Tuesday, March 4, 2025 at 14:25
> >>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> >>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
> >>>
> >>>> I think you could work around this by always inserting trivial projects
> >>>> over every node in the tree before trimming, and then clean up with
> >>>> ProjectRemoveRule.
> >>>
> >>> This is pretty much exactly what I was doing.
> >>> Good to know that I’m not wildly off-track
> >>> Thanks!
> >>> -Ian
> >>>
> >>> From: Steven Phillips <ste...@dremio.com.INVALID>
> >>> Reply-To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> >>> Date: Tuesday, March 4, 2025 at 13:55
> >>> To: "dev@calcite.apache.org" <dev@calcite.apache.org>
> >>> Subject: Re: RelFieldTrimmer not optimally trimming after filters under joins?
> >>>
> >>> I think this is a current limitation of FieldTrimmer. The Join and Filter
> >>> nodes can't drop columns (since they don't carry column selection
> >>> information), and the trimmer doesn't add Project nodes (currently). I have
> >>> worked around this limitation by using HepPlanner with various
> >>> ProjectTranspose rules.
> >>>
> >>> I think you could work around this by always inserting trivial projects
> >>> over every node in the tree before trimming, and then clean up with
> >>> ProjectRemoveRule.
> >>>
> >>> On Tue, Mar 4, 2025 at 1:33 PM Ian Bertolacci
> >>> <ian.bertola...@workday.com.invalid> wrote:
> >>>
> >>>> I’m looking at using RelFieldTrimmer, and I’m noticing that if a side of a
> >>>> join has unnecessary fields after a filter, there is no trim-fields project
> >>>> on that side to reduce the width of the row.
> >>>> Is this expected, or is there a configuration or pre-processing step that
> >>>> I am missing?
> >>>>
> >>>> For example, starting with this tree (these all look better in monospace,
> >>>> hopefully the formatting comes through)
> >>>> 4:Project(C5633_14509=[$4], C5633_486=[$8])
> >>>> └── 3:Join(condition=[=($1, $6)], joinType=[inner])
> >>>> ....├── 1:Filter(condition=[<($2, 10)])
> >>>> ....│...└── 0:TableScan(table=[T902], Schema=[...6 fields...])
> >>>> ....└── 2:TableScan(table=[T895], Schema=[...64 fields...])
> >>>>
> >>>> The result of RelFieldTrimmer is this:
> >>>> 9:Project(C5633_14509=[$2], C5633_486=[$4])
> >>>> └── 8:Join(condition=[=($0, $3)], joinType=[inner])
> >>>> ....├── 6:Filter(condition=[<($1, 10)])
> >>>> ....│...└── 5:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
> >>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
> >>>> ....└── 7:Project(ID=[$0], C5633_486=[$2])
> >>>> ........└── 2:TableScan(table=[T895], Schema=[...64 fields...])
> >>>>
> >>>> Notice: $1 on the LHS of the node is not used *after* the filter, so a
> >>>> projection of only the $0 and $2 fields would reduce the width of the
> >>>> row before the join.
> >>>>
> >>>> However, I can force the insertion of a projection which is simply the
> >>>> identity (i.e., projecting all fields of the input row with no additions or
> >>>> subtractions):
> >>>> 5:Project(C5633_14509=[$4], C5633_486=[$8])
> >>>> └── 4:Join(condition=[=($1, $6)], joinType=[inner])
> >>>> ....├── 2:Project(...Identity mapping, 6 fields...)
> >>>> ....│...└── 1:Filter(condition=[<($2, 10)])
> >>>> ....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
> >>>> ....└── 3:TableScan(table=[T895], Schema=[...64 fields...])
> >>>>
> >>>> And the result is a projection which only has the 2 fields necessary after
> >>>> the filter.
> >>>> 11:Project(C5633_14509=[$1], C5633_486=[$3])
> >>>> └── 10:Join(condition=[=($0, $2)], joinType=[inner])
> >>>> ....├── 8:Project(C5633_14505=[$0], C5633_14509=[$2]) <- trimmed
> >>>> ....│...└── 7:Filter(condition=[<($1, 10)])
> >>>> ....│.......└── 6:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
> >>>> ....│...........└── 0:TableScan(table=[T902], Schema=[...6 fields...])
> >>>> ....└── 9:Project(ID=[$0], C5633_486=[$2])
> >>>> ........└── 3:TableScan(table=[T895], Schema=[...64 fields...])
> >>>>
> >>>> Thanks!
> >>>> -Ian
> >>>>
> >>
> >>
>
>
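
The behaviour discussed in the thread above (a Filter must return the same row type as its input, while a Project can narrow it) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes Scan, Filter, Project and Trimmed, and the trim method, are hypothetical stand-ins written for this example. They are not Calcite's RelNode classes and do not reproduce the actual RelFieldTrimmer implementation. The field numbers follow the left-hand side of the join in the original example (6-field scan, filter on $2, consumer uses $1 and $4).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class TrimSketch {
    /** Toy plan node. Hypothetical; NOT a Calcite class. */
    abstract static class Node {
        abstract int width();
    }

    static final class Scan extends Node {
        final int width;
        Scan(int width) { this.width = width; }
        int width() { return width; }
    }

    static final class Filter extends Node {
        final Node input;
        final int condField; // the single input field the condition reads
        Filter(Node input, int condField) { this.input = input; this.condField = condField; }
        int width() { return input.width(); } // row type always equals the input's
    }

    static final class Project extends Node {
        final Node input;
        final List<Integer> fields; // fields.get(i) = input field behind output i
        Project(Node input, List<Integer> fields) { this.input = input; this.fields = fields; }
        int width() { return fields.size(); }
    }

    /** Trimmed node plus, for each surviving output field, its original index. */
    static final class Trimmed {
        final Node node;
        final List<Integer> kept;
        Trimmed(Node node, List<Integer> kept) { this.node = node; this.kept = kept; }
    }

    /** Trims n, given the set of n's output fields that its consumer uses. */
    static Trimmed trim(Node n, SortedSet<Integer> used) {
        if (n instanceof Scan) {
            // Narrow a scan by placing a Project on top, unless every field is used.
            List<Integer> kept = new ArrayList<>(used);
            Node out = kept.size() == n.width() ? n : new Project(n, kept);
            return new Trimmed(out, kept);
        }
        if (n instanceof Filter) {
            Filter f = (Filter) n;
            SortedSet<Integer> need = new TreeSet<>(used);
            need.add(f.condField); // the filter needs its own condition field too
            Trimmed in = trim(f.input, need);
            // A filter cannot drop fields, so it reports everything its input kept.
            return new Trimmed(new Filter(in.node, in.kept.indexOf(f.condField)), in.kept);
        }
        Project p = (Project) n;
        SortedSet<Integer> need = new TreeSet<>();
        for (int i : used) need.add(p.fields.get(i));
        Trimmed in = trim(p.input, need);
        // A project keeps only the expressions its consumer uses, remapped to the new input.
        List<Integer> exprs = new ArrayList<>();
        for (int i : used) exprs.add(in.kept.indexOf(p.fields.get(i)));
        return new Trimmed(new Project(in.node, exprs), new ArrayList<>(used));
    }

    static SortedSet<Integer> fields(int... idx) {
        SortedSet<Integer> s = new TreeSet<>();
        for (int i : idx) s.add(i);
        return s;
    }

    static List<Integer> identity(int n) {
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < n; i++) l.add(i);
        return l;
    }

    public static void main(String[] args) {
        SortedSet<Integer> usedByJoin = fields(1, 4);
        Node bare = new Filter(new Scan(6), 2);
        Node wrapped = new Project(new Filter(new Scan(6), 2), identity(6));
        System.out.println(trim(bare, usedByJoin).node.width());    // 3: filter field survives
        System.out.println(trim(wrapped, usedByJoin).node.width()); // 2: project drops it
    }
}
```

In this toy model the only difference between the two runs is the identity Project over the Filter, which gives the trimmer a node that is allowed to narrow the row after the filter condition has been evaluated.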
