Hi Venkata,

Your summary looks good to me. +1 to start a vote.

I think we don't need "inputIndex" in NestedFieldReferenceExpression.
Actually, I think it is also not needed in FieldReferenceExpression,
and we should try to remove it (another topic). The RexInputRef in Calcite
also doesn't require an inputIndex because the field index should represent
index of the field in the underlying row type. Field references shouldn't be
 aware of the number of inputs.

Best,
Jark


On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
wrote:

> Hi Jinsong,
>
> Thanks for your comments.
>
> What is inputIndex in NestedFieldReferenceExpression?
>
>
> I haven't looked at it before. Do you mean, given that it is now only used
> to push filters it won't be subsequently used in further
> planning/optimization and therefore it is not required at this time?
>
> So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > a need to introduce a base class `ReferenceExpression`?
>
> For SupportsFilterPushDown itself, *ReferenceExpression* base class is not
> needed. But there were discussions around cleaning up and standardizing the
> API for Supports*PushDown. SupportsProjectionPushDown currently pushes the
> projects as a 2-d array, instead it would be better to use the standard API
> which seems to be the *ResolvedExpression*. For SupportsProjectionPushDown
> either FieldReferenceExpression (top level fields) or
> NestedFieldReferenceExpression (nested fields) is enough, in order to
> provide a single API that handles both top level and nested fields,
> ReferenceExpression will be introduced as a base class.
>
> Eventually, *SupportsProjectionPushDown#applyProjections* would evolve as
> applyProjection(List<ReferenceExpression> projectedFields) and nested
> fields would be pushed only if *supportsNestedProjections* returns true.
>
> Regards
> Venkata krishnan
>
>
> On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li <jingsongl...@gmail.com>
> wrote:
>
> > So if NestedFieldReferenceExpression doesn't need inputIndex, is there
> > a need to introduce a base class `ReferenceExpression`?
> >
> > Best,
> > Jingsong
> >
> > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <jingsongl...@gmail.com>
> > wrote:
> > >
> > > Hi thanks all for your discussion.
> > >
> > > What is inputIndex in NestedFieldReferenceExpression?
> > >
> > > I know inputIndex has special usage in FieldReferenceExpression, but
> > > it is only for Join operators, and it is only for SQL optimization. It
> > > looks like there is no requirement for Nested.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan
> > > <vsowr...@asu.edu> wrote:
> > > >
> > > > Thanks for all the feedback and discussion everyone. Looks like we
> have
> > > > reached a consensus here.
> > > >
> > > > Just to summarize:
> > > >
> > > > 1. Introduce a new *ReferenceExpression* (or
> *BaseReferenceExpression*)
> > > > abstract class which will be extended by both
> > *FieldReferenceExpression*
> > > > and *NestedFieldReferenceExpression* (to be introduced as part of
> this
> > FLIP)
> > > > 2. No need of *supportsNestedFilters *check as the current
> > > > *SupportsFilterPushDown* should already ignore unknown expressions (
> > > > *NestedFieldReferenceExpression* for example) and return them as
> > > > *remainingFilters.
> > > > *Maybe this should be clarified explicitly in the Javadoc of
> > > > *SupportsFilterPushDown.
> > > > *I will file a separate JIRA to fix the documentation.
> > > > 3. Refactor *SupportsProjectionPushDown* to use *ReferenceExpression
> > *instead
> > > > of existing 2-d arrays to consolidate and be consistent with other
> > > > Supports*PushDown APIs - *outside the scope of this FLIP*
> > > > 4. Similarly *SupportsAggregatePushDown* should also be evolved
> > whenever
> > > > nested fields support is added to use the *ReferenceExpression -
> > **outside
> > > > the scope of this FLIP*
> > > >
> > > > Does this sound good? Please let me know if I have missed anything
> > here. If
> > > > there are no concerns, I will start a vote tomorrow. I will also get
> > the
> > > > FLIP-356 wiki updated. Thanks everyone once again!
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > > >
> > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin <becket....@gmail.com>
> > wrote:
> > > >
> > > > > Hi Jark,
> > > > >
> > > > > How about having a separate NestedFieldReferenceExpression, and
> > > > > > abstracting a common base class "ReferenceExpression" for
> > > > > > NestedFieldReferenceExpression and FieldReferenceExpression? This
> > makes
> > > > > > unifying expressions in
> > > > > >
> > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > > > ...)"
> > > > > > possible.
> > > > >
> > > > >
> > > > > I'd be fine with this. It at least provides a consistent API style
> /
> > > > > formality.
> > > > >
> > > > >  Re: Yunhong,
> > > > >
> > > > > 3. Finally, I think we need to look at the costs and benefits of
> > unifying
> > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or
> > others)
> > > > > from
> > > > > > the perspective of interface implementers. A stable API can
> reduce
> > user
> > > > > > development and change costs, if the current API can fully meet
> the
> > > > > > functional requirements at the framework level, I personal
> suggest
> > > > > reducing
> > > > > > the impact on connector developers.
> > > > > >
> > > > >
> > > > > I agree that the cost and benefit should be measured. And the
> > measurement
> > > > > should be in the long term instead of short term. That is why we
> > always
> > > > > need to align on the ideal end state first.
> > > > > Meeting functionality requirements is the bare minimum bar for an
> > API.
> > > > > Simplicity, intuitiveness, robustness and evolvability are also
> > important.
> > > > > In addition, for projects with many APIs, such as Flink, a
> > consistent API
> > > > > style is also critical for the user adoption as well as bug
> > avoidance. It
> > > > > is very helpful for the community to agree on some API design
> > conventions /
> > > > > principles.
> > > > > For example, in this particular case, via our discussion, hopefully
> > we sort
> > > > > of established the following API design conventions / principles
> for
> > all
> > > > > the Supports*PushDown interfaces.
> > > > >
> > > > > 1. By default, expressions should be used if applicable instead of
> > other
> > > > > representations.
> > > > > 2. In general, the pushdown method should not assume all the
> > pushdowns will
> > > > > succeed. So the applyX() method should return a boolean or List<X>,
> > to
> > > > > handle the cases that some of the pushdowns cannot be fulfilled by
> > the
> > > > > implementation.
> > > > >
> > > > > Establishing such conventions and principles demands careful
> > thinking for
> > > > > the aspects I mentioned earlier in addition to the API
> > functionalities.
> > > > > This helps lower the bar of understanding, reduces the chance of
> > having
> > > > > loose ends in the API, and will benefit all the participants in the
> > project
> > > > > over time. I think this is the right way to achieve real API
> > stability.
> > > > > Otherwise, we may end up chasing our tails to find ways not to
> > change the
> > > > > existing non-ideal APIs.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Fri, Aug 25, 2023 at 9:33 AM yh z <zhengyunhon...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi, Venkat,
> > > > > >
> > > > > > Thanks for the FLIP, it sounds good to support nested fields
> filter
> > > > > > pushdown. Based on the design of flip and the above options, I
> > would like
> > > > > > to make a few suggestions:
> > > > > >
> > > > > > 1.  At present, introducing NestedFieldReferenceExpression looks
> > like a
> > > > > > better solution, which can fully meet our requirements while
> > reducing
> > > > > > modifications to base class FieldReferenceExpression. In the long
> > run, I
> > > > > > tend to abstract a basic class for NestedFieldReferenceExpression
> > and
> > > > > > FieldReferenceExpression as u suggested.
> > > > > >
> > > > > > 2. Personally, I don't recommend introducing
> > *supportsNestedFilters() in
> > > > > > supportsFilterPushdown. We just need to better declare the return
> > value
> > > > > of
> > > > > > the method *applyFilters.
> > > > > >
> > > > > > 3. Finally, I think we need to look at the costs and benefits of
> > unifying
> > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or
> > others)
> > > > > from
> > > > > > the perspective of interface implementers. A stable API can
> reduce
> > user
> > > > > > development and change costs, if the current API can fully meet
> the
> > > > > > functional requirements at the framework level, I personal
> suggest
> > > > > reducing
> > > > > > the impact on connector developers.
> > > > > >
> > > > > > Regards,
> > > > > > Yunhong Zheng (Swuferhong)
> > > > > >
> > > > > >
> > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月25日周五
> > 01:25写道:
> > > > > >
> > > > > > > To keep it backwards compatible, introduce another API
> > *applyAggregates
> > > > > > > *with
> > > > > > > *List<ReferenceExpression> *when nested field support is added
> > and
> > > > > > > deprecate the current API. This will by default throw an
> > exception. In
> > > > > > > flink planner, *applyAggregates *with nested fields and if it
> > throws
> > > > > > > exception then *applyAggregates* without nested fields.
> > > > > > >
> > > > > > > Regards
> > > > > > > Venkata krishnan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
> > > > > > > vsowr...@asu.edu> wrote:
> > > > > > >
> > > > > > > > Jark,
> > > > > > > >
> > > > > > > > How about having a separate NestedFieldReferenceExpression,
> and
> > > > > > > >> abstracting a common base class "ReferenceExpression" for
> > > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression?
> > This
> > > > > > makes
> > > > > > > >> unifying expressions in
> > > > > > > >>
> > > > >
> > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > > > > >> ...)"
> > > > > > > >> possible.
> > > > > > > >
> > > > > > > > This should be fine for *SupportsProjectionPushDown* and
> > > > > > > > *SupportsFilterPushDown*. One concern in the case of
> > > > > > > > *SupportsAggregatePushDown* with nested fields support (to be
> > added
> > > > > in
> > > > > > > > the future), with this proposal, the API will become
> backwards
> > > > > > > incompatible
> > > > > > > > as the *args *for the aggregate function is
> > > > > > > *List<FieldReferenceExpression>
> > > > > > > > *that needs to change to *List<ReferenceExpression>*.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Venkata krishnan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com>
> > wrote:
> > > > > > > >
> > > > > > > >> Hi Becket,
> > > > > > > >>
> > > > > > > >> I think it is the second case, that a
> > FieldReferenceExpression is
> > > > > > > >> constructed
> > > > > > > >> by the framework and passed to the connector (interfaces
> > listed by
> > > > > > > >> Venkata[1]
> > > > > > > >> and Catalog#listPartitionsByFilter). Besides, understanding
> > the
> > > > > nested
> > > > > > > >> field
> > > > > > > >> is optional for users/connectors (just treat it as an
> unknown
> > > > > > expression
> > > > > > > >> if
> > > > > > > >> the
> > > > > > > >> connector doesn't want to support it).
> > > > > > > >>
> > > > > > > >> If we extend FieldReferenceExpression, in the case of "where
> > > > > > col.nested
> > > > > > > >
> > > > > > > >> 10",
> > > > > > > >> for the connectors already supported filter/delete pushdown,
> > they
> > > > > may
> > > > > > > >> wrongly
> > > > > > > >> pushdown "col > 10" instead of "nested > 10" because they
> > still
> > > > > treat
> > > > > > > >> FieldReferenceExpression as a top-level column. This problem
> > can be
> > > > > > > >> resolved
> > > > > > > >> by introducing an additional "supportedNestedPushdown" for
> > each
> > > > > > > interface,
> > > > > > > >> but that method is not elegant and is hard to remove in the
> > future,
> > > > > > and
> > > > > > > >> this could
> > > > > > > >> be avoided if we have a separate
> > NestedFieldReferenceExpression.
> > > > > > > >>
> > > > > > > >> If we want to extend FieldReferenceExpression, we have to
> add
> > > > > > > protections
> > > > > > > >> for every related API in one shot. Besides,
> > FieldReferenceExpression
> > > > > > is
> > > > > > > a
> > > > > > > >> fundamental class in the planner, we have to go through all
> > the code
> > > > > > > that
> > > > > > > >> is using it to make sure it properly handling it if it is a
> > nested
> > > > > > field
> > > > > > > >> which
> > > > > > > >> is a big effort for the community.
> > > > > > > >>
> > > > > > > >> If we were designing this API on day 1, I fully support
> > merging them
> > > > > > in
> > > > > > > a
> > > > > > > >> FieldReferenceExpression. But in this case, I'm thinking
> > about how
> > > > > to
> > > > > > > >> provide
> > > > > > > >> users with a smooth migration path, and allow the community
> to
> > > > > > gradually
> > > > > > > >> put efforts into evolving the API, and not block the "Nested
> > Fields
> > > > > > > Filter
> > > > > > > >> Pushdown"
> > > > > > > >> requirement.
> > > > > > > >>
> > > > > > > >> How about having a separate NestedFieldReferenceExpression,
> > and
> > > > > > > >> abstracting a common base class "ReferenceExpression" for
> > > > > > > >> NestedFieldReferenceExpression and FieldReferenceExpression?
> > This
> > > > > > makes
> > > > > > > >> unifying expressions in
> > > > > > > >>
> > > > >
> > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> > > > > > > >> ...)"
> > > > > > > >> possible.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Jark
> > > > > > > >>
> > > > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
> > > > > > > >> vsowr...@asu.edu>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Becket and Jark,
> > > > > > > >> >
> > > > > > > >> >  Deprecate all the other
> > > > > > > >> > > methods except tryApplyFilters() and
> > tryApplyProjections().
> > > > > > > >> >
> > > > > > > >> > For *SupportsProjectionPushDown*, we still need a
> > > > > > > >> > *supportsNestedProjections* API on the table source as
> some
> > of the
> > > > > > > table
> > > > > > > >> > sources might not be able to handle nested fields and
> > therefore
> > > > > the
> > > > > > > >> Flink
> > > > > > > >> > planner should not push down the nested projections or
> else
> > the
> > > > > > > >> > *applyProjection
> > > > > > > >> > *API has to be appropriately changed to return
> > > > > > > >> > *unconvertibleProjections *similar
> > > > > > > >> > to *SupportsFilterPushDown*.
> > > > > > > >> >
> > > > > > > >> > Or we have to introduce two different applyProjections()
> > > > > > > >> > > methods for FieldReferenceExpression /
> > > > > > > NestedFieldReferenceExpression
> > > > > > > >> > > respectively.
> > > > > > > >> >
> > > > > > > >> > Agree this is not preferred. Given that
> > *supportNestedProjections
> > > > > > > >> *cannot
> > > > > > > >> > be deprecated/removed based on the current API form,
> > extending
> > > > > > > >> > *FieldReferenceExpression* to support nested fields should
> > be
> > > > > okay.
> > > > > > > >> >
> > > > > > > >> > Another alternative could be to change *applyProjections
> > *to take
> > > > > > > >> > List<ResolvedExpression> and on the connector side they
> > choose to
> > > > > > > handle
> > > > > > > >> > *FieldReferenceExpression* and
> > *NestedFieldReferenceExpression *as
> > > > > > > >> > applicable and return the remainingProjections. In the
> case
> > of
> > > > > > nested
> > > > > > > >> field
> > > > > > > >> > projections not supported, it should return them back but
> > only
> > > > > > > >> projecting
> > > > > > > >> > the top level fields. IMO, this is also *not preferred*.
> > > > > > > >> >
> > > > > > > >> > *SupportsAggregatePushDown*
> > > > > > > >> >
> > > > > > > >> > *AggregateExpression *currently takes in a list of
> > > > > > > >> > *FieldReferenceExpression* as args for the aggregate
> > function, if
> > > > > in
> > > > > > > >> future
> > > > > > > >> > *SupportsAggregatePushDown* adds support for aggregate
> > pushdown on
> > > > > > > >> nested
> > > > > > > >> > fields then the AggregateExpression API also has to change
> > if a
> > > > > new
> > > > > > > >> > NestedFieldReferenceExpression is introduced for nested
> > fields.
> > > > > > > >> >
> > > > > > > >> > If we add a
> > > > > > > >> > > flag for each new filter,
> > > > > > > >> > > the interface will be filled with lots of flags (e.g.,
> > > > > > > >> supportsBetween,
> > > > > > > >> > > supportsIN)
> > > > > > > >> >
> > > > > > > >> > In an ideal situation, I completely agree with you. But in
> > the
> > > > > > current
> > > > > > > >> > state, *supportsNestedFilters* can act as a bridge to
> reach
> > the
> > > > > > > eventual
> > > > > > > >> > desired state which is to have a clean and consistent set
> > of APIs
> > > > > > > >> > throughout all Supports*PushDown.
> > > > > > > >> >
> > > > > > > >> > Also shared some thoughts on the end state API
> > > > > > > >> > <
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> >
> https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$
> > > > > > > >> > >
> > > > > > > >> > with extension to the *FieldReferenceExpression* to
> support
> > nested
> > > > > > > >> fields.
> > > > > > > >> > Please take a look.
> > > > > > > >> >
> > > > > > > >> > Regards
> > > > > > > >> > Venkata krishnan
> > > > > > > >> >
> > > > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <
> > becket....@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi Jark,
> > > > > > > >> > >
> > > > > > > >> > > Regarding the migration path, it would be useful to
> > scrutinize
> > > > > the
> > > > > > > use
> > > > > > > >> > case
> > > > > > > >> > > of FiledReferenceExpression and ResolvedExpressions.
> > There are
> > > > > two
> > > > > > > >> kinds
> > > > > > > >> > of
> > > > > > > >> > > use cases:
> > > > > > > >> > >
> > > > > > > >> > > 1. A ResolvedExpression is constructed by the user or
> > connector
> > > > > /
> > > > > > > >> plugin
> > > > > > > >> > > developers.
> > > > > > > >> > > 2. A ResolvedExpression is constructed by the framework
> > and
> > > > > passed
> > > > > > > to
> > > > > > > >> > user
> > > > > > > >> > > or connector / plugin developers.
> > > > > > > >> > >
> > > > > > > >> > > For the first case, both of the approaches provide the
> > same
> > > > > > > migration
> > > > > > > >> > > experience.
> > > > > > > >> > >
> > > > > > > >> > > For the second case, generally speaking, introducing
> > > > > > > >> > > NestedFieldReferenceExpression and extending
> > > > > > > FieldReferenceExpression
> > > > > > > >> > would
> > > > > > > >> > > have the same impact for backwards compatibility.
> > > > > > > >> SupportsFilterPushDown
> > > > > > > >> > is
> > > > > > > >> > > a special case here because understanding the filter
> > expressions
> > > > > > is
> > > > > > > >> > > optional for the source implementation. In other use
> > cases, if
> > > > > > > >> > > understanding the reference to a nested field is a must
> > have,
> > > > > the
> > > > > > > user
> > > > > > > >> > code
> > > > > > > >> > > has to be changed, regardless of which approach we take
> to
> > > > > support
> > > > > > > >> nested
> > > > > > > >> > > fields.
> > > > > > > >> > >
> > > > > > > >> > > Therefore, I think we have to check each public API
> where
> > the
> > > > > > nested
> > > > > > > >> > field
> > > > > > > >> > > reference is exposed. If we have many public APIs where
> > > > > > > understanding
> > > > > > > >> > > nested fields is optional for the user  / plugin /
> > connector
> > > > > > > >> developers,
> > > > > > > >> > > having a separate NestedFieldReferenceExpression would
> > have a
> > > > > more
> > > > > > > >> smooth
> > > > > > > >> > > migration. Otherwise, there seems to be no difference
> > between
> > > > > the
> > > > > > > two
> > > > > > > >> > > approaches.
> > > > > > > >> > >
> > > > > > > >> > > Migration path aside, the main reason I prefer extending
> > > > > > > >> > > FieldReferenceExpression over a new
> > > > > NestedFieldReferenceExpression
> > > > > > > is
> > > > > > > >> > > because this makes the SupportsProjectionPushDown
> > interface
> > > > > > simpler.
> > > > > > > >> > > Otherwise, we have to treat it as a special case that
> > does not
> > > > > > match
> > > > > > > >> the
> > > > > > > >> > > overall API style. Or we have to introduce two different
> > > > > > > >> > applyProjections()
> > > > > > > >> > > methods for FieldReferenceExpression /
> > > > > > > NestedFieldReferenceExpression
> > > > > > > >> > > respectively. This issue further extends to
> > implementation in
> > > > > > > >> addition to
> > > > > > > >> > > public API. A single FieldReferenceExpression might help
> > > > > simplify
> > > > > > > the
> > > > > > > >> > > implementation code a little bit. For example, in a
> > recursive
> > > > > > > >> processing
> > > > > > > >> > of
> > > > > > > >> > > a row with nested rows, we may not need to switch
> between
> > > > > > > >> > > FieldReferenceExpression and
> > NestedFieldReferenceExpression
> > > > > > > depending
> > > > > > > >> on
> > > > > > > >> > > whether the record being processed is a top level record
> > or
> > > > > nested
> > > > > > > >> > record.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > >
> > > > > > > >> > > Jiangjie (Becket) Qin
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <
> > imj...@gmail.com>
> > > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi Becket,
> > > > > > > >> > > >
> > > > > > > >> > > > I totally agree we should try to have a consistent API
> > for a
> > > > > > final
> > > > > > > >> > state.
> > > > > > > >> > > > The only concern I have mentioned is the "smooth"
> > migration
> > > > > > path.
> > > > > > > >> > > > The FiledReferenceExpression is widely used in many
> > public
> > > > > APIs,
> > > > > > > >> > > > not only in the SupportsFilterPushDown. Yes, we can
> > change
> > > > > every
> > > > > > > >> > > > methods in 2-steps, but is it good to change API back
> > and
> > > > > forth
> > > > > > > for
> > > > > > > >> > this?
> > > > > > > >> > > > Personally, I'm fine with a separate
> > > > > > > NestedFieldReferenceExpression
> > > > > > > >> > > class.
> > > > > > > >> > > > TBH, I prefer the separated way because it makes the
> > reference
> > > > > > > >> > expression
> > > > > > > >> > > > more clear and concise.
> > > > > > > >> > > >
> > > > > > > >> > > > Best,
> > > > > > > >> > > > Jark
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <
> > > > > becket....@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Thanks for the reply, Jark.
> > > > > > > >> > > > >
> > > > > > > >> > > > > I think it will be helpful to understand the final
> > state we
> > > > > > want
> > > > > > > >> to
> > > > > > > >> > > > > eventually achieve first, then we can discuss the
> > steps
> > > > > > towards
> > > > > > > >> that
> > > > > > > >> > > > final
> > > > > > > >> > > > > state.
> > > > > > > >> > > > >
> > > > > > > >> > > > > It looks like there are two proposed end states now:
> > > > > > > >> > > > >
> > > > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression
> > class;
> > > > > keep
> > > > > > > >> > > > > SupportsFilterPushDown and
> SupportsProjectionPushDown
> > the
> > > > > > same.
> > > > > > > >> It is
> > > > > > > >> > > > just
> > > > > > > >> > > > > a one step change.
> > > > > > > >> > > > >    - Regarding the supportsNestedFilterPushDown()
> > method, if
> > > > > > our
> > > > > > > >> > > contract
> > > > > > > >> > > > > with the connector developer today is "The
> > implementation
> > > > > > should
> > > > > > > >> > ignore
> > > > > > > >> > > > > unrecognized expressions by putting them into the
> > remaining
> > > > > > > >> filters,
> > > > > > > >> > > > > instead of throwing exceptions". Then there is no
> > need for
> > > > > > this
> > > > > > > >> > > method. I
> > > > > > > >> > > > > am not sure about the current contract. We should
> > probably
> > > > > > make
> > > > > > > it
> > > > > > > >> > > clear
> > > > > > > >> > > > in
> > > > > > > >> > > > > the interface Java doc.
> > > > > > > >> > > > >
> > > > > > > >> > > > > 2. Extend the existing FiledReferenceExpression
> class
> > to
> > > > > > support
> > > > > > > >> > nested
> > > > > > > >> > > > > fields; SupportsFilterPushDown only has one method
> of
> > > > > > > >> > > > > applyFilters(List<ResolvedExpression>);
> > > > > > > SupportsProjectionPushDown
> > > > > > > >> > only
> > > > > > > >> > > > has
> > > > > > > >> > > > > one method of
> > > > > applyProjections(List<FieldReferenceExpression>,
> > > > > > > >> > > DataType).
> > > > > > > >> > > > > It could just be two steps if we are not too
> obsessed
> > with
> > > > > the
> > > > > > > >> exact
> > > > > > > >> > > > names
> > > > > > > >> > > > > of "applyFilters" and "applyProjections". More
> > specifically,
> > > > > > it
> > > > > > > >> takes
> > > > > > > >> > > two
> > > > > > > >> > > > > steps to achieve this final state:
> > > > > > > >> > > > >     a. introduce a new method
> > > > > > > >> > tryApplyFilters(List<ResolvedExpression>)
> > > > > > > >> > > > to
> > > > > > > >> > > > > SupportsFilterPushDown, which may have
> > > > > > FiledReferenceExpression
> > > > > > > >> with
> > > > > > > >> > > > nested
> > > > > > > >> > > > > fields. The default implementation throws an
> > exception. The
> > > > > > > >> runtime
> > > > > > > >> > > will
> > > > > > > >> > > > > first call tryApplyFilters() with nested fields. In
> > case of
> > > > > > > >> > exception,
> > > > > > > >> > > it
> > > > > > > >> > > > > calls the existing applyFilters() without including
> > the
> > > > > nested
> > > > > > > >> > filters.
> > > > > > > >> > > > > Similarly, in SupportsProjectionPushDown, introduce
> a
> > > > > > > >> > > > > tryApplyProjections<List<NestedFieldReference>
> method
> > > > > > returning
> > > > > > > a
> > > > > > > >> > > Result.
> > > > > > > >> > > > > The Result also contains the accepted and
> unapplicable
> > > > > > > >> projections.
> > > > > > > >> > The
> > > > > > > >> > > > > default implementation also throws an exception.
> > Deprecate
> > > > > all
> > > > > > > the
> > > > > > > >> > > other
> > > > > > > >> > > > > methods except tryApplyFilters() and
> > tryApplyProjections().
> > > > > > > >> > > > >     b. remove the deprecated methods in the next
> major
> > > > > version
> > > > > > > >> bump.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Now the question is putting the migration steps
> > aside, which
> > > > > > end
> > > > > > > >> > state
> > > > > > > >> > > do
> > > > > > > >> > > > > we prefer? While the first end state is acceptable
> > for me,
> > > > > > > >> > personally,
> > > > > > > >> > > I
> > > > > > > >> > > > > prefer the latter if we are designing from scratch.
> > It is
> > > > > > clean,
> > > > > > > >> > > > consistent
> > > > > > > >> > > > > and intuitive. Given the size of Flink, keeping APIs
> > in the
> > > > > > same
> > > > > > > >> > style
> > > > > > > >> > > > over
> > > > > > > >> > > > > time is important. The migration is also not that
> > > > > complicated.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks,
> > > > > > > >> > > > >
> > > > > > > >> > > > > Jiangjie (Becket) Qin
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <
> > imj...@gmail.com>
> > > > > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Hi Venkat,
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Thanks for the proposal.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > I have some minor comments about the FLIP.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > 1. I think we don't need to
> > > > > > > >> > > > > > add SupportsFilterPushDown#supportsNestedFilters()
> > method,
> > > > > > > >> > > > > > because connectors can skip nested filters by
> > putting them
> > > > > > in
> > > > > > > >> > > > > > Result#remainingFilters().
> > > > > > > >> > > > > > And this is backward-compatible because unknown
> > > > > expressions
> > > > > > > were
> > > > > > > >> > > added
> > > > > > > >> > > > to
> > > > > > > >> > > > > > the remaining filters.
> > > > > > > >> > > > > > Planner should push predicate expressions as more
> as
> > > > > > possible.
> > > > > > > >> If
> > > > > > > >> > we
> > > > > > > >> > > > add
> > > > > > > >> > > > > a
> > > > > > > >> > > > > > flag for each new filter,
> > > > > > > >> > > > > > the interface will be filled with lots of flags
> > (e.g.,
> > > > > > > >> > > supportsBetween,
> > > > > > > >> > > > > > supportsIN).
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName
> > should
> > > > > be
> > > > > > an
> > > > > > > >> > array
> > > > > > > >> > > of
> > > > > > > >> > > > > > field names?
> > > > > > > >> > > > > > Each string represents a field name part of the
> > field
> > > > > path.
> > > > > > > Just
> > > > > > > >> > keep
> > > > > > > >> > > > > > aligning with `nestedFieldIndexArray`.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > 3. My concern about making
> FieldReferenceExpression
> > > > > support
> > > > > > > >> nested
> > > > > > > >> > > > fields
> > > > > > > >> > > > > > is the compatibility.
> > > > > > > >> > > > > > It is a public API and users/connectors are
> already
> > using
> > > > > > it.
> > > > > > > >> > People
> > > > > > > >> > > > > > assumed it is a top-level column
> > > > > > > >> > > > > > reference, and applied logic on it. But that's not
> > true
> > > > > now
> > > > > > > and
> > > > > > > >> > this
> > > > > > > >> > > > may
> > > > > > > >> > > > > > lead to unexpected errors.
> > > > > > > >> > > > > > Having a separate NestedFieldReferenceExpression
> > sounds
> > > > > > safer
> > > > > > > to
> > > > > > > >> > me.
> > > > > > > >> > > > > Mixing
> > > > > > > >> > > > > > them in a class may
> > > > > > > >> > > > > >  confuse users what's the meaning of
> getFieldName()
> > and
> > > > > > > >> > > > getFieldIndex().
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Regarding using NestedFieldReferenceExpression in
> > > > > > > >> > > > > > SupportsProjectionPushDown, do you
> > > > > > > >> > > > > > have any concerns @Timo Walther <
> twal...@apache.org>
> > ?
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Best,
> > > > > > > >> > > > > > Jark
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan
> > Sowrirajan <
> > > > > > > >> > > > > vsowr...@asu.edu
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > wrote:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1.
> Agree
> > with
> > > > > > > >> cleaning
> > > > > > > >> > up
> > > > > > > >> > > > the
> > > > > > > >> > > > > > APIs
> > > > > > > >> > > > > > > and making it consistent in all the pushdown
> APIs.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Your suggested approach seems fine to me, unless
> > anyone
> > > > > > else
> > > > > > > >> has
> > > > > > > >> > > any
> > > > > > > >> > > > > > other
> > > > > > > >> > > > > > > concerns. Just have couple of clarifying
> > questions:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > 1. Do you think we should standardize the APIs
> > across
> > > > > all
> > > > > > > the
> > > > > > > >> > > > pushdown
> > > > > > > >> > > > > > > supports like SupportsPartitionPushdown,
> > > > > > > >> SupportsDynamicFiltering
> > > > > > > >> > > etc
> > > > > > > >> > > > > in
> > > > > > > >> > > > > > > the end state?
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > The current proposal works if we do not want to
> > migrate
> > > > > > > >> > > > > > > > SupportsFilterPushdown to also use
> > > > > > > >> > NestedFieldReferenceExpression
> > > > > > > >> > > > in
> > > > > > > >> > > > > > the
> > > > > > > >> > > > > > > > long term.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead
> of
> > > > > > > >> > > > > > > *NestedFieldReferenceExpression*?
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > 2. Extend the FieldReferenceExpression to
> support
> > nested
> > > > > > > >> fields.
> > > > > > > >> > > > > > > >     - Change the index field type from int to
> > int[].
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >     - Add a new method int[]
> getFieldIndexArray().
> > > > > > > >> > > > > > > >     - Deprecate the int getFieldIndex()
> method,
> > the
> > > > > code
> > > > > > > >> will
> > > > > > > >> > be
> > > > > > > >> > > > > > removed
> > > > > > > >> > > > > > > in
> > > > > > > >> > > > > > > > the next major version bump.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > I assume getFieldIndex would return
> > fieldIndexArray[0],
> > > > > > > right?
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Thanks
> > > > > > > >> > > > > > > Venkat
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <
> > > > > > > >> becket....@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > Thanks for the proposal, Venkata.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > The current proposal works if we do not want
> to
> > > > > migrate
> > > > > > > >> > > > > > > > SupportsFilterPushdown to also use
> > > > > > > >> > NestedFieldReferenceExpression
> > > > > > > >> > > > in
> > > > > > > >> > > > > > the
> > > > > > > >> > > > > > > > long term.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* instead
> of
> > > > > > > >> > > > > > > *NestedFieldReferenceExpression*?
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Otherwise, the alternative solution briefly
> > mentioned
> > > > > in
> > > > > > > the
> > > > > > > >> > > > rejected
> > > > > > > >> > > > > > > > alternatives would be the following:
> > > > > > > >> > > > > > > > Phase 1:
> > > > > > > >> > > > > > > > 1. Introduce a supportsNestedFilters() method
> > to the
> > > > > > > >> > > > > > > SupportsFilterPushdown
> > > > > > > >> > > > > > > > interface. (same as current proposal).
> > > > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to
> > support
> > > > > nested
> > > > > > > >> > fields.
> > > > > > > >> > > > > > > >     - Change the index field type from int to
> > int[].
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >     - Add a new method int[]
> getFieldIndexArray().
> > > > > > > >> > > > > > > >     - Deprecate the int getFieldIndex()
> method,
> > the
> > > > > code
> > > > > > > >> will
> > > > > > > >> > be
> > > > > > > >> > > > > > removed
> > > > > > > >> > > > > > > in
> > > > > > > >> > > > > > > > the next major version bump.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown interface
> > > > > > > >> > > > > > > >     - add a new method
> > > > > > > >> > > > > applyProjection(List<FieldReferenceExpression>,
> > > > > > > >> > > > > > > > DataType), with default implementation
> invoking
> > > > > > > >> > > > > > applyProjection(int[][],
> > > > > > > >> > > > > > > > DataType)
> > > > > > > >> > > > > > > >     - deprecate the current
> > applyProjection(int[][],
> > > > > > > >> DataType)
> > > > > > > >> > > > method
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Phase 2 (in the next major version bump)
> > > > > > > >> > > > > > > > 1. remove the deprecated methods.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Phase 3 (optional)
> > > > > > > >> > > > > > > > 1. deprecate and remove the
> > supportsNestedFilters() /
> > > > > > > >> > > > > > > > supportsNestedProjection() methods from the
> > > > > > > >> > > SupportsFilterPushDown
> > > > > > > >> > > > /
> > > > > > > >> > > > > > > > SupportsProjectionPushDown interfaces.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Personally I prefer this alternative. It takes
> > longer
> > > > > to
> > > > > > > >> finish
> > > > > > > >> > > the
> > > > > > > >> > > > > > work,
> > > > > > > >> > > > > > > > but the API eventually becomes clean and
> > consistent.
> > > > > > But I
> > > > > > > >> can
> > > > > > > >> > > live
> > > > > > > >> > > > > > with
> > > > > > > >> > > > > > > > the current proposal.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Thanks,
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM
> Venkatakrishnan
> > > > > > > Sowrirajan
> > > > > > > >> <
> > > > > > > >> > > > > > > > vsowr...@asu.edu> wrote:
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > Gentle ping for reviews/feedback.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM
> Venkatakrishnan
> > > > > > > Sowrirajan <
> > > > > > > >> > > > > > > > vsowr...@asu.edu
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > Hi All,
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > I am opening this thread to discuss
> > FLIP-356:
> > > > > > Support
> > > > > > > >> > Nested
> > > > > > > >> > > > > Fields
> > > > > > > >> > > > > > > > > > Filter Pushdown. The FLIP can be found at
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > This FLIP adds support for pushing down
> > nested
> > > > > > fields
> > > > > > > >> > filters
> > > > > > > >> > > > to
> > > > > > > >> > > > > > the
> > > > > > > >> > > > > > > > > > underlying TableSource. In our data lake,
> > we find
> > > > > a
> > > > > > > lot
> > > > > > > >> of
> > > > > > > >> > > > > datasets
> > > > > > > >> > > > > > > > have
> > > > > > > >> > > > > > > > > > nested fields and also user queries with
> > filters
> > > > > > > >> defined on
> > > > > > > >> > > the
> > > > > > > >> > > > > > > nested
> > > > > > > >> > > > > > > > > > fields. This would drastically improve the
> > > > > > performance
> > > > > > > >> for
> > > > > > > >> > > > those
> > > > > > > >> > > > > > sets
> > > > > > > >> > > > > > > > of
> > > > > > > >> > > > > > > > > > queries.
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Appreciate any comments or feedback you
> may
> > have
> > > > > on
> > > > > > > this
> > > > > > > >> > > > > proposal.
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Regards
> > > > > > > >> > > > > > > > > > Venkata krishnan
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
> >
>

Reply via email to