Jark,

How about having a separate NestedFieldReferenceExpression, and
> abstracting a common base class "ReferenceExpression" for
> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> unifying expressions in
> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> ...)"
> possible.

This should be fine for *SupportsProjectionPushDown* and
*SupportsFilterPushDown*. One concern in the case of
*SupportsAggregatePushDown* with nested fields support (to be added in the
future), with this proposal, the API will become backwards incompatible as
the *args *for the aggregate function is *List<FieldReferenceExpression> *that
needs to change to *List<ReferenceExpression>*.

Regards
Venkata krishnan


On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote:

> Hi Becket,
>
> I think it is the second case, that a FieldReferenceExpression is
> constructed
> by the framework and passed to the connector (interfaces listed by
> Venkata[1]
> and Catalog#listPartitionsByFilter). Besides, understanding the nested
> field
> is optional for users/connectors (just treat it as an unknown expression if
> the
> connector doesn't want to support it).
>
> If we extend FieldReferenceExpression, in the case of "where col.nested >
> 10",
> for the connectors already supported filter/delete pushdown, they may
> wrongly
> pushdown "col > 10" instead of "nested > 10" because they still treat
> FieldReferenceExpression as a top-level column. This problem can be
> resolved
> by introducing an additional "supportedNestedPushdown" for each interface,
> but that method is not elegant and is hard to remove in the future, and
> this could
> be avoided if we have a separate NestedFieldReferenceExpression.
>
> If we want to extend FieldReferenceExpression, we have to add protections
> for every related API in one shot. Besides, FieldReferenceExpression is a
> fundamental class in the planner, we have to go through all the code that
> is using it to make sure it properly handling it if it is a nested field
> which
> is a big effort for the community.
>
> If we were designing this API on day 1, I fully support merging them in a
> FieldReferenceExpression. But in this case, I'm thinking about how to
> provide
> users with a smooth migration path, and allow the community to gradually
> put efforts into evolving the API, and not block the "Nested Fields Filter
> Pushdown"
> requirement.
>
> How about having a separate NestedFieldReferenceExpression, and
> abstracting a common base class "ReferenceExpression" for
> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
> unifying expressions in
> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
> ...)"
> possible.
>
> Best,
> Jark
>
> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <vsowr...@asu.edu
> >
> wrote:
>
> > Becket and Jark,
> >
> >  Deprecate all the other
> > > methods except tryApplyFilters() and tryApplyProjections().
> >
> > For *SupportsProjectionPushDown*, we still need a
> > *supportsNestedProjections* API on the table source as some of the table
> > sources might not be able to handle nested fields and therefore the Flink
> > planner should not push down the nested projections or else the
> > *applyProjection
> > *API has to be appropriately changed to return
> > *unconvertibleProjections *similar
> > to *SupportsFilterPushDown*.
> >
> > Or we have to introduce two different applyProjections()
> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
> > > respectively.
> >
> > Agree this is not preferred. Given that *supportNestedProjections *cannot
> > be deprecated/removed based on the current API form, extending
> > *FieldReferenceExpression* to support nested fields should be okay.
> >
> > Another alternative could be to change *applyProjections *to take
> > List<ResolvedExpression> and on the connector side they choose to handle
> > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as
> > applicable and return the remainingProjections. In the case of nested
> field
> > projections not supported, it should return them back but only projecting
> > the top level fields. IMO, this is also *not preferred*.
> >
> > *SupportsAggregatePushDown*
> >
> > *AggregateExpression *currently takes in a list of
> > *FieldReferenceExpression* as args for the aggregate function, if in
> future
> > *SupportsAggregatePushDown* adds support for aggregate pushdown on nested
> > fields then the AggregateExpression API also has to change if a new
> > NestedFieldReferenceExpression is introduced for nested fields.
> >
> > If we add a
> > > flag for each new filter,
> > > the interface will be filled with lots of flags (e.g., supportsBetween,
> > > supportsIN)
> >
> > In an ideal situation, I completely agree with you. But in the current
> > state, *supportsNestedFilters* can act as a bridge to reach the eventual
> > desired state which is to have a clean and consistent set of APIs
> > throughout all Supports*PushDown.
> >
> > Also shared some thoughts on the end state API
> > <
> >
> https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$
> > >
> > with extension to the *FieldReferenceExpression* to support nested
> fields.
> > Please take a look.
> >
> > Regards
> > Venkata krishnan
> >
> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com> wrote:
> >
> > > Hi Jark,
> > >
> > > Regarding the migration path, it would be useful to scrutinize the use
> > case
> > > of FiledReferenceExpression and ResolvedExpressions. There are two
> kinds
> > of
> > > use cases:
> > >
> > > 1. A ResolvedExpression is constructed by the user or connector /
> plugin
> > > developers.
> > > 2. A ResolvedExpression is constructed by the framework and passed to
> > user
> > > or connector / plugin developers.
> > >
> > > For the first case, both of the approaches provide the same migration
> > > experience.
> > >
> > > For the second case, generally speaking, introducing
> > > NestedFieldReferenceExpression and extending FieldReferenceExpression
> > would
> > > have the same impact for backwards compatibility.
> SupportsFilterPushDown
> > is
> > > a special case here because understanding the filter expressions is
> > > optional for the source implementation. In other use cases, if
> > > understanding the reference to a nested field is a must have, the user
> > code
> > > has to be changed, regardless of which approach we take to support
> nested
> > > fields.
> > >
> > > Therefore, I think we have to check each public API where the nested
> > field
> > > reference is exposed. If we have many public APIs where understanding
> > > nested fields is optional for the user  / plugin / connector
> developers,
> > > having a separate NestedFieldReferenceExpression would have a more
> smooth
> > > migration. Otherwise, there seems to be no difference between the two
> > > approaches.
> > >
> > > Migration path aside, the main reason I prefer extending
> > > FieldReferenceExpression over a new NestedFieldReferenceExpression is
> > > because this makes the SupportsProjectionPushDown interface simpler.
> > > Otherwise, we have to treat it as a special case that does not match
> the
> > > overall API style. Or we have to introduce two different
> > applyProjections()
> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
> > > respectively. This issue further extends to implementation in addition
> to
> > > public API. A single FieldReferenceExpression might help simplify the
> > > implementation code a little bit. For example, in a recursive
> processing
> > of
> > > a row with nested rows, we may not need to switch between
> > > FieldReferenceExpression and NestedFieldReferenceExpression depending
> on
> > > whether the record being processed is a top level record or nested
> > record.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > I totally agree we should try to have a consistent API for a final
> > state.
> > > > The only concern I have mentioned is the "smooth" migration path.
> > > > The FiledReferenceExpression is widely used in many public APIs,
> > > > not only in the SupportsFilterPushDown. Yes, we can change every
> > > > methods in 2-steps, but is it good to change API back and forth for
> > this?
> > > > Personally, I'm fine with a separate NestedFieldReferenceExpression
> > > class.
> > > > TBH, I prefer the separated way because it makes the reference
> > expression
> > > > more clear and concise.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com>
> wrote:
> > > >
> > > > > Thanks for the reply, Jark.
> > > > >
> > > > > I think it will be helpful to understand the final state we want to
> > > > > eventually achieve first, then we can discuss the steps towards
> that
> > > > final
> > > > > state.
> > > > >
> > > > > It looks like there are two proposed end states now:
> > > > >
> > > > > 1. Have a separate NestedFieldReferenceExpression class; keep
> > > > > SupportsFilterPushDown and SupportsProjectionPushDown the same. It
> is
> > > > just
> > > > > a one step change.
> > > > >    - Regarding the supportsNestedFilterPushDown() method, if our
> > > contract
> > > > > with the connector developer today is "The implementation should
> > ignore
> > > > > unrecognized expressions by putting them into the remaining
> filters,
> > > > > instead of throwing exceptions". Then there is no need for this
> > > method. I
> > > > > am not sure about the current contract. We should probably make it
> > > clear
> > > > in
> > > > > the interface Java doc.
> > > > >
> > > > > 2. Extend the existing FiledReferenceExpression class to support
> > nested
> > > > > fields; SupportsFilterPushDown only has one method of
> > > > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown
> > only
> > > > has
> > > > > one method of applyProjections(List<FieldReferenceExpression>,
> > > DataType).
> > > > > It could just be two steps if we are not too obsessed with the
> exact
> > > > names
> > > > > of "applyFilters" and "applyProjections". More specifically, it
> takes
> > > two
> > > > > steps to achieve this final state:
> > > > >     a. introduce a new method
> > tryApplyFilters(List<ResolvedExpression>)
> > > > to
> > > > > SupportsFilterPushDown, which may have FiledReferenceExpression
> with
> > > > nested
> > > > > fields. The default implementation throws an exception. The runtime
> > > will
> > > > > first call tryApplyFilters() with nested fields. In case of
> > exception,
> > > it
> > > > > calls the existing applyFilters() without including the nested
> > filters.
> > > > > Similarly, in SupportsProjectionPushDown, introduce a
> > > > > tryApplyProjections<List<NestedFieldReference> method returning a
> > > Result.
> > > > > The Result also contains the accepted and unapplicable projections.
> > The
> > > > > default implementation also throws an exception. Deprecate all the
> > > other
> > > > > methods except tryApplyFilters() and tryApplyProjections().
> > > > >     b. remove the deprecated methods in the next major version
> bump.
> > > > >
> > > > > Now the question is putting the migration steps aside, which end
> > state
> > > do
> > > > > we prefer? While the first end state is acceptable for me,
> > personally,
> > > I
> > > > > prefer the latter if we are designing from scratch. It is clean,
> > > > consistent
> > > > > and intuitive. Given the size of Flink, keeping APIs in the same
> > style
> > > > over
> > > > > time is important. The migration is also not that complicated.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote:
> > > > >
> > > > > > Hi Venkat,
> > > > > >
> > > > > > Thanks for the proposal.
> > > > > >
> > > > > > I have some minor comments about the FLIP.
> > > > > >
> > > > > > 1. I think we don't need to
> > > > > > add SupportsFilterPushDown#supportsNestedFilters() method,
> > > > > > because connectors can skip nested filters by putting them in
> > > > > > Result#remainingFilters().
> > > > > > And this is backward-compatible because unknown expressions were
> > > added
> > > > to
> > > > > > the remaining filters.
> > > > > > Planner should push predicate expressions as more as possible. If
> > we
> > > > add
> > > > > a
> > > > > > flag for each new filter,
> > > > > > the interface will be filled with lots of flags (e.g.,
> > > supportsBetween,
> > > > > > supportsIN).
> > > > > >
> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an
> > array
> > > of
> > > > > > field names?
> > > > > > Each string represents a field name part of the field path. Just
> > keep
> > > > > > aligning with `nestedFieldIndexArray`.
> > > > > >
> > > > > > 3. My concern about making FieldReferenceExpression support
> nested
> > > > fields
> > > > > > is the compatibility.
> > > > > > It is a public API and users/connectors are already using it.
> > People
> > > > > > assumed it is a top-level column
> > > > > > reference, and applied logic on it. But that's not true now and
> > this
> > > > may
> > > > > > lead to unexpected errors.
> > > > > > Having a separate NestedFieldReferenceExpression sounds safer to
> > me.
> > > > > Mixing
> > > > > > them in a class may
> > > > > >  confuse users what's the meaning of getFieldName() and
> > > > getFieldIndex().
> > > > > >
> > > > > >
> > > > > > Regarding using NestedFieldReferenceExpression in
> > > > > > SupportsProjectionPushDown, do you
> > > > > > have any concerns @Timo Walther <twal...@apache.org> ?
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <
> > > > > vsowr...@asu.edu
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Sounds like a great suggestion, Becket. +1. Agree with cleaning
> > up
> > > > the
> > > > > > APIs
> > > > > > > and making it consistent in all the pushdown APIs.
> > > > > > >
> > > > > > > Your suggested approach seems fine to me, unless anyone else
> has
> > > any
> > > > > > other
> > > > > > > concerns. Just have couple of clarifying questions:
> > > > > > >
> > > > > > > 1. Do you think we should standardize the APIs across all the
> > > > pushdown
> > > > > > > supports like SupportsPartitionPushdown,
> SupportsDynamicFiltering
> > > etc
> > > > > in
> > > > > > > the end state?
> > > > > > >
> > > > > > > The current proposal works if we do not want to migrate
> > > > > > > > SupportsFilterPushdown to also use
> > NestedFieldReferenceExpression
> > > > in
> > > > > > the
> > > > > > > > long term.
> > > > > > > >
> > > > > > > Did you mean *FieldReferenceExpression* instead of
> > > > > > > *NestedFieldReferenceExpression*?
> > > > > > >
> > > > > > > 2. Extend the FieldReferenceExpression to support nested
> fields.
> > > > > > > >     - Change the index field type from int to int[].
> > > > > > >
> > > > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > > > >     - Deprecate the int getFieldIndex() method, the code will
> > be
> > > > > > removed
> > > > > > > in
> > > > > > > > the next major version bump.
> > > > > > >
> > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Venkat
> > > > > > >
> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <
> becket....@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the proposal, Venkata.
> > > > > > > >
> > > > > > > > The current proposal works if we do not want to migrate
> > > > > > > > SupportsFilterPushdown to also use
> > NestedFieldReferenceExpression
> > > > in
> > > > > > the
> > > > > > > > long term.
> > > > > > > >
> > > > > > > Did you mean *FieldReferenceExpression* instead of
> > > > > > > *NestedFieldReferenceExpression*?
> > > > > > >
> > > > > > > >
> > > > > > > > Otherwise, the alternative solution briefly mentioned in the
> > > > rejected
> > > > > > > > alternatives would be the following:
> > > > > > > > Phase 1:
> > > > > > > > 1. Introduce a supportsNestedFilters() method to the
> > > > > > > SupportsFilterPushdown
> > > > > > > > interface. (same as current proposal).
> > > > > > > > 2. Extend the FieldReferenceExpression to support nested
> > fields.
> > > > > > > >     - Change the index field type from int to int[].
> > > > > > >
> > > > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > > > >     - Deprecate the int getFieldIndex() method, the code will
> > be
> > > > > > removed
> > > > > > > in
> > > > > > > > the next major version bump.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 3. In the SupportsProjectionPushDown interface
> > > > > > > >     - add a new method
> > > > > applyProjection(List<FieldReferenceExpression>,
> > > > > > > > DataType), with default implementation invoking
> > > > > > applyProjection(int[][],
> > > > > > > > DataType)
> > > > > > > >     - deprecate the current applyProjection(int[][],
> DataType)
> > > > method
> > > > > > > >
> > > > > > > > Phase 2 (in the next major version bump)
> > > > > > > > 1. remove the deprecated methods.
> > > > > > > >
> > > > > > > > Phase 3 (optional)
> > > > > > > > 1. deprecate and remove the supportsNestedFilters() /
> > > > > > > > supportsNestedProjection() methods from the
> > > SupportsFilterPushDown
> > > > /
> > > > > > > > SupportsProjectionPushDown interfaces.
> > > > > > > >
> > > > > > > > Personally I prefer this alternative. It takes longer to
> finish
> > > the
> > > > > > work,
> > > > > > > > but the API eventually becomes clean and consistent. But I
> can
> > > live
> > > > > > with
> > > > > > > > the current proposal.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <
> > > > > > > > vsowr...@asu.edu> wrote:
> > > > > > > >
> > > > > > > > > Gentle ping for reviews/feedback.
> > > > > > > > >
> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
> > > > > > > > vsowr...@asu.edu
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > I am opening this thread to discuss FLIP-356: Support
> > Nested
> > > > > Fields
> > > > > > > > > > Filter Pushdown. The FLIP can be found at
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > > > > > > >
> > > > > > > > > > This FLIP adds support for pushing down nested fields
> > filters
> > > > to
> > > > > > the
> > > > > > > > > > underlying TableSource. In our data lake, we find a lot
> of
> > > > > datasets
> > > > > > > > have
> > > > > > > > > > nested fields and also user queries with filters defined
> on
> > > the
> > > > > > > nested
> > > > > > > > > > fields. This would drastically improve the performance
> for
> > > > those
> > > > > > sets
> > > > > > > > of
> > > > > > > > > > queries.
> > > > > > > > > >
> > > > > > > > > > Appreciate any comments or feedback you may have on this
> > > > > proposal.
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Venkata krishnan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to