To keep it backwards compatible, introduce another API *applyAggregates *with
*List<ReferenceExpression> *when nested field support is added and
deprecate the current API. This will by default throw an exception. In
flink planner, *applyAggregates *with nested fields and if it throws
exception then *applyAggregates* without nested fields.

Regards
Venkata krishnan


On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan <
vsowr...@asu.edu> wrote:

> Jark,
>
> How about having a separate NestedFieldReferenceExpression, and
>> abstracting a common base class "ReferenceExpression" for
>> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
>> unifying expressions in
>> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
>> ...)"
>> possible.
>
> This should be fine for *SupportsProjectionPushDown* and
> *SupportsFilterPushDown*. One concern in the case of
> *SupportsAggregatePushDown* with nested fields support (to be added in
> the future), with this proposal, the API will become backwards incompatible
> as the *args *for the aggregate function is *List<FieldReferenceExpression>
> *that needs to change to *List<ReferenceExpression>*.
>
> Regards
> Venkata krishnan
>
>
> On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Becket,
>>
>> I think it is the second case, that a FieldReferenceExpression is
>> constructed
>> by the framework and passed to the connector (interfaces listed by
>> Venkata[1]
>> and Catalog#listPartitionsByFilter). Besides, understanding the nested
>> field
>> is optional for users/connectors (just treat it as an unknown expression
>> if
>> the
>> connector doesn't want to support it).
>>
>> If we extend FieldReferenceExpression, in the case of "where col.nested >
>> 10",
>> for the connectors already supported filter/delete pushdown, they may
>> wrongly
>> pushdown "col > 10" instead of "nested > 10" because they still treat
>> FieldReferenceExpression as a top-level column. This problem can be
>> resolved
>> by introducing an additional "supportedNestedPushdown" for each interface,
>> but that method is not elegant and is hard to remove in the future, and
>> this could
>> be avoided if we have a separate NestedFieldReferenceExpression.
>>
>> If we want to extend FieldReferenceExpression, we have to add protections
>> for every related API in one shot. Besides, FieldReferenceExpression is a
>> fundamental class in the planner, we have to go through all the code that
>> is using it to make sure it properly handling it if it is a nested field
>> which
>> is a big effort for the community.
>>
>> If we were designing this API on day 1, I fully support merging them in a
>> FieldReferenceExpression. But in this case, I'm thinking about how to
>> provide
>> users with a smooth migration path, and allow the community to gradually
>> put efforts into evolving the API, and not block the "Nested Fields Filter
>> Pushdown"
>> requirement.
>>
>> How about having a separate NestedFieldReferenceExpression, and
>> abstracting a common base class "ReferenceExpression" for
>> NestedFieldReferenceExpression and FieldReferenceExpression? This makes
>> unifying expressions in
>> "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression>
>> ...)"
>> possible.
>>
>> Best,
>> Jark
>>
>> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan <
>> vsowr...@asu.edu>
>> wrote:
>>
>> > Becket and Jark,
>> >
>> >  Deprecate all the other
>> > > methods except tryApplyFilters() and tryApplyProjections().
>> >
>> > For *SupportsProjectionPushDown*, we still need a
>> > *supportsNestedProjections* API on the table source as some of the table
>> > sources might not be able to handle nested fields and therefore the
>> Flink
>> > planner should not push down the nested projections or else the
>> > *applyProjection
>> > *API has to be appropriately changed to return
>> > *unconvertibleProjections *similar
>> > to *SupportsFilterPushDown*.
>> >
>> > Or we have to introduce two different applyProjections()
>> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
>> > > respectively.
>> >
>> > Agree this is not preferred. Given that *supportNestedProjections
>> *cannot
>> > be deprecated/removed based on the current API form, extending
>> > *FieldReferenceExpression* to support nested fields should be okay.
>> >
>> > Another alternative could be to change *applyProjections *to take
>> > List<ResolvedExpression> and on the connector side they choose to handle
>> > *FieldReferenceExpression* and *NestedFieldReferenceExpression *as
>> > applicable and return the remainingProjections. In the case of nested
>> field
>> > projections not supported, it should return them back but only
>> projecting
>> > the top level fields. IMO, this is also *not preferred*.
>> >
>> > *SupportsAggregatePushDown*
>> >
>> > *AggregateExpression *currently takes in a list of
>> > *FieldReferenceExpression* as args for the aggregate function, if in
>> future
>> > *SupportsAggregatePushDown* adds support for aggregate pushdown on
>> nested
>> > fields then the AggregateExpression API also has to change if a new
>> > NestedFieldReferenceExpression is introduced for nested fields.
>> >
>> > If we add a
>> > > flag for each new filter,
>> > > the interface will be filled with lots of flags (e.g.,
>> supportsBetween,
>> > > supportsIN)
>> >
>> > In an ideal situation, I completely agree with you. But in the current
>> > state, *supportsNestedFilters* can act as a bridge to reach the eventual
>> > desired state which is to have a clean and consistent set of APIs
>> > throughout all Supports*PushDown.
>> >
>> > Also shared some thoughts on the end state API
>> > <
>> >
>> https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$
>> > >
>> > with extension to the *FieldReferenceExpression* to support nested
>> fields.
>> > Please take a look.
>> >
>> > Regards
>> > Venkata krishnan
>> >
>> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com>
>> wrote:
>> >
>> > > Hi Jark,
>> > >
>> > > Regarding the migration path, it would be useful to scrutinize the use
>> > case
>> > > of FiledReferenceExpression and ResolvedExpressions. There are two
>> kinds
>> > of
>> > > use cases:
>> > >
>> > > 1. A ResolvedExpression is constructed by the user or connector /
>> plugin
>> > > developers.
>> > > 2. A ResolvedExpression is constructed by the framework and passed to
>> > user
>> > > or connector / plugin developers.
>> > >
>> > > For the first case, both of the approaches provide the same migration
>> > > experience.
>> > >
>> > > For the second case, generally speaking, introducing
>> > > NestedFieldReferenceExpression and extending FieldReferenceExpression
>> > would
>> > > have the same impact for backwards compatibility.
>> SupportsFilterPushDown
>> > is
>> > > a special case here because understanding the filter expressions is
>> > > optional for the source implementation. In other use cases, if
>> > > understanding the reference to a nested field is a must have, the user
>> > code
>> > > has to be changed, regardless of which approach we take to support
>> nested
>> > > fields.
>> > >
>> > > Therefore, I think we have to check each public API where the nested
>> > field
>> > > reference is exposed. If we have many public APIs where understanding
>> > > nested fields is optional for the user  / plugin / connector
>> developers,
>> > > having a separate NestedFieldReferenceExpression would have a more
>> smooth
>> > > migration. Otherwise, there seems to be no difference between the two
>> > > approaches.
>> > >
>> > > Migration path aside, the main reason I prefer extending
>> > > FieldReferenceExpression over a new NestedFieldReferenceExpression is
>> > > because this makes the SupportsProjectionPushDown interface simpler.
>> > > Otherwise, we have to treat it as a special case that does not match
>> the
>> > > overall API style. Or we have to introduce two different
>> > applyProjections()
>> > > methods for FieldReferenceExpression / NestedFieldReferenceExpression
>> > > respectively. This issue further extends to implementation in
>> addition to
>> > > public API. A single FieldReferenceExpression might help simplify the
>> > > implementation code a little bit. For example, in a recursive
>> processing
>> > of
>> > > a row with nested rows, we may not need to switch between
>> > > FieldReferenceExpression and NestedFieldReferenceExpression depending
>> on
>> > > whether the record being processed is a top level record or nested
>> > record.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > I totally agree we should try to have a consistent API for a final
>> > state.
>> > > > The only concern I have mentioned is the "smooth" migration path.
>> > > > The FiledReferenceExpression is widely used in many public APIs,
>> > > > not only in the SupportsFilterPushDown. Yes, we can change every
>> > > > methods in 2-steps, but is it good to change API back and forth for
>> > this?
>> > > > Personally, I'm fine with a separate NestedFieldReferenceExpression
>> > > class.
>> > > > TBH, I prefer the separated way because it makes the reference
>> > expression
>> > > > more clear and concise.
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > >
>> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com>
>> wrote:
>> > > >
>> > > > > Thanks for the reply, Jark.
>> > > > >
>> > > > > I think it will be helpful to understand the final state we want
>> to
>> > > > > eventually achieve first, then we can discuss the steps towards
>> that
>> > > > final
>> > > > > state.
>> > > > >
>> > > > > It looks like there are two proposed end states now:
>> > > > >
>> > > > > 1. Have a separate NestedFieldReferenceExpression class; keep
>> > > > > SupportsFilterPushDown and SupportsProjectionPushDown the same.
>> It is
>> > > > just
>> > > > > a one step change.
>> > > > >    - Regarding the supportsNestedFilterPushDown() method, if our
>> > > contract
>> > > > > with the connector developer today is "The implementation should
>> > ignore
>> > > > > unrecognized expressions by putting them into the remaining
>> filters,
>> > > > > instead of throwing exceptions". Then there is no need for this
>> > > method. I
>> > > > > am not sure about the current contract. We should probably make it
>> > > clear
>> > > > in
>> > > > > the interface Java doc.
>> > > > >
>> > > > > 2. Extend the existing FiledReferenceExpression class to support
>> > nested
>> > > > > fields; SupportsFilterPushDown only has one method of
>> > > > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown
>> > only
>> > > > has
>> > > > > one method of applyProjections(List<FieldReferenceExpression>,
>> > > DataType).
>> > > > > It could just be two steps if we are not too obsessed with the
>> exact
>> > > > names
>> > > > > of "applyFilters" and "applyProjections". More specifically, it
>> takes
>> > > two
>> > > > > steps to achieve this final state:
>> > > > >     a. introduce a new method
>> > tryApplyFilters(List<ResolvedExpression>)
>> > > > to
>> > > > > SupportsFilterPushDown, which may have FiledReferenceExpression
>> with
>> > > > nested
>> > > > > fields. The default implementation throws an exception. The
>> runtime
>> > > will
>> > > > > first call tryApplyFilters() with nested fields. In case of
>> > exception,
>> > > it
>> > > > > calls the existing applyFilters() without including the nested
>> > filters.
>> > > > > Similarly, in SupportsProjectionPushDown, introduce a
>> > > > > tryApplyProjections<List<NestedFieldReference> method returning a
>> > > Result.
>> > > > > The Result also contains the accepted and unapplicable
>> projections.
>> > The
>> > > > > default implementation also throws an exception. Deprecate all the
>> > > other
>> > > > > methods except tryApplyFilters() and tryApplyProjections().
>> > > > >     b. remove the deprecated methods in the next major version
>> bump.
>> > > > >
>> > > > > Now the question is putting the migration steps aside, which end
>> > state
>> > > do
>> > > > > we prefer? While the first end state is acceptable for me,
>> > personally,
>> > > I
>> > > > > prefer the latter if we are designing from scratch. It is clean,
>> > > > consistent
>> > > > > and intuitive. Given the size of Flink, keeping APIs in the same
>> > style
>> > > > over
>> > > > > time is important. The migration is also not that complicated.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi Venkat,
>> > > > > >
>> > > > > > Thanks for the proposal.
>> > > > > >
>> > > > > > I have some minor comments about the FLIP.
>> > > > > >
>> > > > > > 1. I think we don't need to
>> > > > > > add SupportsFilterPushDown#supportsNestedFilters() method,
>> > > > > > because connectors can skip nested filters by putting them in
>> > > > > > Result#remainingFilters().
>> > > > > > And this is backward-compatible because unknown expressions were
>> > > added
>> > > > to
>> > > > > > the remaining filters.
>> > > > > > Planner should push predicate expressions as more as possible.
>> If
>> > we
>> > > > add
>> > > > > a
>> > > > > > flag for each new filter,
>> > > > > > the interface will be filled with lots of flags (e.g.,
>> > > supportsBetween,
>> > > > > > supportsIN).
>> > > > > >
>> > > > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an
>> > array
>> > > of
>> > > > > > field names?
>> > > > > > Each string represents a field name part of the field path. Just
>> > keep
>> > > > > > aligning with `nestedFieldIndexArray`.
>> > > > > >
>> > > > > > 3. My concern about making FieldReferenceExpression support
>> nested
>> > > > fields
>> > > > > > is the compatibility.
>> > > > > > It is a public API and users/connectors are already using it.
>> > People
>> > > > > > assumed it is a top-level column
>> > > > > > reference, and applied logic on it. But that's not true now and
>> > this
>> > > > may
>> > > > > > lead to unexpected errors.
>> > > > > > Having a separate NestedFieldReferenceExpression sounds safer to
>> > me.
>> > > > > Mixing
>> > > > > > them in a class may
>> > > > > >  confuse users what's the meaning of getFieldName() and
>> > > > getFieldIndex().
>> > > > > >
>> > > > > >
>> > > > > > Regarding using NestedFieldReferenceExpression in
>> > > > > > SupportsProjectionPushDown, do you
>> > > > > > have any concerns @Timo Walther <twal...@apache.org> ?
>> > > > > >
>> > > > > > Best,
>> > > > > > Jark
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <
>> > > > > vsowr...@asu.edu
>> > > > > > >
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Sounds like a great suggestion, Becket. +1. Agree with
>> cleaning
>> > up
>> > > > the
>> > > > > > APIs
>> > > > > > > and making it consistent in all the pushdown APIs.
>> > > > > > >
>> > > > > > > Your suggested approach seems fine to me, unless anyone else
>> has
>> > > any
>> > > > > > other
>> > > > > > > concerns. Just have couple of clarifying questions:
>> > > > > > >
>> > > > > > > 1. Do you think we should standardize the APIs across all the
>> > > > pushdown
>> > > > > > > supports like SupportsPartitionPushdown,
>> SupportsDynamicFiltering
>> > > etc
>> > > > > in
>> > > > > > > the end state?
>> > > > > > >
>> > > > > > > The current proposal works if we do not want to migrate
>> > > > > > > > SupportsFilterPushdown to also use
>> > NestedFieldReferenceExpression
>> > > > in
>> > > > > > the
>> > > > > > > > long term.
>> > > > > > > >
>> > > > > > > Did you mean *FieldReferenceExpression* instead of
>> > > > > > > *NestedFieldReferenceExpression*?
>> > > > > > >
>> > > > > > > 2. Extend the FieldReferenceExpression to support nested
>> fields.
>> > > > > > > >     - Change the index field type from int to int[].
>> > > > > > >
>> > > > > > >     - Add a new method int[] getFieldIndexArray().
>> > > > > > > >     - Deprecate the int getFieldIndex() method, the code
>> will
>> > be
>> > > > > > removed
>> > > > > > > in
>> > > > > > > > the next major version bump.
>> > > > > > >
>> > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right?
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Venkat
>> > > > > > >
>> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <
>> becket....@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks for the proposal, Venkata.
>> > > > > > > >
>> > > > > > > > The current proposal works if we do not want to migrate
>> > > > > > > > SupportsFilterPushdown to also use
>> > NestedFieldReferenceExpression
>> > > > in
>> > > > > > the
>> > > > > > > > long term.
>> > > > > > > >
>> > > > > > > Did you mean *FieldReferenceExpression* instead of
>> > > > > > > *NestedFieldReferenceExpression*?
>> > > > > > >
>> > > > > > > >
>> > > > > > > > Otherwise, the alternative solution briefly mentioned in the
>> > > > rejected
>> > > > > > > > alternatives would be the following:
>> > > > > > > > Phase 1:
>> > > > > > > > 1. Introduce a supportsNestedFilters() method to the
>> > > > > > > SupportsFilterPushdown
>> > > > > > > > interface. (same as current proposal).
>> > > > > > > > 2. Extend the FieldReferenceExpression to support nested
>> > fields.
>> > > > > > > >     - Change the index field type from int to int[].
>> > > > > > >
>> > > > > > >     - Add a new method int[] getFieldIndexArray().
>> > > > > > > >     - Deprecate the int getFieldIndex() method, the code
>> will
>> > be
>> > > > > > removed
>> > > > > > > in
>> > > > > > > > the next major version bump.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > 3. In the SupportsProjectionPushDown interface
>> > > > > > > >     - add a new method
>> > > > > applyProjection(List<FieldReferenceExpression>,
>> > > > > > > > DataType), with default implementation invoking
>> > > > > > applyProjection(int[][],
>> > > > > > > > DataType)
>> > > > > > > >     - deprecate the current applyProjection(int[][],
>> DataType)
>> > > > method
>> > > > > > > >
>> > > > > > > > Phase 2 (in the next major version bump)
>> > > > > > > > 1. remove the deprecated methods.
>> > > > > > > >
>> > > > > > > > Phase 3 (optional)
>> > > > > > > > 1. deprecate and remove the supportsNestedFilters() /
>> > > > > > > > supportsNestedProjection() methods from the
>> > > SupportsFilterPushDown
>> > > > /
>> > > > > > > > SupportsProjectionPushDown interfaces.
>> > > > > > > >
>> > > > > > > > Personally I prefer this alternative. It takes longer to
>> finish
>> > > the
>> > > > > > work,
>> > > > > > > > but the API eventually becomes clean and consistent. But I
>> can
>> > > live
>> > > > > > with
>> > > > > > > > the current proposal.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > >
>> > > > > > > > Jiangjie (Becket) Qin
>> > > > > > > >
>> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan
>> <
>> > > > > > > > vsowr...@asu.edu> wrote:
>> > > > > > > >
>> > > > > > > > > Gentle ping for reviews/feedback.
>> > > > > > > > >
>> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
>> > > > > > > > vsowr...@asu.edu
>> > > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi All,
>> > > > > > > > > >
>> > > > > > > > > > I am opening this thread to discuss FLIP-356: Support
>> > Nested
>> > > > > Fields
>> > > > > > > > > > Filter Pushdown. The FLIP can be found at
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
>> > > > > > > > > >
>> > > > > > > > > > This FLIP adds support for pushing down nested fields
>> > filters
>> > > > to
>> > > > > > the
>> > > > > > > > > > underlying TableSource. In our data lake, we find a lot
>> of
>> > > > > datasets
>> > > > > > > > have
>> > > > > > > > > > nested fields and also user queries with filters
>> defined on
>> > > the
>> > > > > > > nested
>> > > > > > > > > > fields. This would drastically improve the performance
>> for
>> > > > those
>> > > > > > sets
>> > > > > > > > of
>> > > > > > > > > > queries.
>> > > > > > > > > >
>> > > > > > > > > > Appreciate any comments or feedback you may have on this
>> > > > > proposal.
>> > > > > > > > > >
>> > > > > > > > > > Regards
>> > > > > > > > > > Venkata krishnan
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to