Becket and Jark,

 Deprecate all the other
> methods except tryApplyFilters() and tryApplyProjections().

For *SupportsProjectionPushDown*, we still need a
*supportsNestedProjections* API on the table source as some of the table
sources might not be able to handle nested fields and therefore the Flink
planner should not push down the nested projections or else the
*applyProjection
*API has to be appropriately changed to return
*unconvertibleProjections *similar
to *SupportsFilterPushDown*.

Or we have to introduce two different applyProjections()
> methods for FieldReferenceExpression / NestedFieldReferenceExpression
> respectively.

Agree this is not preferred. Given that *supportNestedProjections *cannot
be deprecated/removed based on the current API form, extending
*FieldReferenceExpression* to support nested fields should be okay.

Another alternative could be to change *applyProjections *to take
List<ResolvedExpression> and on the connector side they choose to handle
*FieldReferenceExpression* and *NestedFieldReferenceExpression *as
applicable and return the remainingProjections. In the case of nested field
projections not supported, it should return them back but only projecting
the top level fields. IMO, this is also *not preferred*.

*SupportsAggregatePushDown*

*AggregateExpression *currently takes in a list of
*FieldReferenceExpression* as args for the aggregate function, if in future
*SupportsAggregatePushDown* adds support for aggregate pushdown on nested
fields then the AggregateExpression API also has to change if a new
NestedFieldReferenceExpression is introduced for nested fields.

If we add a
> flag for each new filter,
> the interface will be filled with lots of flags (e.g., supportsBetween,
> supportsIN)

In an ideal situation, I completely agree with you. But in the current
state, *supportsNestedFilters* can act as a bridge to reach the eventual
desired state which is to have a clean and consistent set of APIs
throughout all Supports*PushDown.

Also shared some thoughts on the end state API
<https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing>
with extension to the *FieldReferenceExpression* to support nested fields.
Please take a look.

Regards
Venkata krishnan

On Tue, Aug 22, 2023 at 5:02 PM Becket Qin <becket....@gmail.com> wrote:

> Hi Jark,
>
> Regarding the migration path, it would be useful to scrutinize the use case
> of FiledReferenceExpression and ResolvedExpressions. There are two kinds of
> use cases:
>
> 1. A ResolvedExpression is constructed by the user or connector / plugin
> developers.
> 2. A ResolvedExpression is constructed by the framework and passed to user
> or connector / plugin developers.
>
> For the first case, both of the approaches provide the same migration
> experience.
>
> For the second case, generally speaking, introducing
> NestedFieldReferenceExpression and extending FieldReferenceExpression would
> have the same impact for backwards compatibility. SupportsFilterPushDown is
> a special case here because understanding the filter expressions is
> optional for the source implementation. In other use cases, if
> understanding the reference to a nested field is a must have, the user code
> has to be changed, regardless of which approach we take to support nested
> fields.
>
> Therefore, I think we have to check each public API where the nested field
> reference is exposed. If we have many public APIs where understanding
> nested fields is optional for the user  / plugin / connector developers,
> having a separate NestedFieldReferenceExpression would have a more smooth
> migration. Otherwise, there seems to be no difference between the two
> approaches.
>
> Migration path aside, the main reason I prefer extending
> FieldReferenceExpression over a new NestedFieldReferenceExpression is
> because this makes the SupportsProjectionPushDown interface simpler.
> Otherwise, we have to treat it as a special case that does not match the
> overall API style. Or we have to introduce two different applyProjections()
> methods for FieldReferenceExpression / NestedFieldReferenceExpression
> respectively. This issue further extends to implementation in addition to
> public API. A single FieldReferenceExpression might help simplify the
> implementation code a little bit. For example, in a recursive processing of
> a row with nested rows, we may not need to switch between
> FieldReferenceExpression and NestedFieldReferenceExpression depending on
> whether the record being processed is a top level record or nested record.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Aug 22, 2023 at 11:43 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Becket,
> >
> > I totally agree we should try to have a consistent API for a final state.
> > The only concern I have mentioned is the "smooth" migration path.
> > The FiledReferenceExpression is widely used in many public APIs,
> > not only in the SupportsFilterPushDown. Yes, we can change every
> > methods in 2-steps, but is it good to change API back and forth for this?
> > Personally, I'm fine with a separate NestedFieldReferenceExpression
> class.
> > TBH, I prefer the separated way because it makes the reference expression
> > more clear and concise.
> >
> > Best,
> > Jark
> >
> >
> > On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> wrote:
> >
> > > Thanks for the reply, Jark.
> > >
> > > I think it will be helpful to understand the final state we want to
> > > eventually achieve first, then we can discuss the steps towards that
> > final
> > > state.
> > >
> > > It looks like there are two proposed end states now:
> > >
> > > 1. Have a separate NestedFieldReferenceExpression class; keep
> > > SupportsFilterPushDown and SupportsProjectionPushDown the same. It is
> > just
> > > a one step change.
> > >    - Regarding the supportsNestedFilterPushDown() method, if our
> contract
> > > with the connector developer today is "The implementation should ignore
> > > unrecognized expressions by putting them into the remaining filters,
> > > instead of throwing exceptions". Then there is no need for this
> method. I
> > > am not sure about the current contract. We should probably make it
> clear
> > in
> > > the interface Java doc.
> > >
> > > 2. Extend the existing FiledReferenceExpression class to support nested
> > > fields; SupportsFilterPushDown only has one method of
> > > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown only
> > has
> > > one method of applyProjections(List<FieldReferenceExpression>,
> DataType).
> > > It could just be two steps if we are not too obsessed with the exact
> > names
> > > of "applyFilters" and "applyProjections". More specifically, it takes
> two
> > > steps to achieve this final state:
> > >     a. introduce a new method tryApplyFilters(List<ResolvedExpression>)
> > to
> > > SupportsFilterPushDown, which may have FiledReferenceExpression with
> > nested
> > > fields. The default implementation throws an exception. The runtime
> will
> > > first call tryApplyFilters() with nested fields. In case of exception,
> it
> > > calls the existing applyFilters() without including the nested filters.
> > > Similarly, in SupportsProjectionPushDown, introduce a
> > > tryApplyProjections<List<NestedFieldReference> method returning a
> Result.
> > > The Result also contains the accepted and unapplicable projections. The
> > > default implementation also throws an exception. Deprecate all the
> other
> > > methods except tryApplyFilters() and tryApplyProjections().
> > >     b. remove the deprecated methods in the next major version bump.
> > >
> > > Now the question is putting the migration steps aside, which end state
> do
> > > we prefer? While the first end state is acceptable for me, personally,
> I
> > > prefer the latter if we are designing from scratch. It is clean,
> > consistent
> > > and intuitive. Given the size of Flink, keeping APIs in the same style
> > over
> > > time is important. The migration is also not that complicated.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi Venkat,
> > > >
> > > > Thanks for the proposal.
> > > >
> > > > I have some minor comments about the FLIP.
> > > >
> > > > 1. I think we don't need to
> > > > add SupportsFilterPushDown#supportsNestedFilters() method,
> > > > because connectors can skip nested filters by putting them in
> > > > Result#remainingFilters().
> > > > And this is backward-compatible because unknown expressions were
> added
> > to
> > > > the remaining filters.
> > > > Planner should push predicate expressions as more as possible. If we
> > add
> > > a
> > > > flag for each new filter,
> > > > the interface will be filled with lots of flags (e.g.,
> supportsBetween,
> > > > supportsIN).
> > > >
> > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an array
> of
> > > > field names?
> > > > Each string represents a field name part of the field path. Just keep
> > > > aligning with `nestedFieldIndexArray`.
> > > >
> > > > 3. My concern about making FieldReferenceExpression support nested
> > fields
> > > > is the compatibility.
> > > > It is a public API and users/connectors are already using it. People
> > > > assumed it is a top-level column
> > > > reference, and applied logic on it. But that's not true now and this
> > may
> > > > lead to unexpected errors.
> > > > Having a separate NestedFieldReferenceExpression sounds safer to me.
> > > Mixing
> > > > them in a class may
> > > >  confuse users what's the meaning of getFieldName() and
> > getFieldIndex().
> > > >
> > > >
> > > > Regarding using NestedFieldReferenceExpression in
> > > > SupportsProjectionPushDown, do you
> > > > have any concerns @Timo Walther <twal...@apache.org> ?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <
> > > vsowr...@asu.edu
> > > > >
> > > > wrote:
> > > >
> > > > > Sounds like a great suggestion, Becket. +1. Agree with cleaning up
> > the
> > > > APIs
> > > > > and making it consistent in all the pushdown APIs.
> > > > >
> > > > > Your suggested approach seems fine to me, unless anyone else has
> any
> > > > other
> > > > > concerns. Just have couple of clarifying questions:
> > > > >
> > > > > 1. Do you think we should standardize the APIs across all the
> > pushdown
> > > > > supports like SupportsPartitionPushdown, SupportsDynamicFiltering
> etc
> > > in
> > > > > the end state?
> > > > >
> > > > > The current proposal works if we do not want to migrate
> > > > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression
> > in
> > > > the
> > > > > > long term.
> > > > > >
> > > > > Did you mean *FieldReferenceExpression* instead of
> > > > > *NestedFieldReferenceExpression*?
> > > > >
> > > > > 2. Extend the FieldReferenceExpression to support nested fields.
> > > > > >     - Change the index field type from int to int[].
> > > > >
> > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > >     - Deprecate the int getFieldIndex() method, the code will be
> > > > removed
> > > > > in
> > > > > > the next major version bump.
> > > > >
> > > > > I assume getFieldIndex would return fieldIndexArray[0], right?
> > > > >
> > > > > Thanks
> > > > > Venkat
> > > > >
> > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks for the proposal, Venkata.
> > > > > >
> > > > > > The current proposal works if we do not want to migrate
> > > > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression
> > in
> > > > the
> > > > > > long term.
> > > > > >
> > > > > Did you mean *FieldReferenceExpression* instead of
> > > > > *NestedFieldReferenceExpression*?
> > > > >
> > > > > >
> > > > > > Otherwise, the alternative solution briefly mentioned in the
> > rejected
> > > > > > alternatives would be the following:
> > > > > > Phase 1:
> > > > > > 1. Introduce a supportsNestedFilters() method to the
> > > > > SupportsFilterPushdown
> > > > > > interface. (same as current proposal).
> > > > > > 2. Extend the FieldReferenceExpression to support nested fields.
> > > > > >     - Change the index field type from int to int[].
> > > > >
> > > > >     - Add a new method int[] getFieldIndexArray().
> > > > > >     - Deprecate the int getFieldIndex() method, the code will be
> > > > removed
> > > > > in
> > > > > > the next major version bump.
> > > > >
> > > > >
> > > > >
> > > > > 3. In the SupportsProjectionPushDown interface
> > > > > >     - add a new method
> > > applyProjection(List<FieldReferenceExpression>,
> > > > > > DataType), with default implementation invoking
> > > > applyProjection(int[][],
> > > > > > DataType)
> > > > > >     - deprecate the current applyProjection(int[][], DataType)
> > method
> > > > > >
> > > > > > Phase 2 (in the next major version bump)
> > > > > > 1. remove the deprecated methods.
> > > > > >
> > > > > > Phase 3 (optional)
> > > > > > 1. deprecate and remove the supportsNestedFilters() /
> > > > > > supportsNestedProjection() methods from the
> SupportsFilterPushDown
> > /
> > > > > > SupportsProjectionPushDown interfaces.
> > > > > >
> > > > > > Personally I prefer this alternative. It takes longer to finish
> the
> > > > work,
> > > > > > but the API eventually becomes clean and consistent. But I can
> live
> > > > with
> > > > > > the current proposal.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <
> > > > > > vsowr...@asu.edu> wrote:
> > > > > >
> > > > > > > Gentle ping for reviews/feedback.
> > > > > > >
> > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
> > > > > > vsowr...@asu.edu
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I am opening this thread to discuss FLIP-356: Support Nested
> > > Fields
> > > > > > > > Filter Pushdown. The FLIP can be found at
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > > > > >
> > > > > > > > This FLIP adds support for pushing down nested fields filters
> > to
> > > > the
> > > > > > > > underlying TableSource. In our data lake, we find a lot of
> > > datasets
> > > > > > have
> > > > > > > > nested fields and also user queries with filters defined on
> the
> > > > > nested
> > > > > > > > fields. This would drastically improve the performance for
> > those
> > > > sets
> > > > > > of
> > > > > > > > queries.
> > > > > > > >
> > > > > > > > Appreciate any comments or feedback you may have on this
> > > proposal.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Venkata krishnan
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to