Hi Becket,

I totally agree we should try to have a consistent API for a final state.
The only concern I have mentioned is the "smooth" migration path.
The FiledReferenceExpression is widely used in many public APIs,
not only in the SupportsFilterPushDown. Yes, we can change every
methods in 2-steps, but is it good to change API back and forth for this?
Personally, I'm fine with a separate NestedFieldReferenceExpression class.
TBH, I prefer the separated way because it makes the reference expression
more clear and concise.

Best,
Jark


On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> wrote:

> Thanks for the reply, Jark.
>
> I think it will be helpful to understand the final state we want to
> eventually achieve first, then we can discuss the steps towards that final
> state.
>
> It looks like there are two proposed end states now:
>
> 1. Have a separate NestedFieldReferenceExpression class; keep
> SupportsFilterPushDown and SupportsProjectionPushDown the same. It is just
> a one step change.
>    - Regarding the supportsNestedFilterPushDown() method, if our contract
> with the connector developer today is "The implementation should ignore
> unrecognized expressions by putting them into the remaining filters,
> instead of throwing exceptions". Then there is no need for this method. I
> am not sure about the current contract. We should probably make it clear in
> the interface Java doc.
>
> 2. Extend the existing FiledReferenceExpression class to support nested
> fields; SupportsFilterPushDown only has one method of
> applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown only has
> one method of applyProjections(List<FieldReferenceExpression>, DataType).
> It could just be two steps if we are not too obsessed with the exact names
> of "applyFilters" and "applyProjections". More specifically, it takes two
> steps to achieve this final state:
>     a. introduce a new method tryApplyFilters(List<ResolvedExpression>) to
> SupportsFilterPushDown, which may have FiledReferenceExpression with nested
> fields. The default implementation throws an exception. The runtime will
> first call tryApplyFilters() with nested fields. In case of exception, it
> calls the existing applyFilters() without including the nested filters.
> Similarly, in SupportsProjectionPushDown, introduce a
> tryApplyProjections<List<NestedFieldReference> method returning a Result.
> The Result also contains the accepted and unapplicable projections. The
> default implementation also throws an exception. Deprecate all the other
> methods except tryApplyFilters() and tryApplyProjections().
>     b. remove the deprecated methods in the next major version bump.
>
> Now the question is putting the migration steps aside, which end state do
> we prefer? While the first end state is acceptable for me, personally, I
> prefer the latter if we are designing from scratch. It is clean, consistent
> and intuitive. Given the size of Flink, keeping APIs in the same style over
> time is important. The migration is also not that complicated.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Venkat,
> >
> > Thanks for the proposal.
> >
> > I have some minor comments about the FLIP.
> >
> > 1. I think we don't need to
> > add SupportsFilterPushDown#supportsNestedFilters() method,
> > because connectors can skip nested filters by putting them in
> > Result#remainingFilters().
> > And this is backward-compatible because unknown expressions were added to
> > the remaining filters.
> > Planner should push predicate expressions as more as possible. If we add
> a
> > flag for each new filter,
> > the interface will be filled with lots of flags (e.g., supportsBetween,
> > supportsIN).
> >
> > 2. NestedFieldReferenceExpression#nestedFieldName should be an array of
> > field names?
> > Each string represents a field name part of the field path. Just keep
> > aligning with `nestedFieldIndexArray`.
> >
> > 3. My concern about making FieldReferenceExpression support nested fields
> > is the compatibility.
> > It is a public API and users/connectors are already using it. People
> > assumed it is a top-level column
> > reference, and applied logic on it. But that's not true now and this may
> > lead to unexpected errors.
> > Having a separate NestedFieldReferenceExpression sounds safer to me.
> Mixing
> > them in a class may
> >  confuse users what's the meaning of getFieldName() and getFieldIndex().
> >
> >
> > Regarding using NestedFieldReferenceExpression in
> > SupportsProjectionPushDown, do you
> > have any concerns @Timo Walther <twal...@apache.org> ?
> >
> > Best,
> > Jark
> >
> >
> >
> > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu
> > >
> > wrote:
> >
> > > Sounds like a great suggestion, Becket. +1. Agree with cleaning up the
> > APIs
> > > and making it consistent in all the pushdown APIs.
> > >
> > > Your suggested approach seems fine to me, unless anyone else has any
> > other
> > > concerns. Just have couple of clarifying questions:
> > >
> > > 1. Do you think we should standardize the APIs across all the pushdown
> > > supports like SupportsPartitionPushdown, SupportsDynamicFiltering etc
> in
> > > the end state?
> > >
> > > The current proposal works if we do not want to migrate
> > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in
> > the
> > > > long term.
> > > >
> > > Did you mean *FieldReferenceExpression* instead of
> > > *NestedFieldReferenceExpression*?
> > >
> > > 2. Extend the FieldReferenceExpression to support nested fields.
> > > >     - Change the index field type from int to int[].
> > >
> > >     - Add a new method int[] getFieldIndexArray().
> > > >     - Deprecate the int getFieldIndex() method, the code will be
> > removed
> > > in
> > > > the next major version bump.
> > >
> > > I assume getFieldIndex would return fieldIndexArray[0], right?
> > >
> > > Thanks
> > > Venkat
> > >
> > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com>
> wrote:
> > >
> > > > Thanks for the proposal, Venkata.
> > > >
> > > > The current proposal works if we do not want to migrate
> > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in
> > the
> > > > long term.
> > > >
> > > Did you mean *FieldReferenceExpression* instead of
> > > *NestedFieldReferenceExpression*?
> > >
> > > >
> > > > Otherwise, the alternative solution briefly mentioned in the rejected
> > > > alternatives would be the following:
> > > > Phase 1:
> > > > 1. Introduce a supportsNestedFilters() method to the
> > > SupportsFilterPushdown
> > > > interface. (same as current proposal).
> > > > 2. Extend the FieldReferenceExpression to support nested fields.
> > > >     - Change the index field type from int to int[].
> > >
> > >     - Add a new method int[] getFieldIndexArray().
> > > >     - Deprecate the int getFieldIndex() method, the code will be
> > removed
> > > in
> > > > the next major version bump.
> > >
> > >
> > >
> > > 3. In the SupportsProjectionPushDown interface
> > > >     - add a new method
> applyProjection(List<FieldReferenceExpression>,
> > > > DataType), with default implementation invoking
> > applyProjection(int[][],
> > > > DataType)
> > > >     - deprecate the current applyProjection(int[][], DataType) method
> > > >
> > > > Phase 2 (in the next major version bump)
> > > > 1. remove the deprecated methods.
> > > >
> > > > Phase 3 (optional)
> > > > 1. deprecate and remove the supportsNestedFilters() /
> > > > supportsNestedProjection() methods from the SupportsFilterPushDown /
> > > > SupportsProjectionPushDown interfaces.
> > > >
> > > > Personally I prefer this alternative. It takes longer to finish the
> > work,
> > > > but the API eventually becomes clean and consistent. But I can live
> > with
> > > > the current proposal.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <
> > > > vsowr...@asu.edu> wrote:
> > > >
> > > > > Gentle ping for reviews/feedback.
> > > > >
> > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
> > > > vsowr...@asu.edu
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I am opening this thread to discuss FLIP-356: Support Nested
> Fields
> > > > > > Filter Pushdown. The FLIP can be found at
> > > > > >
> > > > >
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > > > >
> > > > > > This FLIP adds support for pushing down nested fields filters to
> > the
> > > > > > underlying TableSource. In our data lake, we find a lot of
> datasets
> > > > have
> > > > > > nested fields and also user queries with filters defined on the
> > > nested
> > > > > > fields. This would drastically improve the performance for those
> > sets
> > > > of
> > > > > > queries.
> > > > > >
> > > > > > Appreciate any comments or feedback you may have on this
> proposal.
> > > > > >
> > > > > > Regards
> > > > > > Venkata krishnan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to