Hi Venkat,

Thanks for the proposal.

I have some minor comments about the FLIP.

1. I think we don't need to
add SupportsFilterPushDown#supportsNestedFilters() method,
because connectors can skip nested filters by putting them in
Result#remainingFilters().
And this is backward-compatible because unknown expressions were added to
the remaining filters.
Planner should push predicate expressions as more as possible. If we add a
flag for each new filter,
the interface will be filled with lots of flags (e.g., supportsBetween,
supportsIN).

2. NestedFieldReferenceExpression#nestedFieldName should be an array of
field names?
Each string represents a field name part of the field path. Just keep
aligning with `nestedFieldIndexArray`.

3. My concern about making FieldReferenceExpression support nested fields
is the compatibility.
It is a public API and users/connectors are already using it. People
assumed it is a top-level column
reference, and applied logic on it. But that's not true now and this may
lead to unexpected errors.
Having a separate NestedFieldReferenceExpression sounds safer to me. Mixing
them in a class may
 confuse users what's the meaning of getFieldName() and getFieldIndex().


Regarding using NestedFieldReferenceExpression in
SupportsProjectionPushDown, do you
have any concerns @Timo Walther <twal...@apache.org> ?

Best,
Jark



On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
wrote:

> Sounds like a great suggestion, Becket. +1. Agree with cleaning up the APIs
> and making it consistent in all the pushdown APIs.
>
> Your suggested approach seems fine to me, unless anyone else has any other
> concerns. Just have couple of clarifying questions:
>
> 1. Do you think we should standardize the APIs across all the pushdown
> supports like SupportsPartitionPushdown, SupportsDynamicFiltering etc in
> the end state?
>
> The current proposal works if we do not want to migrate
> > SupportsFilterPushdown to also use NestedFieldReferenceExpression in the
> > long term.
> >
> Did you mean *FieldReferenceExpression* instead of
> *NestedFieldReferenceExpression*?
>
> 2. Extend the FieldReferenceExpression to support nested fields.
> >     - Change the index field type from int to int[].
>
>     - Add a new method int[] getFieldIndexArray().
> >     - Deprecate the int getFieldIndex() method, the code will be removed
> in
> > the next major version bump.
>
> I assume getFieldIndex would return fieldIndexArray[0], right?
>
> Thanks
> Venkat
>
> On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com> wrote:
>
> > Thanks for the proposal, Venkata.
> >
> > The current proposal works if we do not want to migrate
> > SupportsFilterPushdown to also use NestedFieldReferenceExpression in the
> > long term.
> >
> Did you mean *FieldReferenceExpression* instead of
> *NestedFieldReferenceExpression*?
>
> >
> > Otherwise, the alternative solution briefly mentioned in the rejected
> > alternatives would be the following:
> > Phase 1:
> > 1. Introduce a supportsNestedFilters() method to the
> SupportsFilterPushdown
> > interface. (same as current proposal).
> > 2. Extend the FieldReferenceExpression to support nested fields.
> >     - Change the index field type from int to int[].
>
>     - Add a new method int[] getFieldIndexArray().
> >     - Deprecate the int getFieldIndex() method, the code will be removed
> in
> > the next major version bump.
>
>
>
> 3. In the SupportsProjectionPushDown interface
> >     - add a new method applyProjection(List<FieldReferenceExpression>,
> > DataType), with default implementation invoking applyProjection(int[][],
> > DataType)
> >     - deprecate the current applyProjection(int[][], DataType) method
> >
> > Phase 2 (in the next major version bump)
> > 1. remove the deprecated methods.
> >
> > Phase 3 (optional)
> > 1. deprecate and remove the supportsNestedFilters() /
> > supportsNestedProjection() methods from the SupportsFilterPushDown /
> > SupportsProjectionPushDown interfaces.
> >
> > Personally I prefer this alternative. It takes longer to finish the work,
> > but the API eventually becomes clean and consistent. But I can live with
> > the current proposal.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu> wrote:
> >
> > > Gentle ping for reviews/feedback.
> > >
> > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu
> > > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am opening this thread to discuss FLIP-356: Support Nested Fields
> > > > Filter Pushdown. The FLIP can be found at
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$
> > > >
> > > > This FLIP adds support for pushing down nested fields filters to the
> > > > underlying TableSource. In our data lake, we find a lot of datasets
> > have
> > > > nested fields and also user queries with filters defined on the
> nested
> > > > fields. This would drastically improve the performance for those sets
> > of
> > > > queries.
> > > >
> > > > Appreciate any comments or feedback you may have on this proposal.
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >
> > >
> >
>

Reply via email to