Hi Becket, I totally agree we should try to have a consistent API for a final state. The only concern I have mentioned is the "smooth" migration path. The FiledReferenceExpression is widely used in many public APIs, not only in the SupportsFilterPushDown. Yes, we can change every methods in 2-steps, but is it good to change API back and forth for this? Personally, I'm fine with a separate NestedFieldReferenceExpression class. TBH, I prefer the separated way because it makes the reference expression more clear and concise.
Best, Jark On Tue, 22 Aug 2023 at 16:53, Becket Qin <becket....@gmail.com> wrote: > Thanks for the reply, Jark. > > I think it will be helpful to understand the final state we want to > eventually achieve first, then we can discuss the steps towards that final > state. > > It looks like there are two proposed end states now: > > 1. Have a separate NestedFieldReferenceExpression class; keep > SupportsFilterPushDown and SupportsProjectionPushDown the same. It is just > a one step change. > - Regarding the supportsNestedFilterPushDown() method, if our contract > with the connector developer today is "The implementation should ignore > unrecognized expressions by putting them into the remaining filters, > instead of throwing exceptions". Then there is no need for this method. I > am not sure about the current contract. We should probably make it clear in > the interface Java doc. > > 2. Extend the existing FiledReferenceExpression class to support nested > fields; SupportsFilterPushDown only has one method of > applyFilters(List<ResolvedExpression>); SupportsProjectionPushDown only has > one method of applyProjections(List<FieldReferenceExpression>, DataType). > It could just be two steps if we are not too obsessed with the exact names > of "applyFilters" and "applyProjections". More specifically, it takes two > steps to achieve this final state: > a. introduce a new method tryApplyFilters(List<ResolvedExpression>) to > SupportsFilterPushDown, which may have FiledReferenceExpression with nested > fields. The default implementation throws an exception. The runtime will > first call tryApplyFilters() with nested fields. In case of exception, it > calls the existing applyFilters() without including the nested filters. > Similarly, in SupportsProjectionPushDown, introduce a > tryApplyProjections<List<NestedFieldReference> method returning a Result. > The Result also contains the accepted and unapplicable projections. The > default implementation also throws an exception. Deprecate all the other > methods except tryApplyFilters() and tryApplyProjections(). > b. remove the deprecated methods in the next major version bump. > > Now the question is putting the migration steps aside, which end state do > we prefer? While the first end state is acceptable for me, personally, I > prefer the latter if we are designing from scratch. It is clean, consistent > and intuitive. Given the size of Flink, keeping APIs in the same style over > time is important. The migration is also not that complicated. > > Thanks, > > Jiangjie (Becket) Qin > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu <imj...@gmail.com> wrote: > > > Hi Venkat, > > > > Thanks for the proposal. > > > > I have some minor comments about the FLIP. > > > > 1. I think we don't need to > > add SupportsFilterPushDown#supportsNestedFilters() method, > > because connectors can skip nested filters by putting them in > > Result#remainingFilters(). > > And this is backward-compatible because unknown expressions were added to > > the remaining filters. > > Planner should push predicate expressions as more as possible. If we add > a > > flag for each new filter, > > the interface will be filled with lots of flags (e.g., supportsBetween, > > supportsIN). > > > > 2. NestedFieldReferenceExpression#nestedFieldName should be an array of > > field names? > > Each string represents a field name part of the field path. Just keep > > aligning with `nestedFieldIndexArray`. > > > > 3. My concern about making FieldReferenceExpression support nested fields > > is the compatibility. > > It is a public API and users/connectors are already using it. People > > assumed it is a top-level column > > reference, and applied logic on it. But that's not true now and this may > > lead to unexpected errors. > > Having a separate NestedFieldReferenceExpression sounds safer to me. > Mixing > > them in a class may > > confuse users what's the meaning of getFieldName() and getFieldIndex(). > > > > > > Regarding using NestedFieldReferenceExpression in > > SupportsProjectionPushDown, do you > > have any concerns @Timo Walther <twal...@apache.org> ? > > > > Best, > > Jark > > > > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan Sowrirajan < > vsowr...@asu.edu > > > > > wrote: > > > > > Sounds like a great suggestion, Becket. +1. Agree with cleaning up the > > APIs > > > and making it consistent in all the pushdown APIs. > > > > > > Your suggested approach seems fine to me, unless anyone else has any > > other > > > concerns. Just have couple of clarifying questions: > > > > > > 1. Do you think we should standardize the APIs across all the pushdown > > > supports like SupportsPartitionPushdown, SupportsDynamicFiltering etc > in > > > the end state? > > > > > > The current proposal works if we do not want to migrate > > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in > > the > > > > long term. > > > > > > > Did you mean *FieldReferenceExpression* instead of > > > *NestedFieldReferenceExpression*? > > > > > > 2. Extend the FieldReferenceExpression to support nested fields. > > > > - Change the index field type from int to int[]. > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > - Deprecate the int getFieldIndex() method, the code will be > > removed > > > in > > > > the next major version bump. > > > > > > I assume getFieldIndex would return fieldIndexArray[0], right? > > > > > > Thanks > > > Venkat > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin <becket....@gmail.com> > wrote: > > > > > > > Thanks for the proposal, Venkata. > > > > > > > > The current proposal works if we do not want to migrate > > > > SupportsFilterPushdown to also use NestedFieldReferenceExpression in > > the > > > > long term. > > > > > > > Did you mean *FieldReferenceExpression* instead of > > > *NestedFieldReferenceExpression*? > > > > > > > > > > > Otherwise, the alternative solution briefly mentioned in the rejected > > > > alternatives would be the following: > > > > Phase 1: > > > > 1. Introduce a supportsNestedFilters() method to the > > > SupportsFilterPushdown > > > > interface. (same as current proposal). > > > > 2. Extend the FieldReferenceExpression to support nested fields. > > > > - Change the index field type from int to int[]. > > > > > > - Add a new method int[] getFieldIndexArray(). > > > > - Deprecate the int getFieldIndex() method, the code will be > > removed > > > in > > > > the next major version bump. > > > > > > > > > > > > 3. In the SupportsProjectionPushDown interface > > > > - add a new method > applyProjection(List<FieldReferenceExpression>, > > > > DataType), with default implementation invoking > > applyProjection(int[][], > > > > DataType) > > > > - deprecate the current applyProjection(int[][], DataType) method > > > > > > > > Phase 2 (in the next major version bump) > > > > 1. remove the deprecated methods. > > > > > > > > Phase 3 (optional) > > > > 1. deprecate and remove the supportsNestedFilters() / > > > > supportsNestedProjection() methods from the SupportsFilterPushDown / > > > > SupportsProjectionPushDown interfaces. > > > > > > > > Personally I prefer this alternative. It takes longer to finish the > > work, > > > > but the API eventually becomes clean and consistent. But I can live > > with > > > > the current proposal. > > > > > > > > Thanks, > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM Venkatakrishnan Sowrirajan < > > > > vsowr...@asu.edu> wrote: > > > > > > > > > Gentle ping for reviews/feedback. > > > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM Venkatakrishnan Sowrirajan < > > > > vsowr...@asu.edu > > > > > > > > > > > wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > I am opening this thread to discuss FLIP-356: Support Nested > Fields > > > > > > Filter Pushdown. The FLIP can be found at > > > > > > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > > > > > > > > This FLIP adds support for pushing down nested fields filters to > > the > > > > > > underlying TableSource. In our data lake, we find a lot of > datasets > > > > have > > > > > > nested fields and also user queries with filters defined on the > > > nested > > > > > > fields. This would drastically improve the performance for those > > sets > > > > of > > > > > > queries. > > > > > > > > > > > > Appreciate any comments or feedback you may have on this > proposal. > > > > > > > > > > > > Regards > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > > > >