Thanks Jark. Sounds good. One more thing, earlier in my summary I mentioned,
Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*) > abstract class which will be extended by both *FieldReferenceExpression* > and *NestedFieldReferenceExpression* (to be introduced as part of this > FLIP) This can be punted for now and can be handled as part of refactoring SupportsProjectionPushDown. Also I made minor changes to the *NestedFieldReferenceExpression, *instead of *fieldIndexArray* we can just do away with *fieldNames *array that includes fieldName at every level for the nested field. Updated the FLIP-357 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown> wiki as well. Regards Venkata krishnan On Tue, Aug 29, 2023 at 5:21 AM Jark Wu <imj...@gmail.com> wrote: > Hi Venkata, > > Your summary looks good to me. +1 to start a vote. > > I think we don't need "inputIndex" in NestedFieldReferenceExpression. > Actually, I think it is also not needed in FieldReferenceExpression, > and we should try to remove it (another topic). The RexInputRef in Calcite > also doesn't require an inputIndex because the field index should represent > index of the field in the underlying row type. Field references shouldn't > be > aware of the number of inputs. > > Best, > Jark > > > On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan <vsowr...@asu.edu > > > wrote: > > > Hi Jinsong, > > > > Thanks for your comments. > > > > What is inputIndex in NestedFieldReferenceExpression? > > > > > > I haven't looked at it before. Do you mean, given that it is now only > used > > to push filters it won't be subsequently used in further > > planning/optimization and therefore it is not required at this time? > > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is there > > > a need to introduce a base class `ReferenceExpression`? > > > > For SupportsFilterPushDown itself, *ReferenceExpression* base class is > not > > needed. But there were discussions around cleaning up and standardizing > the > > API for Supports*PushDown. SupportsProjectionPushDown currently pushes > the > > projects as a 2-d array, instead it would be better to use the standard > API > > which seems to be the *ResolvedExpression*. For > SupportsProjectionPushDown > > either FieldReferenceExpression (top level fields) or > > NestedFieldReferenceExpression (nested fields) is enough, in order to > > provide a single API that handles both top level and nested fields, > > ReferenceExpression will be introduced as a base class. > > > > Eventually, *SupportsProjectionPushDown#applyProjections* would evolve as > > applyProjection(List<ReferenceExpression> projectedFields) and nested > > fields would be pushed only if *supportsNestedProjections* returns true. > > > > Regards > > Venkata krishnan > > > > > > On Sun, Aug 27, 2023 at 11:12 PM Jingsong Li <jingsongl...@gmail.com> > > wrote: > > > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is there > > > a need to introduce a base class `ReferenceExpression`? > > > > > > Best, > > > Jingsong > > > > > > On Mon, Aug 28, 2023 at 2:09 PM Jingsong Li <jingsongl...@gmail.com> > > > wrote: > > > > > > > > Hi thanks all for your discussion. > > > > > > > > What is inputIndex in NestedFieldReferenceExpression? > > > > > > > > I know inputIndex has special usage in FieldReferenceExpression, but > > > > it is only for Join operators, and it is only for SQL optimization. > It > > > > looks like there is no requirement for Nested. > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Mon, Aug 28, 2023 at 1:13 PM Venkatakrishnan Sowrirajan > > > > <vsowr...@asu.edu> wrote: > > > > > > > > > > Thanks for all the feedback and discussion everyone. Looks like we > > have > > > > > reached a consensus here. > > > > > > > > > > Just to summarize: > > > > > > > > > > 1. Introduce a new *ReferenceExpression* (or > > *BaseReferenceExpression*) > > > > > abstract class which will be extended by both > > > *FieldReferenceExpression* > > > > > and *NestedFieldReferenceExpression* (to be introduced as part of > > this > > > FLIP) > > > > > 2. No need of *supportsNestedFilters *check as the current > > > > > *SupportsFilterPushDown* should already ignore unknown expressions > ( > > > > > *NestedFieldReferenceExpression* for example) and return them as > > > > > *remainingFilters. > > > > > *Maybe this should be clarified explicitly in the Javadoc of > > > > > *SupportsFilterPushDown. > > > > > *I will file a separate JIRA to fix the documentation. > > > > > 3. Refactor *SupportsProjectionPushDown* to use > *ReferenceExpression > > > *instead > > > > > of existing 2-d arrays to consolidate and be consistent with other > > > > > Supports*PushDown APIs - *outside the scope of this FLIP* > > > > > 4. Similarly *SupportsAggregatePushDown* should also be evolved > > > whenever > > > > > nested fields support is added to use the *ReferenceExpression - > > > **outside > > > > > the scope of this FLIP* > > > > > > > > > > Does this sound good? Please let me know if I have missed anything > > > here. If > > > > > there are no concerns, I will start a vote tomorrow. I will also > get > > > the > > > > > FLIP-356 wiki updated. Thanks everyone once again! > > > > > > > > > > Regards > > > > > Venkata krishnan > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 8:19 PM Becket Qin <becket....@gmail.com> > > > wrote: > > > > > > > > > > > Hi Jark, > > > > > > > > > > > > How about having a separate NestedFieldReferenceExpression, and > > > > > > > abstracting a common base class "ReferenceExpression" for > > > > > > > NestedFieldReferenceExpression and FieldReferenceExpression? > This > > > makes > > > > > > > unifying expressions in > > > > > > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > > > ...)" > > > > > > > possible. > > > > > > > > > > > > > > > > > > I'd be fine with this. It at least provides a consistent API > style > > / > > > > > > formality. > > > > > > > > > > > > Re: Yunhong, > > > > > > > > > > > > 3. Finally, I think we need to look at the costs and benefits of > > > unifying > > > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or > > > others) > > > > > > from > > > > > > > the perspective of interface implementers. A stable API can > > reduce > > > user > > > > > > > development and change costs, if the current API can fully meet > > the > > > > > > > functional requirements at the framework level, I personal > > suggest > > > > > > reducing > > > > > > > the impact on connector developers. > > > > > > > > > > > > > > > > > > > I agree that the cost and benefit should be measured. And the > > > measurement > > > > > > should be in the long term instead of short term. That is why we > > > always > > > > > > need to align on the ideal end state first. > > > > > > Meeting functionality requirements is the bare minimum bar for an > > > API. > > > > > > Simplicity, intuitiveness, robustness and evolvability are also > > > important. > > > > > > In addition, for projects with many APIs, such as Flink, a > > > consistent API > > > > > > style is also critical for the user adoption as well as bug > > > avoidance. It > > > > > > is very helpful for the community to agree on some API design > > > conventions / > > > > > > principles. > > > > > > For example, in this particular case, via our discussion, > hopefully > > > we sort > > > > > > of established the following API design conventions / principles > > for > > > all > > > > > > the Supports*PushDown interfaces. > > > > > > > > > > > > 1. By default, expressions should be used if applicable instead > of > > > other > > > > > > representations. > > > > > > 2. In general, the pushdown method should not assume all the > > > pushdowns will > > > > > > succeed. So the applyX() method should return a boolean or > List<X>, > > > to > > > > > > handle the cases that some of the pushdowns cannot be fulfilled > by > > > the > > > > > > implementation. > > > > > > > > > > > > Establishing such conventions and principles demands careful > > > thinking for > > > > > > the aspects I mentioned earlier in addition to the API > > > functionalities. > > > > > > This helps lower the bar of understanding, reduces the chance of > > > having > > > > > > loose ends in the API, and will benefit all the participants in > the > > > project > > > > > > over time. I think this is the right way to achieve real API > > > stability. > > > > > > Otherwise, we may end up chasing our tails to find ways not to > > > change the > > > > > > existing non-ideal APIs. > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > On Fri, Aug 25, 2023 at 9:33 AM yh z <zhengyunhon...@gmail.com> > > > wrote: > > > > > > > > > > > > > Hi, Venkat, > > > > > > > > > > > > > > Thanks for the FLIP, it sounds good to support nested fields > > filter > > > > > > > pushdown. Based on the design of flip and the above options, I > > > would like > > > > > > > to make a few suggestions: > > > > > > > > > > > > > > 1. At present, introducing NestedFieldReferenceExpression > looks > > > like a > > > > > > > better solution, which can fully meet our requirements while > > > reducing > > > > > > > modifications to base class FieldReferenceExpression. In the > long > > > run, I > > > > > > > tend to abstract a basic class for > NestedFieldReferenceExpression > > > and > > > > > > > FieldReferenceExpression as u suggested. > > > > > > > > > > > > > > 2. Personally, I don't recommend introducing > > > *supportsNestedFilters() in > > > > > > > supportsFilterPushdown. We just need to better declare the > return > > > value > > > > > > of > > > > > > > the method *applyFilters. > > > > > > > > > > > > > > 3. Finally, I think we need to look at the costs and benefits > of > > > unifying > > > > > > > the SupportsFilterPushDown and SupportsProjectionPushDown (or > > > others) > > > > > > from > > > > > > > the perspective of interface implementers. A stable API can > > reduce > > > user > > > > > > > development and change costs, if the current API can fully meet > > the > > > > > > > functional requirements at the framework level, I personal > > suggest > > > > > > reducing > > > > > > > the impact on connector developers. > > > > > > > > > > > > > > Regards, > > > > > > > Yunhong Zheng (Swuferhong) > > > > > > > > > > > > > > > > > > > > > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月25日周五 > > > 01:25写道: > > > > > > > > > > > > > > > To keep it backwards compatible, introduce another API > > > *applyAggregates > > > > > > > > *with > > > > > > > > *List<ReferenceExpression> *when nested field support is > added > > > and > > > > > > > > deprecate the current API. This will by default throw an > > > exception. In > > > > > > > > flink planner, *applyAggregates *with nested fields and if it > > > throws > > > > > > > > exception then *applyAggregates* without nested fields. > > > > > > > > > > > > > > > > Regards > > > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 10:13 AM Venkatakrishnan Sowrirajan < > > > > > > > > vsowr...@asu.edu> wrote: > > > > > > > > > > > > > > > > > Jark, > > > > > > > > > > > > > > > > > > How about having a separate NestedFieldReferenceExpression, > > and > > > > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > > > > >> NestedFieldReferenceExpression and > FieldReferenceExpression? > > > This > > > > > > > makes > > > > > > > > >> unifying expressions in > > > > > > > > >> > > > > > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > > > > >> ...)" > > > > > > > > >> possible. > > > > > > > > > > > > > > > > > > This should be fine for *SupportsProjectionPushDown* and > > > > > > > > > *SupportsFilterPushDown*. One concern in the case of > > > > > > > > > *SupportsAggregatePushDown* with nested fields support (to > be > > > added > > > > > > in > > > > > > > > > the future), with this proposal, the API will become > > backwards > > > > > > > > incompatible > > > > > > > > > as the *args *for the aggregate function is > > > > > > > > *List<FieldReferenceExpression> > > > > > > > > > *that needs to change to *List<ReferenceExpression>*. > > > > > > > > > > > > > > > > > > Regards > > > > > > > > > Venkata krishnan > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 24, 2023 at 1:18 AM Jark Wu <imj...@gmail.com> > > > wrote: > > > > > > > > > > > > > > > > > >> Hi Becket, > > > > > > > > >> > > > > > > > > >> I think it is the second case, that a > > > FieldReferenceExpression is > > > > > > > > >> constructed > > > > > > > > >> by the framework and passed to the connector (interfaces > > > listed by > > > > > > > > >> Venkata[1] > > > > > > > > >> and Catalog#listPartitionsByFilter). Besides, > understanding > > > the > > > > > > nested > > > > > > > > >> field > > > > > > > > >> is optional for users/connectors (just treat it as an > > unknown > > > > > > > expression > > > > > > > > >> if > > > > > > > > >> the > > > > > > > > >> connector doesn't want to support it). > > > > > > > > >> > > > > > > > > >> If we extend FieldReferenceExpression, in the case of > "where > > > > > > > col.nested > > > > > > > > > > > > > > > > > >> 10", > > > > > > > > >> for the connectors already supported filter/delete > pushdown, > > > they > > > > > > may > > > > > > > > >> wrongly > > > > > > > > >> pushdown "col > 10" instead of "nested > 10" because they > > > still > > > > > > treat > > > > > > > > >> FieldReferenceExpression as a top-level column. This > problem > > > can be > > > > > > > > >> resolved > > > > > > > > >> by introducing an additional "supportedNestedPushdown" for > > > each > > > > > > > > interface, > > > > > > > > >> but that method is not elegant and is hard to remove in > the > > > future, > > > > > > > and > > > > > > > > >> this could > > > > > > > > >> be avoided if we have a separate > > > NestedFieldReferenceExpression. > > > > > > > > >> > > > > > > > > >> If we want to extend FieldReferenceExpression, we have to > > add > > > > > > > > protections > > > > > > > > >> for every related API in one shot. Besides, > > > FieldReferenceExpression > > > > > > > is > > > > > > > > a > > > > > > > > >> fundamental class in the planner, we have to go through > all > > > the code > > > > > > > > that > > > > > > > > >> is using it to make sure it properly handling it if it is > a > > > nested > > > > > > > field > > > > > > > > >> which > > > > > > > > >> is a big effort for the community. > > > > > > > > >> > > > > > > > > >> If we were designing this API on day 1, I fully support > > > merging them > > > > > > > in > > > > > > > > a > > > > > > > > >> FieldReferenceExpression. But in this case, I'm thinking > > > about how > > > > > > to > > > > > > > > >> provide > > > > > > > > >> users with a smooth migration path, and allow the > community > > to > > > > > > > gradually > > > > > > > > >> put efforts into evolving the API, and not block the > "Nested > > > Fields > > > > > > > > Filter > > > > > > > > >> Pushdown" > > > > > > > > >> requirement. > > > > > > > > >> > > > > > > > > >> How about having a separate > NestedFieldReferenceExpression, > > > and > > > > > > > > >> abstracting a common base class "ReferenceExpression" for > > > > > > > > >> NestedFieldReferenceExpression and > FieldReferenceExpression? > > > This > > > > > > > makes > > > > > > > > >> unifying expressions in > > > > > > > > >> > > > > > > > > > "SupportsProjectionPushdown#applyProjections(List<ReferenceExpression> > > > > > > > > >> ...)" > > > > > > > > >> possible. > > > > > > > > >> > > > > > > > > >> Best, > > > > > > > > >> Jark > > > > > > > > >> > > > > > > > > >> On Thu, 24 Aug 2023 at 07:00, Venkatakrishnan Sowrirajan < > > > > > > > > >> vsowr...@asu.edu> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > >> > Becket and Jark, > > > > > > > > >> > > > > > > > > > >> > Deprecate all the other > > > > > > > > >> > > methods except tryApplyFilters() and > > > tryApplyProjections(). > > > > > > > > >> > > > > > > > > > >> > For *SupportsProjectionPushDown*, we still need a > > > > > > > > >> > *supportsNestedProjections* API on the table source as > > some > > > of the > > > > > > > > table > > > > > > > > >> > sources might not be able to handle nested fields and > > > therefore > > > > > > the > > > > > > > > >> Flink > > > > > > > > >> > planner should not push down the nested projections or > > else > > > the > > > > > > > > >> > *applyProjection > > > > > > > > >> > *API has to be appropriately changed to return > > > > > > > > >> > *unconvertibleProjections *similar > > > > > > > > >> > to *SupportsFilterPushDown*. > > > > > > > > >> > > > > > > > > > >> > Or we have to introduce two different applyProjections() > > > > > > > > >> > > methods for FieldReferenceExpression / > > > > > > > > NestedFieldReferenceExpression > > > > > > > > >> > > respectively. > > > > > > > > >> > > > > > > > > > >> > Agree this is not preferred. Given that > > > *supportNestedProjections > > > > > > > > >> *cannot > > > > > > > > >> > be deprecated/removed based on the current API form, > > > extending > > > > > > > > >> > *FieldReferenceExpression* to support nested fields > should > > > be > > > > > > okay. > > > > > > > > >> > > > > > > > > > >> > Another alternative could be to change *applyProjections > > > *to take > > > > > > > > >> > List<ResolvedExpression> and on the connector side they > > > choose to > > > > > > > > handle > > > > > > > > >> > *FieldReferenceExpression* and > > > *NestedFieldReferenceExpression *as > > > > > > > > >> > applicable and return the remainingProjections. In the > > case > > > of > > > > > > > nested > > > > > > > > >> field > > > > > > > > >> > projections not supported, it should return them back > but > > > only > > > > > > > > >> projecting > > > > > > > > >> > the top level fields. IMO, this is also *not preferred*. > > > > > > > > >> > > > > > > > > > >> > *SupportsAggregatePushDown* > > > > > > > > >> > > > > > > > > > >> > *AggregateExpression *currently takes in a list of > > > > > > > > >> > *FieldReferenceExpression* as args for the aggregate > > > function, if > > > > > > in > > > > > > > > >> future > > > > > > > > >> > *SupportsAggregatePushDown* adds support for aggregate > > > pushdown on > > > > > > > > >> nested > > > > > > > > >> > fields then the AggregateExpression API also has to > change > > > if a > > > > > > new > > > > > > > > >> > NestedFieldReferenceExpression is introduced for nested > > > fields. > > > > > > > > >> > > > > > > > > > >> > If we add a > > > > > > > > >> > > flag for each new filter, > > > > > > > > >> > > the interface will be filled with lots of flags (e.g., > > > > > > > > >> supportsBetween, > > > > > > > > >> > > supportsIN) > > > > > > > > >> > > > > > > > > > >> > In an ideal situation, I completely agree with you. But > in > > > the > > > > > > > current > > > > > > > > >> > state, *supportsNestedFilters* can act as a bridge to > > reach > > > the > > > > > > > > eventual > > > > > > > > >> > desired state which is to have a clean and consistent > set > > > of APIs > > > > > > > > >> > throughout all Supports*PushDown. > > > > > > > > >> > > > > > > > > > >> > Also shared some thoughts on the end state API > > > > > > > > >> > < > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://docs.google.com/document/d/1stLRPKOcxlEv8eHblkrOh0Zf5PLM-h76WMhEINHOyPY/edit?usp=sharing__;!!IKRxdwAv5BmarQ!ZZ2nS1PYlXLnEGFcikS3NsYG7tMaV3wU_z7FmvihNwQBmoLZk2WmcpuRWszK0FFmsInh9A6cndkJrQ$ > > > > > > > > >> > > > > > > > > > > >> > with extension to the *FieldReferenceExpression* to > > support > > > nested > > > > > > > > >> fields. > > > > > > > > >> > Please take a look. > > > > > > > > >> > > > > > > > > > >> > Regards > > > > > > > > >> > Venkata krishnan > > > > > > > > >> > > > > > > > > > >> > On Tue, Aug 22, 2023 at 5:02 PM Becket Qin < > > > becket....@gmail.com> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > > >> > > Hi Jark, > > > > > > > > >> > > > > > > > > > > >> > > Regarding the migration path, it would be useful to > > > scrutinize > > > > > > the > > > > > > > > use > > > > > > > > >> > case > > > > > > > > >> > > of FiledReferenceExpression and ResolvedExpressions. > > > There are > > > > > > two > > > > > > > > >> kinds > > > > > > > > >> > of > > > > > > > > >> > > use cases: > > > > > > > > >> > > > > > > > > > > >> > > 1. A ResolvedExpression is constructed by the user or > > > connector > > > > > > / > > > > > > > > >> plugin > > > > > > > > >> > > developers. > > > > > > > > >> > > 2. A ResolvedExpression is constructed by the > framework > > > and > > > > > > passed > > > > > > > > to > > > > > > > > >> > user > > > > > > > > >> > > or connector / plugin developers. > > > > > > > > >> > > > > > > > > > > >> > > For the first case, both of the approaches provide the > > > same > > > > > > > > migration > > > > > > > > >> > > experience. > > > > > > > > >> > > > > > > > > > > >> > > For the second case, generally speaking, introducing > > > > > > > > >> > > NestedFieldReferenceExpression and extending > > > > > > > > FieldReferenceExpression > > > > > > > > >> > would > > > > > > > > >> > > have the same impact for backwards compatibility. > > > > > > > > >> SupportsFilterPushDown > > > > > > > > >> > is > > > > > > > > >> > > a special case here because understanding the filter > > > expressions > > > > > > > is > > > > > > > > >> > > optional for the source implementation. In other use > > > cases, if > > > > > > > > >> > > understanding the reference to a nested field is a > must > > > have, > > > > > > the > > > > > > > > user > > > > > > > > >> > code > > > > > > > > >> > > has to be changed, regardless of which approach we > take > > to > > > > > > support > > > > > > > > >> nested > > > > > > > > >> > > fields. > > > > > > > > >> > > > > > > > > > > >> > > Therefore, I think we have to check each public API > > where > > > the > > > > > > > nested > > > > > > > > >> > field > > > > > > > > >> > > reference is exposed. If we have many public APIs > where > > > > > > > > understanding > > > > > > > > >> > > nested fields is optional for the user / plugin / > > > connector > > > > > > > > >> developers, > > > > > > > > >> > > having a separate NestedFieldReferenceExpression would > > > have a > > > > > > more > > > > > > > > >> smooth > > > > > > > > >> > > migration. Otherwise, there seems to be no difference > > > between > > > > > > the > > > > > > > > two > > > > > > > > >> > > approaches. > > > > > > > > >> > > > > > > > > > > >> > > Migration path aside, the main reason I prefer > extending > > > > > > > > >> > > FieldReferenceExpression over a new > > > > > > NestedFieldReferenceExpression > > > > > > > > is > > > > > > > > >> > > because this makes the SupportsProjectionPushDown > > > interface > > > > > > > simpler. > > > > > > > > >> > > Otherwise, we have to treat it as a special case that > > > does not > > > > > > > match > > > > > > > > >> the > > > > > > > > >> > > overall API style. Or we have to introduce two > different > > > > > > > > >> > applyProjections() > > > > > > > > >> > > methods for FieldReferenceExpression / > > > > > > > > NestedFieldReferenceExpression > > > > > > > > >> > > respectively. This issue further extends to > > > implementation in > > > > > > > > >> addition to > > > > > > > > >> > > public API. A single FieldReferenceExpression might > help > > > > > > simplify > > > > > > > > the > > > > > > > > >> > > implementation code a little bit. For example, in a > > > recursive > > > > > > > > >> processing > > > > > > > > >> > of > > > > > > > > >> > > a row with nested rows, we may not need to switch > > between > > > > > > > > >> > > FieldReferenceExpression and > > > NestedFieldReferenceExpression > > > > > > > > depending > > > > > > > > >> on > > > > > > > > >> > > whether the record being processed is a top level > record > > > or > > > > > > nested > > > > > > > > >> > record. > > > > > > > > >> > > > > > > > > > > >> > > Thanks, > > > > > > > > >> > > > > > > > > > > >> > > Jiangjie (Becket) Qin > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > On Tue, Aug 22, 2023 at 11:43 PM Jark Wu < > > > imj...@gmail.com> > > > > > > > wrote: > > > > > > > > >> > > > > > > > > > > >> > > > Hi Becket, > > > > > > > > >> > > > > > > > > > > > >> > > > I totally agree we should try to have a consistent > API > > > for a > > > > > > > final > > > > > > > > >> > state. > > > > > > > > >> > > > The only concern I have mentioned is the "smooth" > > > migration > > > > > > > path. > > > > > > > > >> > > > The FiledReferenceExpression is widely used in many > > > public > > > > > > APIs, > > > > > > > > >> > > > not only in the SupportsFilterPushDown. Yes, we can > > > change > > > > > > every > > > > > > > > >> > > > methods in 2-steps, but is it good to change API > back > > > and > > > > > > forth > > > > > > > > for > > > > > > > > >> > this? > > > > > > > > >> > > > Personally, I'm fine with a separate > > > > > > > > NestedFieldReferenceExpression > > > > > > > > >> > > class. > > > > > > > > >> > > > TBH, I prefer the separated way because it makes the > > > reference > > > > > > > > >> > expression > > > > > > > > >> > > > more clear and concise. > > > > > > > > >> > > > > > > > > > > > >> > > > Best, > > > > > > > > >> > > > Jark > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > On Tue, 22 Aug 2023 at 16:53, Becket Qin < > > > > > > becket....@gmail.com> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > > > > >> > > > > Thanks for the reply, Jark. > > > > > > > > >> > > > > > > > > > > > > >> > > > > I think it will be helpful to understand the final > > > state we > > > > > > > want > > > > > > > > >> to > > > > > > > > >> > > > > eventually achieve first, then we can discuss the > > > steps > > > > > > > towards > > > > > > > > >> that > > > > > > > > >> > > > final > > > > > > > > >> > > > > state. > > > > > > > > >> > > > > > > > > > > > > >> > > > > It looks like there are two proposed end states > now: > > > > > > > > >> > > > > > > > > > > > > >> > > > > 1. Have a separate NestedFieldReferenceExpression > > > class; > > > > > > keep > > > > > > > > >> > > > > SupportsFilterPushDown and > > SupportsProjectionPushDown > > > the > > > > > > > same. > > > > > > > > >> It is > > > > > > > > >> > > > just > > > > > > > > >> > > > > a one step change. > > > > > > > > >> > > > > - Regarding the supportsNestedFilterPushDown() > > > method, if > > > > > > > our > > > > > > > > >> > > contract > > > > > > > > >> > > > > with the connector developer today is "The > > > implementation > > > > > > > should > > > > > > > > >> > ignore > > > > > > > > >> > > > > unrecognized expressions by putting them into the > > > remaining > > > > > > > > >> filters, > > > > > > > > >> > > > > instead of throwing exceptions". Then there is no > > > need for > > > > > > > this > > > > > > > > >> > > method. I > > > > > > > > >> > > > > am not sure about the current contract. We should > > > probably > > > > > > > make > > > > > > > > it > > > > > > > > >> > > clear > > > > > > > > >> > > > in > > > > > > > > >> > > > > the interface Java doc. > > > > > > > > >> > > > > > > > > > > > > >> > > > > 2. Extend the existing FiledReferenceExpression > > class > > > to > > > > > > > support > > > > > > > > >> > nested > > > > > > > > >> > > > > fields; SupportsFilterPushDown only has one method > > of > > > > > > > > >> > > > > applyFilters(List<ResolvedExpression>); > > > > > > > > SupportsProjectionPushDown > > > > > > > > >> > only > > > > > > > > >> > > > has > > > > > > > > >> > > > > one method of > > > > > > applyProjections(List<FieldReferenceExpression>, > > > > > > > > >> > > DataType). > > > > > > > > >> > > > > It could just be two steps if we are not too > > obsessed > > > with > > > > > > the > > > > > > > > >> exact > > > > > > > > >> > > > names > > > > > > > > >> > > > > of "applyFilters" and "applyProjections". More > > > specifically, > > > > > > > it > > > > > > > > >> takes > > > > > > > > >> > > two > > > > > > > > >> > > > > steps to achieve this final state: > > > > > > > > >> > > > > a. introduce a new method > > > > > > > > >> > tryApplyFilters(List<ResolvedExpression>) > > > > > > > > >> > > > to > > > > > > > > >> > > > > SupportsFilterPushDown, which may have > > > > > > > FiledReferenceExpression > > > > > > > > >> with > > > > > > > > >> > > > nested > > > > > > > > >> > > > > fields. The default implementation throws an > > > exception. The > > > > > > > > >> runtime > > > > > > > > >> > > will > > > > > > > > >> > > > > first call tryApplyFilters() with nested fields. > In > > > case of > > > > > > > > >> > exception, > > > > > > > > >> > > it > > > > > > > > >> > > > > calls the existing applyFilters() without > including > > > the > > > > > > nested > > > > > > > > >> > filters. > > > > > > > > >> > > > > Similarly, in SupportsProjectionPushDown, > introduce > > a > > > > > > > > >> > > > > tryApplyProjections<List<NestedFieldReference> > > method > > > > > > > returning > > > > > > > > a > > > > > > > > >> > > Result. > > > > > > > > >> > > > > The Result also contains the accepted and > > unapplicable > > > > > > > > >> projections. > > > > > > > > >> > The > > > > > > > > >> > > > > default implementation also throws an exception. > > > Deprecate > > > > > > all > > > > > > > > the > > > > > > > > >> > > other > > > > > > > > >> > > > > methods except tryApplyFilters() and > > > tryApplyProjections(). > > > > > > > > >> > > > > b. remove the deprecated methods in the next > > major > > > > > > version > > > > > > > > >> bump. > > > > > > > > >> > > > > > > > > > > > > >> > > > > Now the question is putting the migration steps > > > aside, which > > > > > > > end > > > > > > > > >> > state > > > > > > > > >> > > do > > > > > > > > >> > > > > we prefer? While the first end state is acceptable > > > for me, > > > > > > > > >> > personally, > > > > > > > > >> > > I > > > > > > > > >> > > > > prefer the latter if we are designing from > scratch. > > > It is > > > > > > > clean, > > > > > > > > >> > > > consistent > > > > > > > > >> > > > > and intuitive. Given the size of Flink, keeping > APIs > > > in the > > > > > > > same > > > > > > > > >> > style > > > > > > > > >> > > > over > > > > > > > > >> > > > > time is important. The migration is also not that > > > > > > complicated. > > > > > > > > >> > > > > > > > > > > > > >> > > > > Thanks, > > > > > > > > >> > > > > > > > > > > > > >> > > > > Jiangjie (Becket) Qin > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > On Tue, Aug 22, 2023 at 2:23 PM Jark Wu < > > > imj...@gmail.com> > > > > > > > > wrote: > > > > > > > > >> > > > > > > > > > > > > >> > > > > > Hi Venkat, > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > Thanks for the proposal. > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > I have some minor comments about the FLIP. > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > 1. I think we don't need to > > > > > > > > >> > > > > > add > SupportsFilterPushDown#supportsNestedFilters() > > > method, > > > > > > > > >> > > > > > because connectors can skip nested filters by > > > putting them > > > > > > > in > > > > > > > > >> > > > > > Result#remainingFilters(). > > > > > > > > >> > > > > > And this is backward-compatible because unknown > > > > > > expressions > > > > > > > > were > > > > > > > > >> > > added > > > > > > > > >> > > > to > > > > > > > > >> > > > > > the remaining filters. > > > > > > > > >> > > > > > Planner should push predicate expressions as > more > > as > > > > > > > possible. > > > > > > > > >> If > > > > > > > > >> > we > > > > > > > > >> > > > add > > > > > > > > >> > > > > a > > > > > > > > >> > > > > > flag for each new filter, > > > > > > > > >> > > > > > the interface will be filled with lots of flags > > > (e.g., > > > > > > > > >> > > supportsBetween, > > > > > > > > >> > > > > > supportsIN). > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > 2. > NestedFieldReferenceExpression#nestedFieldName > > > should > > > > > > be > > > > > > > an > > > > > > > > >> > array > > > > > > > > >> > > of > > > > > > > > >> > > > > > field names? > > > > > > > > >> > > > > > Each string represents a field name part of the > > > field > > > > > > path. > > > > > > > > Just > > > > > > > > >> > keep > > > > > > > > >> > > > > > aligning with `nestedFieldIndexArray`. > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > 3. My concern about making > > FieldReferenceExpression > > > > > > support > > > > > > > > >> nested > > > > > > > > >> > > > fields > > > > > > > > >> > > > > > is the compatibility. > > > > > > > > >> > > > > > It is a public API and users/connectors are > > already > > > using > > > > > > > it. > > > > > > > > >> > People > > > > > > > > >> > > > > > assumed it is a top-level column > > > > > > > > >> > > > > > reference, and applied logic on it. But that's > not > > > true > > > > > > now > > > > > > > > and > > > > > > > > >> > this > > > > > > > > >> > > > may > > > > > > > > >> > > > > > lead to unexpected errors. > > > > > > > > >> > > > > > Having a separate NestedFieldReferenceExpression > > > sounds > > > > > > > safer > > > > > > > > to > > > > > > > > >> > me. > > > > > > > > >> > > > > Mixing > > > > > > > > >> > > > > > them in a class may > > > > > > > > >> > > > > > confuse users what's the meaning of > > getFieldName() > > > and > > > > > > > > >> > > > getFieldIndex(). > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > Regarding using NestedFieldReferenceExpression > in > > > > > > > > >> > > > > > SupportsProjectionPushDown, do you > > > > > > > > >> > > > > > have any concerns @Timo Walther < > > twal...@apache.org> > > > ? > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > Best, > > > > > > > > >> > > > > > Jark > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > On Tue, 22 Aug 2023 at 05:55, Venkatakrishnan > > > Sowrirajan < > > > > > > > > >> > > > > vsowr...@asu.edu > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > wrote: > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > Sounds like a great suggestion, Becket. +1. > > Agree > > > with > > > > > > > > >> cleaning > > > > > > > > >> > up > > > > > > > > >> > > > the > > > > > > > > >> > > > > > APIs > > > > > > > > >> > > > > > > and making it consistent in all the pushdown > > APIs. > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > Your suggested approach seems fine to me, > unless > > > anyone > > > > > > > else > > > > > > > > >> has > > > > > > > > >> > > any > > > > > > > > >> > > > > > other > > > > > > > > >> > > > > > > concerns. Just have couple of clarifying > > > questions: > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > 1. Do you think we should standardize the APIs > > > across > > > > > > all > > > > > > > > the > > > > > > > > >> > > > pushdown > > > > > > > > >> > > > > > > supports like SupportsPartitionPushdown, > > > > > > > > >> SupportsDynamicFiltering > > > > > > > > >> > > etc > > > > > > > > >> > > > > in > > > > > > > > >> > > > > > > the end state? > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > The current proposal works if we do not want > to > > > migrate > > > > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > > > > >> > NestedFieldReferenceExpression > > > > > > > > >> > > > in > > > > > > > > >> > > > > > the > > > > > > > > >> > > > > > > > long term. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* > instead > > of > > > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > 2. Extend the FieldReferenceExpression to > > support > > > nested > > > > > > > > >> fields. > > > > > > > > >> > > > > > > > - Change the index field type from int > to > > > int[]. > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > - Add a new method int[] > > getFieldIndexArray(). > > > > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() > > method, > > > the > > > > > > code > > > > > > > > >> will > > > > > > > > >> > be > > > > > > > > >> > > > > > removed > > > > > > > > >> > > > > > > in > > > > > > > > >> > > > > > > > the next major version bump. > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > I assume getFieldIndex would return > > > fieldIndexArray[0], > > > > > > > > right? > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > Thanks > > > > > > > > >> > > > > > > Venkat > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > On Fri, Aug 18, 2023 at 4:47 PM Becket Qin < > > > > > > > > >> becket....@gmail.com > > > > > > > > >> > > > > > > > > > > >> > > > > wrote: > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > Thanks for the proposal, Venkata. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > The current proposal works if we do not want > > to > > > > > > migrate > > > > > > > > >> > > > > > > > SupportsFilterPushdown to also use > > > > > > > > >> > NestedFieldReferenceExpression > > > > > > > > >> > > > in > > > > > > > > >> > > > > > the > > > > > > > > >> > > > > > > > long term. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > Did you mean *FieldReferenceExpression* > instead > > of > > > > > > > > >> > > > > > > *NestedFieldReferenceExpression*? > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Otherwise, the alternative solution briefly > > > mentioned > > > > > > in > > > > > > > > the > > > > > > > > >> > > > rejected > > > > > > > > >> > > > > > > > alternatives would be the following: > > > > > > > > >> > > > > > > > Phase 1: > > > > > > > > >> > > > > > > > 1. Introduce a supportsNestedFilters() > method > > > to the > > > > > > > > >> > > > > > > SupportsFilterPushdown > > > > > > > > >> > > > > > > > interface. (same as current proposal). > > > > > > > > >> > > > > > > > 2. Extend the FieldReferenceExpression to > > > support > > > > > > nested > > > > > > > > >> > fields. > > > > > > > > >> > > > > > > > - Change the index field type from int > to > > > int[]. > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > - Add a new method int[] > > getFieldIndexArray(). > > > > > > > > >> > > > > > > > - Deprecate the int getFieldIndex() > > method, > > > the > > > > > > code > > > > > > > > >> will > > > > > > > > >> > be > > > > > > > > >> > > > > > removed > > > > > > > > >> > > > > > > in > > > > > > > > >> > > > > > > > the next major version bump. > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > 3. In the SupportsProjectionPushDown interface > > > > > > > > >> > > > > > > > - add a new method > > > > > > > > >> > > > > applyProjection(List<FieldReferenceExpression>, > > > > > > > > >> > > > > > > > DataType), with default implementation > > invoking > > > > > > > > >> > > > > > applyProjection(int[][], > > > > > > > > >> > > > > > > > DataType) > > > > > > > > >> > > > > > > > - deprecate the current > > > applyProjection(int[][], > > > > > > > > >> DataType) > > > > > > > > >> > > > method > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Phase 2 (in the next major version bump) > > > > > > > > >> > > > > > > > 1. remove the deprecated methods. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Phase 3 (optional) > > > > > > > > >> > > > > > > > 1. deprecate and remove the > > > supportsNestedFilters() / > > > > > > > > >> > > > > > > > supportsNestedProjection() methods from the > > > > > > > > >> > > SupportsFilterPushDown > > > > > > > > >> > > > / > > > > > > > > >> > > > > > > > SupportsProjectionPushDown interfaces. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Personally I prefer this alternative. It > takes > > > longer > > > > > > to > > > > > > > > >> finish > > > > > > > > >> > > the > > > > > > > > >> > > > > > work, > > > > > > > > >> > > > > > > > but the API eventually becomes clean and > > > consistent. > > > > > > > But I > > > > > > > > >> can > > > > > > > > >> > > live > > > > > > > > >> > > > > > with > > > > > > > > >> > > > > > > > the current proposal. > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Thanks, > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > Jiangjie (Becket) Qin > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > On Sat, Aug 19, 2023 at 12:09 AM > > Venkatakrishnan > > > > > > > > Sowrirajan > > > > > > > > >> < > > > > > > > > >> > > > > > > > vsowr...@asu.edu> wrote: > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > Gentle ping for reviews/feedback. > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > On Tue, Aug 15, 2023, 5:37 PM > > Venkatakrishnan > > > > > > > > Sowrirajan < > > > > > > > > >> > > > > > > > vsowr...@asu.edu > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > wrote: > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > Hi All, > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > I am opening this thread to discuss > > > FLIP-356: > > > > > > > Support > > > > > > > > >> > Nested > > > > > > > > >> > > > > Fields > > > > > > > > >> > > > > > > > > > Filter Pushdown. The FLIP can be found > at > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!clxXJwshKpn559SAkQiieqgGe0ZduXCzUKCmYLtFIbQLmrmEEgdmuEIM8ZM1M3O_uGqOploU4ailqGpukAg$ > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > This FLIP adds support for pushing down > > > nested > > > > > > > fields > > > > > > > > >> > filters > > > > > > > > >> > > > to > > > > > > > > >> > > > > > the > > > > > > > > >> > > > > > > > > > underlying TableSource. In our data > lake, > > > we find > > > > > > a > > > > > > > > lot > > > > > > > > >> of > > > > > > > > >> > > > > datasets > > > > > > > > >> > > > > > > > have > > > > > > > > >> > > > > > > > > > nested fields and also user queries with > > > filters > > > > > > > > >> defined on > > > > > > > > >> > > the > > > > > > > > >> > > > > > > nested > > > > > > > > >> > > > > > > > > > fields. This would drastically improve > the > > > > > > > performance > > > > > > > > >> for > > > > > > > > >> > > > those > > > > > > > > >> > > > > > sets > > > > > > > > >> > > > > > > > of > > > > > > > > >> > > > > > > > > > queries. > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > Appreciate any comments or feedback you > > may > > > have > > > > > > on > > > > > > > > this > > > > > > > > >> > > > > proposal. > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > Regards > > > > > > > > >> > > > > > > > > > Venkata krishnan > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >