Hi All, Recently, I've been thinking about this again and decided to quickly look at some statistics of final/non-final Step classes. After a quick search of the concrete Steps, the split is roughly 118 non-final Steps and 95 final Steps. Given that more than half the Steps have already been opened, it makes sense to allow the rest of them to become non-final. I mostly agree with the three step approach from above but I wouldn't remove final from every Step in a single PR as suggested in 1. I think it still makes sense to do it on a case by case basis, however, the requirement for removing final should be very low (that is almost any reason can be given for wanting to remove final).
That being said, I think this is true only for TinkerPop 3. As we move forward, I think it makes more sense for us to revisit how Steps have been implemented. Recently, a lot of the modifications have been made for the sake of performance. If we could change the way we execute the steps to allow for the optimizations in TinkerPop itself then maybe in a future version we could still keep the Steps as being final. In summary, I'm suggesting that we approve most PRs that are requesting to remove final from a Step as long as they follow some variation of the three step approach that Oleksandr suggested before. Thanks, Ken On Thu, Apr 20, 2023 at 9:52 AM Ken Hu <k...@bitquilltech.com> wrote: > I agree that there are some inconsistencies that we will need to work > through. I've taken a quick look at your three PRs and they seem reasonable > to me. I will officially review them soon. > > On Thu, Apr 20, 2023 at 9:28 AM Oleksandr Porunov < > alexandr.poru...@gmail.com> wrote: > >> Hi Ken, >> >> This is a great input. Thank you for the feedback. I believe you are right >> and we should check each case separately. >> That said, I was just worried if Graph Providers start blindly replacing >> steps with optimised versions of those steps then it becomes hard to track >> how it affects TinkerPop side optimization strategies and custom Steps. >> For example, let's take a look at any random optimization strategy on >> TinkerPop's side. For example EarlyLimitStrategy >> < >> https://github.com/apache/tinkerpop/blob/3.6-dev/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/EarlyLimitStrategy.java >> > >> . >> This strategy specifically has specific logic regarding usage of >> `RangeGlobalStep`. >> I.e. `if (step instanceof RangeGlobalStep)`. >> If we check this step it's `final` (`public final class RangeGlobalStep`). >> In such a case nobody could actually easily replace this strategy without >> affecting `EarlyLimitStrategy` because we can't extend / overwrite >> RangeGlobalStep due to being final. >> Moreover `if (step instanceof RangeGlobalStep)` could actually be replaced >> by `if (RangeGlobalStep.class.equals(step.getClass()))` because we know >> there are no instances of `RangeGlobalStep` other than `RangeGlobalStep` >> itself, but the strategy tells that any instance of `RangeGlobalStep` is >> valid for optimization. >> >> The above relation of Optimizations and Steps affects many Step classes >> and >> many optimization strategies. Moreover it's kind of inconsistent that some >> of the steps are `final` and some of them are non-final even without >> having >> any other classes extending them in the TinkerPop codebase. For example, >> PropertiesStep and PropertyMapStep are non-final, but they don't have any >> classes extending them inside TinkerPop codebase. Logically Graph >> Providers >> extend and replace those steps, but then why do we keep those steps open >> if >> their codebase is bigger and more likely to be changed then the other >> `final` steps which are more likely to stay as is? (i.e. I would expect >> PropertyMap step to be changed more often then the IsStep for example, >> even >> so the latter one is non-final). >> >> I do agree with you that we should probably be curious of any methods / >> fields which we open with `protected` and don't blindly replace `private` >> fields / methods to `protected` fields / methods (as I did in the PR), but >> I doubt any Step should be `final`. >> Taking your input into account I think the way we could do, is: >> 1. Open all Step classes for extension by removing `final`. (do this in a >> single PR for all steps) >> 2. Move all `private` / `protected` utility methods out from Step classes >> into some utility classes. (do this case by case depending on the step) >> 3. Add public getter methods and / or public constructors to be able to >> copy necessary step fields from the original step into a new extending >> step. (do this case by case depending on the step) >> >> Thus, I believe the original PR >> <https://github.com/apache/tinkerpop/pull/2027> should be closed and >> separate PRs should be opened. So far I opened 3 separate small PRs to >> make >> extensions for the specific steps. >> https://github.com/apache/tinkerpop/pull/2029 >> https://github.com/apache/tinkerpop/pull/2030 >> https://github.com/apache/tinkerpop/pull/2031 >> >> Would be great to know what you think about it. >> >> Thanks, >> Oleksandr >> >> >> On Thu, Apr 20, 2023 at 4:42 PM Ken Hu <k...@bitquilltech.com.invalid> >> wrote: >> >> > Hi Oleksandr, >> > >> > Thanks for bringing this idea to the community. However, I would be very >> > hesitant to include such a large change to TinkerPop. The Gremlin >> language >> > doesn't really have a specification which means that what a Step does is >> > more susceptible to change. I also feel like the current implementation >> of >> > Steps wasn't designed to be a stable and open interface so I would >> consider >> > them to be internal TinkerPop classes. From what I've heard, some of >> these >> > classes have intentionally been marked as final to prevent misusage >> which >> > is something that has happened in the past. Another concern I have is >> the >> > increased potential to break other's changes during refactoring or bug >> > fixing. It's easy to run into a situation where you make a fix for an >> issue >> > but don't realize that you have accidentally broken someone else's >> > implementation because you didn't know their code had a specific >> functional >> > dependency on the current implementation. This scenario is much more >> likely >> > to occur when you open up this many classes and there is no way to >> predict >> > what others will do with it. >> > >> > I think we can revisit this idea in the future once we have a firmer >> > specification for the Gremlin language. For now I think it is better to >> > take these changes on in a case by case basis. >> > >> > Thanks, >> > Ken >> > >> > On Tue, Apr 18, 2023 at 6:27 PM Oleksandr Porunov < >> > alexandr.poru...@gmail.com> wrote: >> > >> > > Hello everyone, >> > > >> > > I would like to get some opinion from the community regarding relaxing >> > > strictness of Step classes for better extensibility. >> > > The problem I encounter while making some optimizations for >> JanusGraph is >> > > that many steps are not quite extensible due to being marked as >> `final` >> > or >> > > simply having important fields as `private` without any `public` >> getters. >> > > Thus, making it complicated to replace original TinkerPop Steps with >> > Graph >> > > Provider's optimized Steps. >> > > In this discussion I would like to know community opinion on relaxing >> > Step >> > > classes strictness with making the following changes: >> > > >> > > 1. Remove `final` keyword from all Step classes, so that it would be >> > > possible to extend / overwrite them. Without this change Graph >> Providers >> > > might need to duplicate some logic during their custom Step >> > implementation >> > > (because they won't be able to extend an already existing step and >> > instead >> > > will need to re-implement it from scratch). >> > > Real example: >> > > We want to optimize the `ProjectStep` step to be able to query all >> > included >> > > traversals in parallel instead of sequentially ( >> > > https://github.com/JanusGraph/janusgraph/issues/3559 ). Current >> > > TinkerPop's ProjectStep if `final`. Thus, if we want to replace this >> step >> > > with the optimized version of this step we will need to create our >> custom >> > > step which will have most of the logic duplicated with ProjectStep >> (most >> > > likely the only difference will be in the `map` step). >> > > >> > > 2. Convert all methods and fields from `private` to `protected` to be >> > able >> > > to access them in extended Step versions. In case the extended version >> > > overwrites some of the original TinkerPop functionality then the child >> > > class may potentially need to access some of the parent's class >> fields. >> > If >> > > those fields are private and don't have any getters then it might be >> > > complicated to retrieve that information (it will require using >> > > Reflection API). >> > > Having fields as `protected` will allow for extended child classes to >> > > access parent's fields. >> > > Another solution could be to have `protected` getters available for >> those >> > > fields. >> > > Real example: >> > > We would like to optimize PropertyMap step to pre-fetch elements >> > properties >> > > ( https://github.com/JanusGraph/janusgraph/issues/2444 ). >> > > For that we have to overwrite the `map` method and re-implement it >> (could >> > > be considered as anti-pattern, but it's complicated to do that >> > otherwise). >> > > In the overwritten method we may have some duplicated logic to be >> > > re-implemented and for that we will need to access parent fields / >> > methods. >> > > This was fixed for PropertyMap here ( >> > > https://github.com/apache/tinkerpop/pull/2022 ), but there are many >> > other >> > > Steps which have private properties without getters. >> > > >> > > 3. (optional) Add public getters for fields which may be required to >> be >> > > copied from an original Step into an extension Step. >> > > Real example: >> > > Again about `PropertyMapStep`. Even though we changed the fields to >> > > `protected` (which now allows us to set and read those fields from the >> > > child class), we can't easily replace the original Step with the >> extended >> > > version of this Step because we will need to provide all the >> information >> > we >> > > have from the original Step into the new Step. Without having public >> > > getters available for those fields we won't be able to retrieve data >> for >> > > protected fields (unless we place the optimization class to the same >> > > package). >> > > I.e. If we implement a replacement Step for PropertyMap step we will >> need >> > > to access `tokens` and `traversalRing` from the original step, but we >> > won't >> > > be able to do that because currently PropertyMap doesn't have public >> > > getters for those fields. Thus, it will require us to place a new >> > > implementation class to the same package, so that we could access >> > protected >> > > fields. >> > > >> > > Overall, I think relaxing Step classes' strictness can benefit Graph >> > > Providers as they will be able to overwrite functionality they need. >> That >> > > said, due to small optimization strategies experience, I might miss >> some >> > > development patterns regarding developing optimizations. Potentially, >> it >> > > could be that replacing original Steps is a bad idea but it's hard >> for me >> > > to find good optimization solutions otherwise. >> > > >> > > These changes don't look to me as breaking changes because no code >> > changes >> > > will be required from current Graph Provides sides, but maybe I missed >> > > something. >> > > >> > > As for now, the JanusGraph community has plans to optimize usage of: >> > > PropertyMap, ElementMap, Coalesce, Has, Project steps. Thus it drived >> > this >> > > thinking and discussion. >> > > >> > > Ticket regarding relaxing step classes strictness: >> > > https://issues.apache.org/jira/browse/TINKERPOP-2927 >> > > PR to relax step classes strictness: >> > > https://github.com/apache/tinkerpop/pull/2027 >> > > >> > > Any feedback is appreciated. >> > > >> > > Best regards, >> > > Oleksandr >> > > >> > >> >