Based on the discussion so far, I'll make all() and any() behave like and() and or(), that is, as ternary-aware steps.
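To make the intended behaviour concrete, here is a minimal Python sketch, not TinkerPop code. It assumes the AND/OR/NOT truth tables from the provider docs linked in the thread, models any() as a fold with OR and all() as a fold with AND, and reduces ERROR to false only at the enclosing filter. All names (Ternary, compare, any_of, all_of, passes_filter, lt_one) are hypothetical, and Python's TypeError stands in for GremlinTypeErrorException.

```python
import operator
from enum import Enum
from functools import reduce


class Ternary(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"


def compare(op, left, right):
    """Apply a comparison operator; an incomparable pair yields ERROR."""
    try:
        return Ternary.TRUE if op(left, right) else Ternary.FALSE
    except TypeError:
        # Stand-in for GremlinTypeErrorException (assumption of this sketch).
        return Ternary.ERROR


def t_and(a, b):
    # FALSE dominates, then ERROR, otherwise TRUE.
    if Ternary.FALSE in (a, b):
        return Ternary.FALSE
    if Ternary.ERROR in (a, b):
        return Ternary.ERROR
    return Ternary.TRUE


def t_or(a, b):
    # TRUE dominates, then ERROR, otherwise FALSE.
    if Ternary.TRUE in (a, b):
        return Ternary.TRUE
    if Ternary.ERROR in (a, b):
        return Ternary.ERROR
    return Ternary.FALSE


def t_not(a):
    # Negation leaves ERROR untouched.
    if a is Ternary.ERROR:
        return Ternary.ERROR
    return Ternary.FALSE if a is Ternary.TRUE else Ternary.TRUE


def any_of(items, predicate):
    """any() modeled as a chain of ORs over the list elements."""
    return reduce(t_or, (predicate(item) for item in items), Ternary.FALSE)


def all_of(items, predicate):
    """all() modeled as a chain of ANDs over the list elements."""
    return reduce(t_and, (predicate(item) for item in items), Ternary.TRUE)


def passes_filter(value):
    """Reduction point: only TRUE lets the traverser through."""
    return value is Ternary.TRUE


def lt_one(value):
    return compare(operator.lt, value, "one")


# not(any(lt("one"))) over [1]: ERROR survives any() and not(), so the
# traverser is dropped, matching not(is(lt("one"))) on a single value.
print(passes_filter(t_not(any_of([1], lt_one))))  # False
```

With this shape, the inconsistency Cole pointed out between not(is(lt("one"))) and not(any(is(lt("one")))) goes away in the model, because the ERROR is not reduced until the outermost filter.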

On Wed, Sep 20, 2023 at 4:53 PM Cole Greer <cole.gr...@improving.com.invalid> wrote:

> Hi Mike,
>
> Thanks for weighing in here. I think this discussion needs to be split into two parts. On one side there is the implementation discussion, which Pieter and Ken have led, which focuses on how to make this work for providers, whether exceptions should be used in control flow, etc. I think the second discussion needs to be about ensuring the semantics make sense for gremlin. My focus here is on the semantic discussion.
>
> I agree with your points that ternary-boolean/three-valued logic is well defined and pervasive in many database systems. After all, the semantics outlined in our provider docs (https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics) are essentially the same as the comparison and logic semantics in the SQL standard. Where I have some concerns with the semantics in gremlin has to do with the boundaries of the logic expression. In SQL, it is very clear what is, and is not, part of a single logic expression. If I have something such as:
>
> WHERE x > 5
> OR
> (NOT y < 10);
>
> It is very well understood that everything following the WHERE should be considered as a single logic expression which is recursively composed of other logic expressions or comparisons. All of the ternary logic is isolated and contained within that composite expression, and it is well understood that WHERE will remove any results in which the composite expression does not return true.
>
> Gremlin on the other hand does not have a notion of a logic expression. We have “atomic” predicates like P.gt, and we have composite predicates such as P.and, but there are also logical FilterSteps such as AndStep and NotStep. These steps are what create the fuzzy boundaries around a logic expression.
> As of right now, the equivalent of the above in gremlin would be something such as:
>
> Ex1: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)), select(“y”).is(P.not(P.lt(10))) ) )
>
> The question I want to ask about this is what part of that traversal is the “logic expression”. It is tempting to define it as everything within the where() step. That definition makes sense in this example but is difficult to formally define for general cases. If we define a logic expression as the contents of a where() step, then in the following example (Ex2), there is a hard break between the inner where() step and the larger expression.
>
> Ex2: g…().as(“x”)…().as(“y”).where( or( select(“x”).is(P.gt(5)), where(select(“y”).is(P.not(P.lt(10)))) ) )
>
> In this example, the whole expression in the outer where()-step feels like a single composite logic expression, and having the internal where()-step create a hard break and early reduction leads to an unexpected difference in semantics between Ex1 and Ex2. I have seen a few cases in the past in which the different error reduction points between Ex1 and Ex2 have created a lot of confusion for users.
>
> An additional bit of weird semantics we currently have due to implementation limitations is that Error states cannot get passed through any parent step which is not a FilterStep. This is because the logic for handling errors is implemented within FilterStep and it is not clear what it would mean to reduce an error state to false from outside of the context of a FilterStep. However, the issue that this creates is that a traversal such as Ex3 will be treated as a single composite logic expression, whereas the union()-step in Ex4 forces early reduction and breaks this into 2 independent logic expressions.
>
> Ex3: …not( is(eq(0)) )
> Ex4: …not( union( is(eq(0)) ) )
>
> To add one more layer on top of this, we currently have certain strategies which may reorder, modify, replace, or remove certain steps as optimizations. Many of these optimizations have not properly considered the ternary boolean semantics and produce incorrect results in error cases.
>
> I have spent a fair bit of time thinking about this and I have not yet come up with a clear set of rules to determine the right boundary for a logic expression which feels intuitive, or at the very least is teachable. Any system I’ve come up with which encompasses steps within a logic expression always leads to certain edge cases where the semantics no longer make sense. For this reason, I’m wondering if the best solution might be to set the boundary at the predicate level. In this case ternary boolean semantics would work within a composite predicate such as P.lt(5).and(P.not(P.eq(3))); however, when returning a value out to the step level, it would have to be either true or false. Under this model we would no longer view a tree of And/Or/Not steps as a single logic expression but instead as a series of discrete steps. I think this may be the right path forward, although it likely requires leveling up the capabilities of predicates in order to be more useful.
>
> I would be curious to know if others have any thoughts on what the right rules are for determining when to reduce from ternary boolean logic to true and false.
>
> Regards,
>
> Cole
>
> From: Mike Personick <m...@dayzero.io>
> Date: Wednesday, September 20, 2023 at 9:41 AM
> To: dev@tinkerpop.apache.org <dev@tinkerpop.apache.org>
> Subject: Re: [DISCUSS] Ternary Boolean Handling in New Steps
>
> Early reduction is not possible because of negation.
> Please see the discussion of ternary boolean semantics here:
>
> https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_ternary_boolean_logics
>
> On Tue, Sep 19, 2023 at 12:25 PM Ken Hu <k...@bitquilltech.com.invalid> wrote:
>
> > Thanks for your input Pieter.
> >
> > I agree with a lot of what you said and I think your suggestion is reasonable. From what I can tell, the logic in the FilterStep is there because reduction points are needed for the ternary boolean system. One of the ways to move logic out of this area would be to get rid of ternary boolean by immediately reducing any ERROR state to FALSE.
> >
> > On Sat, Sep 16, 2023 at 1:06 AM pieter <pieter.mar...@gmail.com> wrote:
> >
> > > Hi,
> > > I have not really applied my mind to the issue of the semantics, but what I do recall is being irritated with `GremlinValueComparator` throwing `GremlinTypeErrorException`.
> > > Sometimes it's propagated and sometimes swallowed. Code smell!!!
> > > Using exceptions as process logic right there in the heart of TinkerPop's iterator logic seemed to me a bad idea, and it breaks providers' ability to override classes.
> > > A good example is the logic in FilterStep.processNextStart() where the exception is being swallowed. This logic should not be here and exceptions should not be used for control flow.
> > > Providers expect the base filter step to filter, not conditionally swallow exceptions based on a long if statement.
> > > My suggestion is to let the comparator do what comparators do and return an int. The type issue should be handled higher up the stack.
> > >
> > > Regards
> > > Pieter
> > >
> > > On Fri, 2023-09-15 at 14:14 -0700, Ken Hu wrote:
> > > > Hi Cole,
> > > >
> > > > "However, it makes sense for this short-term decision to align with our long-term direction regarding comparability semantics. I wouldn’t be opposed to your proposed implementation if the long-term plan is to move all steps towards this immediate reduction behaviour."
> > > >
> > > > This is sort of my thinking as well. As you demonstrated in your post, there is already an inconsistency with the way ternary boolean is reduced which leads to different results for equivalent queries. This is why I would prefer to just move ahead with an implementation that I believe is the most consistent with the expectations of users. However, you have valid concerns about adding even more inconsistencies to the language, so if others voice their concern as well then I'll make it behave more like AND and OR.
> > > >
> > > > Regards,
> > > > Ken
> > > >
> > > > On Mon, Sep 11, 2023 at 6:11 PM Cole Greer <cole.gr...@improving.com.invalid> wrote:
> > > > > Hi Ken,
> > > > >
> > > > > Thanks for bringing this up, I believe this topic warrants some further discussion.
> > > > > My understanding of the intent of the current system is that it aims to provide a consistent and predictable set of rules for comparisons between any datatypes. Prior to 3.6, in general comparisons between different types in gremlin produced undefined behaviour (in practice this usually meant an exception). The current system successfully resolved much of this issue although it has introduced certain semantic consistency issues (see https://issues.apache.org/jira/browse/TINKERPOP-2940). Further, while the docs (https://tinkerpop.apache.org/docs/3.7.0/dev/provider/#_ternary_boolean_logics) are quite clear regarding the propagation/reduction behaviour in many cases, as you probe the edges it becomes muddier. Considering the following example, the docs quite clearly define the expected behaviour of the first traversal, but the expected behaviour is not clear outside of basic combinations of AND, OR, and NOT:
> > > > >
> > > > > gremlin> g.inject(1).not(is(gt("one"))) // Produces no output
> > > > > gremlin> g.inject(1).not(union(is(gt("one")), is(eq("zero"))))
> > > > > ==>1 // Error is reduced to false prior to Union Step, and thus not propagated into the Not Step.
> > > > >
> > > > > This is a good example of how we are currently in a bit of a weird place where some of the language semantics are formally defined in documentation, while the rest of the language semantics are defined by implementation. It currently cannot be determined if the above example is expected or a bug. I believe it is important that we find a resolution to this by expanding our formally defined semantics or changing the implementation (when a breaking change is permissible).
> > > > >
> > > > > As for the short-term question posed by ANY and ALL, my only concern with your suggestion is that it would be subject to the following inconsistency, although as shown above there is current precedent for this sort of thing.
> > > > >
> > > > > gremlin> g.inject(1).not(is(lt("one"))) // Produces no output
> > > > > gremlin> g.inject([1]).not(any(is(lt("one"))))
> > > > > ==>[1]
> > > > >
> > > > > In my opinion the most neutral direction would be for ANY to behave the same as a chain of ORs and for ALL to act as a chain of ANDs. However, it makes sense for this short-term decision to align with our long-term direction regarding comparability semantics. I wouldn’t be opposed to your proposed implementation if the long-term plan is to move all steps towards this immediate reduction behaviour.
> > > > >
> > > > > Thanks,
> > > > > Cole Greer
> > > > >
> > > > > From: Ken Hu <k...@bitquilltech.com.INVALID>
> > > > > Date: Monday, September 11, 2023 at 4:16 PM
> > > > > To: dev@tinkerpop.apache.org <dev@tinkerpop.apache.org>
> > > > > Subject: [DISCUSS] Ternary Boolean Handling in New Steps
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Starting in version 3.6, the ternary boolean system was introduced to handle comparison/equality tests within Gremlin. Recently, I've been implementing some list functions from Proposal 3 which make heavy use of the GremlinValueComparator to determine if values satisfy a specific condition. However, I'm finding it a bit tricky to understand how I should handle the GremlinTypeErrorException.
> > > > >
> > > > > For any() and all(), it seems like it would make sense to immediately reduce any ERROR state to false as it's a filter step. In the case of all(), if a GremlinTypeErrorException is caught, it would mean there was a comparison error so the traverser should be removed from the stream. However, doing this seemingly clashes with the original intention of ternary boolean which is to allow a provider-specific response on how to handle an ERROR state.
> > > > >
> > > > > My current thoughts are that we should rework the ternary boolean system in the future to make it easier to incorporate it into new steps.
> > > > > One of the trickiest parts is that it uses unchecked exceptions as a means to implement the ERROR state which can get easily missed or accidentally leaked to the user (which has happened before). For now, I'm planning to go ahead and immediately reduce ERROR states as I think that is what makes the most sense for list functions. Does anyone have any thoughts about this?
> > > > >
> > > > > Thanks,
> > > > > Ken
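
The two not() traversals quoted above, not(is(gt("one"))) and not(union(...)), can be modeled in a few lines. This is a hedged Python sketch rather than TinkerPop code: every name in it (Ternary, compare, negate, reduce_to_bool, keeps_with_propagation, keeps_with_early_reduction) is hypothetical, and Python's TypeError stands in for GremlinTypeErrorException. It illustrates why reducing ERROR to false before a negation flips the final result.

```python
import operator
from enum import Enum


class Ternary(Enum):
    TRUE = "true"
    FALSE = "false"
    ERROR = "error"


def compare(op, left, right):
    """Compare two values; an incomparable pair yields ERROR."""
    try:
        return Ternary.TRUE if op(left, right) else Ternary.FALSE
    except TypeError:
        # Stand-in for GremlinTypeErrorException (assumption of this sketch).
        return Ternary.ERROR


def negate(value):
    # NOT leaves ERROR untouched and swaps TRUE and FALSE.
    if value is Ternary.ERROR:
        return Ternary.ERROR
    return Ternary.FALSE if value is Ternary.TRUE else Ternary.TRUE


def reduce_to_bool(value):
    """Reduction point: ERROR and FALSE both become False."""
    return value is Ternary.TRUE


def keeps_with_propagation(inner):
    """Models not(is(...)): ERROR reaches the outer not() before reduction."""
    return reduce_to_bool(negate(inner))


def keeps_with_early_reduction(inner):
    """Models not(union(is(...))): the child is reduced first, then negated."""
    reduced = Ternary.TRUE if reduce_to_bool(inner) else Ternary.FALSE
    return reduce_to_bool(negate(reduced))


inner = compare(operator.gt, 1, "one")    # ERROR: int vs. str
print(keeps_with_propagation(inner))      # False: traverser is dropped
print(keeps_with_early_reduction(inner))  # True: traverser is kept
```

For TRUE and FALSE inputs the two functions agree; they differ only when the inner comparison is an ERROR, which is the case the thread is about.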