Hi everyone, I just remembered we do have a well-documented set of rules regarding comparability and equality of nulls and mismatched types which is relevant to this discussion. It is part of the provider docs here: https://tinkerpop.apache.org/docs/current/dev/provider/#gremlin-semantics-equality-comparability. We may still have some work to do to consistently follow our own rules, although the consensus here seems to agree with the docs that null==null is true. I agree that this would be worth revisiting if a type system is being introduced to TinkerPop.
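To make the two candidate behaviors concrete, here is a minimal Java sketch of the INTERSECT example raised further down in this thread (A = [1,2,null], B = [1,null]). It is not TinkerPop code: the class name, the `intersect` helper, and both equality predicates are hypothetical and exist only for illustration. The "SQL-style" predicate models the `=` operator, where a comparison involving null never matches. It does not model SQL's INTERSECT set operator, which, as Dave notes for Postgres, does keep the null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.BiPredicate;

class NullIntersectSketch {

    // Java-style equality: Objects.equals(null, null) is true.
    static boolean javaEquals(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    // SQL-style "=" collapsed to a boolean (assumption for illustration):
    // any comparison involving null is "unknown", treated here as no match.
    static boolean sqlStyleEquals(Integer a, Integer b) {
        return a != null && b != null && a.equals(b);
    }

    // Hypothetical helper: keeps each element of `left` that matches
    // at least one element of `right` under the supplied equality.
    static List<Integer> intersect(List<Integer> left, List<Integer> right,
                                   BiPredicate<Integer, Integer> eq) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (eq.test(l, r)) {
                    out.add(l);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, null);
        List<Integer> b = Arrays.asList(1, null);
        System.out.println(intersect(a, b, NullIntersectSketch::javaEquals));     // [1, null]
        System.out.println(intersect(a, b, NullIntersectSketch::sqlStyleEquals)); // [1]
    }
}
```

Under the null==null-is-true rule the result is [1, null]. The [1] result only appears if null is never allowed to match anything.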
Josh, I look forward to hearing more on your proposed type system once it is ready.

Regards,
Cole

From: Joshua Shinavier <j...@fortytwo.net>
Date: Tuesday, August 8, 2023 at 7:23 AM
To: dev@tinkerpop.apache.org <dev@tinkerpop.apache.org>
Subject: Re: [DISCUSS] Is null equal to null

Hi Dave,

I declined to add my name to that paper. I worked closely with most of the authors for about 7 months in 2021 when I led a PGSWG subgroup on property types. We met weekly (minutes <https://docs.google.com/document/d/1-YcfzgCJ5zXzDq_lL0EfMzx-9M_DM5pGJ66hzfYLR_A/edit>) to define a type system for properties and, by extension, vertices and edges. This was a pretty interesting time, with vigorous debate about nominative vs. structural types, schema on write vs. schema on read, but there were still a lot of unresolved questions when we paused the working group. The paper was written a year later in the space of a few weeks -- I was invited to be involved, but didn't have time, and didn't approve of a couple of aspects of the paper draft -- above all, that the proposal ignored the still-open, still-relevant questions about the formalism in favor of just "getting something out there". Some of my influences made it in -- e.g. the graph data model feature matrix, which I introduced at the 2019 Dagstuhl seminar. So, it's a paper by my friends and colleagues which captures a lot of the major concerns of the working group and of schemas for property graphs -- I am just not satisfied with the actual formalism, and don't see it as definitive.

My preference these days is to map graph schemas into some other, well-studied formalism like typed lambda calculi (in the case of Lambda Graph) rather than creating a type system specifically for property graphs as in that paper, or even as in APG. That gives you a lot more freedom to introduce variants of the data model (e.g.
if an application needs constraints on property values like min/max, regex, etc., it is a short step from a property graph model based on System F to one with dependent types). I am also cautious of making any concessions to SQL which would weaken the type system, e.g. by including nulls in primitive types.

Josh

On Mon, Aug 7, 2023 at 3:09 PM David Bechberger <d...@bechberger.com> wrote:

> Hello Ken,
>
> I don't know that I have a strong opinion on what NULL==NULL should
> evaluate to, but I agree we should come up with a set of rules here for
> consistency, both within Gremlin but also with other database language
> standards (e.g. GQL and SQL) so that Gremlin best matches customer
> expectations. Gremlin's divergence from user expectations when it comes to
> null handling has been a constant headache for new users. While I agree
> with Josh that a type system would make this easier, we still need to be
> consistent until we cross that bridge.
>
> For example, if you have a list, A, which is
> [1,2,null] and a list, B, which is [1,null]. Should the result of an
> INTERSECT be [1,null] or [1]
>
> In Postgres, this would be [1,null] so that is probably what I would
> recommend unless someone has a stronger opinion to do something different?
>
> Josh, I am familiar with the work you did on Dragon, and I am curious how
> you see your work aligning with the recent SIGMOD paper [1] from the LDBC
> working group on PG Schema?
>
> Dave
>
> [1] https://arxiv.org/abs/2211.10962
>
>
> On Sat, Aug 5, 2023 at 7:02 AM Joshua Shinavier <j...@fortytwo.net> wrote:
>
> > Hi Ken,
> >
> > Yes indeed, there is that push. I am not saying that Gremlin shouldn't
> > have a type system -- just that certain questions will have better
> > answers once it does. While I am not drawing a lot of attention to it
> > yet in connection with TinkerPop, there is a type system I am going to
> > propose for TinkerPop.
> > The formalism is called Lambda Graph, and it is closely related to the
> > Algebraic Property Graphs [1] model which was implemented by Dragon [2].
> > I made a big deal about Dragon three years ago and then was unable to
> > release it, so I'm waiting until Hydra [3] is completely ready before
> > promoting it here. That said, it's not far from being ready. We are
> > building property graph (not yet TinkerPop) applications with it at
> > LinkedIn. I recently gave a presentation [4] on the data model which has
> > excerpts from the Lambda Graph paper draft. In terms of property types,
> > probably the first thing I will explore is integrating Hydra's
> > "TinkerPop" model [5] with TinkerPop proper. In that model, property
> > types are parameterized and unspecified, as are vertex and edge id
> > types; different type systems for properties and ids can be plugged in
> > here. For Hydra's core type system, see hydra/core.Type [6]. This type
> > system behaves as I described above: there are no "nulls", but there are
> > optionals, which are comparable to the extent that the base type is
> > comparable.
> >
> > Josh
> >
> > [1] https://arxiv.org/abs/1909.04881
> > [2] https://www.uber.com/blog/dragon-schema-integration-at-uber-scale/
> > [3] https://github.com/CategoricalData/hydra
> > [4] https://docs.google.com/presentation/d/1PF0K3KtopV0tMVa0sGBW2hDA7nw-cSwQm6h1AED1VSA
> > [5] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/langs/tinkerpop/propertyGraph/package-summary.html
> > [6] https://categoricaldata.github.io/hydra/hydra-java/javadoc/hydra/core/Type.html
> >
> >
> > On Fri, Aug 4, 2023 at 6:23 PM Ken Hu <k...@bitquilltech.com.invalid> wrote:
> >
> > > Hi Josh,
> > >
> > > Thanks for your input. There seems to be a push in the graph database
> > > world towards having a schema. It's likely something like this would
> > > be introduced in TinkerPop in the future.
> > > Let's assume that TinkerPop does support schemas, and therefore would
> > > have a type system, would this change your opinion on the matter?
> > >
> > > Thanks again,
> > > Ken
> > >
> > > On Wed, Aug 2, 2023 at 3:54 PM Joshua Shinavier <j...@fortytwo.net> wrote:
> > >
> > > > For what it is worth, I think the question of whether null == null
> > > > is only meaningful in the context of a specific type system, which
> > > > Gremlin so far does not provide. My personal preference is to avoid
> > > > SQL-style nulls and achieve optionality through union types (e.g.
> > > > Java's Optional or Haskell's Maybe). In the case of two lists, if
> > > > you can assume that the type of the list is list<optional<int>>,
> > > > then you can safely treat null like Optional.empty(), and compare it
> > > > with another null of the same logical type (int). If that is the
> > > > interpretation of your two lists, then the intersection is [1, null].
> > > >
> > > > Josh
> > > >
> > > >
> > > > On Tue, Aug 1, 2023 at 5:47 PM Ken Hu <k...@bitquilltech.com.invalid> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > As Gremlin evolves and gains more functionality, it is important
> > > > > that we establish some fundamental rules to provide consistency in
> > > > > results. One such question that we should come to agreement on is
> > > > > how null values are compared. Currently, Gremlin seems to mostly
> > > > > follow the comparison that is used in Java where NULL == NULL
> > > > > returns TRUE. However, in many other database systems, NULL == NULL
> > > > > would return FALSE (or NULL).
> > > > >
> > > > > This question comes about as I'm starting to look a little deeper
> > > > > into the proposed list functions. An example of where this is
> > > > > applicable is the INTERSECT list function.
> > > > > For example, if you have a list, A, which is [1,2,null] and a list,
> > > > > B, which is [1,null]. Should the result of an INTERSECT be [1,null]
> > > > > or [1]?
> > > > >
> > > > > I think it makes sense in Gremlin for us to follow the rule that
> > > > > most programming languages follow which is the former (NULL == NULL
> > > > > returns TRUE) because it feels more in line with how Gremlin was
> > > > > meant to be used (together with your code rather than as a string
> > > > > query). In this case the return value would be [1,null].
> > > > >
> > > > > What are your thoughts on this subject?
> > > > >
> > > > > Thanks,
> > > > > Ken
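Josh's alternative in the thread above, reading the lists as list<optional<int>> and treating null like Optional.empty(), can be sketched in plain Java as follows. This is only an illustration of the idea using java.util.Optional. It is not Hydra's or TinkerPop's type system, and the `lift` and `intersect` helpers and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class OptionalIntersectSketch {

    // Hypothetical helper: maps each nullable element to an explicit
    // optional, so that null becomes Optional.empty().
    static List<Optional<Integer>> lift(List<Integer> xs) {
        return xs.stream().map(Optional::ofNullable).collect(Collectors.toList());
    }

    // Hypothetical helper: keeps the elements of `left` that also occur in
    // `right`. List.contains relies on Optional.equals, under which two
    // empty optionals are equal.
    static List<Optional<Integer>> intersect(List<Optional<Integer>> left,
                                             List<Optional<Integer>> right) {
        return left.stream().filter(right::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Optional<Integer>> a = lift(Arrays.asList(1, 2, null));
        List<Optional<Integer>> b = lift(Arrays.asList(1, null));
        System.out.println(intersect(a, b)); // [Optional[1], Optional.empty]
    }
}
```

With this reading there is no separate rule for null to decide: the empty optional is an ordinary value of the element type, and the intersection corresponds to [1, null].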