That might be an even better option. I don't have any experience with Idris, but the syntax for data type definitions is pretty similar to Haskell's. I have a mapping already written (in Haskell) that takes schemas defined in YAML to Haskell data type definitions; I imagine I could tweak it slightly to generate Idris definitions instead, and from there we could take advantage of Idris code generation. Come to think of it, there are also quite a few codegen projects in Haskell that could be used. With Idris, however, it seems that code generation was a design consideration for the language itself.
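
A minimal sketch of that idea, assuming a much simpler schema model than the real mapping: the Schema, Field, FieldType and Target types and the render function below are hypothetical names made up for illustration, and the YAML parsing step is left out entirely. The Idris branch emits a record declaration, which is an assumption about the most natural Idris counterpart to a Haskell record, not a claim about what the tweaked mapping would produce.

```haskell
import Data.List (intercalate)

-- Hypothetical, simplified schema model (illustration only).
data FieldType = TString | TInt | TBool | TNamed String
  deriving (Eq, Show)

data Field = Field { fieldName :: String, fieldType :: FieldType }
  deriving (Eq, Show)

-- A named product type, e.g. a vertex type with its properties.
data Schema = Schema { schemaName :: String, schemaFields :: [Field] }
  deriving (Eq, Show)

-- The language we want to generate a type definition for.
data Target = Haskell | Idris
  deriving (Eq, Show)

-- These primitive type names happen to be spelled the same in both targets.
typeName :: FieldType -> String
typeName TString    = "String"
typeName TInt       = "Int"
typeName TBool      = "Bool"
typeName (TNamed n) = n

-- Haskell uses "::" for type annotations, Idris uses ":".
renderField :: Target -> Field -> String
renderField Haskell (Field n t) = n ++ " :: " ++ typeName t
renderField Idris   (Field n t) = n ++ " : " ++ typeName t

-- Render one schema as a record definition in the target language.
render :: Target -> Schema -> String
render Haskell (Schema n fs) =
  "data " ++ n ++ " = " ++ n ++ "\n  { "
    ++ intercalate "\n  , " (map (renderField Haskell) fs)
    ++ "\n  }"
render Idris (Schema n fs) =
  intercalate "\n" $
    ("record " ++ n ++ " where")
      : ("  constructor Mk" ++ n)
      : map (("  " ++) . renderField Idris) fs

-- Example schema used for illustration.
person :: Schema
person = Schema "Person" [Field "name" TString, Field "age" TInt]
```

In GHCi, putStrLn (render Haskell person) prints a Haskell record declaration for Person, and putStrLn (render Idris person) prints the corresponding "record Person where" block with a MkPerson constructor. Only the two render clauses differ between targets, which is the sense in which the existing mapping would need just a slight tweak.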

Josh

On Tue, Jan 7, 2020 at 4:05 AM Stephen Mallette <[email protected]> wrote:

> Regarding code generation...
>
> A while ago, James Thornton put me onto Idris which is sorta what sent
> me trying to learn Haskell:
>
> http://docs.idris-lang.org/en/latest/reference/codegen.html
>
> I don't really have a sense of whether or not we could use that to our
> advantage. Perhaps you do Josh?
>
> On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier <[email protected]> wrote:
>
> > Hi Pieter, Stephen,
> >
> > Pieter: Can it be specified in `formal` English rather than in
> > Category Theory?
> > Josh: Sure. CT is a mathematical framework that makes our definition
> > of the data model rigorous, but the data model can also be described
> > in plain English. We tried to do both in the paper, and naturally the
> > reference documentation for TinkerPop will be extended for any new
> > APIs. You will be able to get pretty far in understanding the data
> > model just by looking at the code. For example, even if you don't
> > know Haskell, you might be able to tell what is going on here:
> >
> > data DataType
> >   = PrimitiveType PrimitiveType
> >   | NamedType TypeReference
> >   | ProductType
> >     { productFields :: [Field] }
> >   | SumType
> >     { sumCases :: [Field] }
> >   | EnumType
> >     { enumValues :: [Field] }
> >   | OptionalType
> >     { optionalType :: DataType }
> >   | ListType
> >     { elementType :: DataType }
> >   | SetType
> >     { setElementType :: DataType }
> >   | MapType
> >     { keyType :: DataType
> >     , valueType :: DataType }
> >
> > A data type is either a primitive type:
> >
> > data PrimitiveType
> >   = BinaryType
> >   | BooleanType
> >   | FloatType
> >     { floatTypePrecision :: BitPrecision }
> >   | IntegerType
> >     { integerTypePrecision :: BitPrecision
> >     , integerTypeSigned :: Bool }
> >   | StringType
> >
> > ...or it's a named ("labeled") type like "Person" or "knows", or a
> > sum or product type, or one of a few other things depending on what
> > we choose to support in
> > TinkerPop. To this, we will probably add VertexType, EdgeType, and
> > PropertyType. Yes, logically they are product types, but they are
> > fairly special in TinkerPop, and deserve their own constructors, like
> > the OptionalType and EnumType constructors you see above (optionals
> > and enums being special sum types). When we get down into the actual
> > code and documentation, I don't think users are going to need to
> > worry about category theory.
> >
> > Pieter: "I'd prefer if the reference implementation is in fact far
> > less important than the specification itself"
> > Josh: I think the reason we have never had a real specification is
> > that neither the property graph data model nor the operational
> > semantics of Gremlin had been formalized. We're halfway there now
> > with the formal PG data model. The extent to which Gremlin can be
> > formalized for TP4 is TBD, though I would like to see things move in
> > the direction of a monadic formalism as I say. The further we go in
> > that direction, I'd say the easier it will be to write a spec.
> >
> > W.r.t. making implementations more efficient, that's somewhat
> > orthogonal to what I'm trying to do, but at least in Scala (and
> > Haskell if we decide to pursue a full implementation there) I do see
> > a lot of the nested iterator messiness and other intermediate
> > abstractions going away.
> >
> > Stephen: "I think the idea is more about the notion that the
> > Structure API which is a provider API is something that can go away
> > as a concept."
> > Josh: OK, yes, I can see edge and vertex implementations going away,
> > as well, if the basic data access operations for outV, inV, etc. are
> > implemented by the provider on the process side instead.
> >
> > Stephen: "I think I'm interested in "practical" so Scala seems right."
> > Josh: Well, now I think I might take a stab at a basic Haskell
> > implementation just for the sake of prototyping in my favorite
> > programming language. May or may not become part of TinkerPop proper.
> >
> > Stephen: "That would be great. We currently do that for GLVs and it's
> > pretty ugly and was mostly useful in the very initial bit of each new
> > language ecosystem implementation as it just saved a ton of typing
> > for the creation of GraphTraversal, GraphTraversalSource and __."
> > Josh: Let's see exactly what we want to generate in each target
> > language. I was thinking of generating code for basic structural
> > classes like vertices and edges, which would be easy enough to do
> > right now just by defining a schema for the objects, translating that
> > schema to Thrift IDL, generating code in each of the target
> > languages, and then gutting the generated code to remove all
> > Thrift-specific logic. For Java and Python, that seems to result in a
> > pretty good starting point for an API.
> >
> > Josh
> >
> > On Mon, Jan 6, 2020 at 4:50 AM Stephen Mallette <[email protected]> wrote:
> >
> > > Hi Pieter - my thoughts are inline:
> > >
> > > > Regarding the structure api and query specification.
> > > >
> > > > Can it be specified in `formal` English rather than in Category
> > > > Theory? I think having the specification in Category Theory
> > > > simply makes the barrier to entry too high for many of us to
> > > > partake in the conversation.
> > > >
> > > > I get that having a formal mathematical spec is useful and
> > > > interesting but perhaps it can remain just below the surface
> > > > rather than being the primary source.
> > >
> > > I agree with this. I like the underpinnings and formalism that CT
> > > is bringing here, but if TinkerPop becomes harder and more abstract
> > > to use as a result I don't think we're doing anything helpful.
> > > It seems important that we have some higher level language above
> > > the mathematical rigor so that the average user has a shot at using
> > > this stuff.
> > >
> > > > In TinkerPop 3 the specification was pretty much the reference
> > > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > > implementation is in fact far less important than the
> > > > specification itself. I.e. the specification must be in English
> > > > and not refer to api calls in the reference implementation.
> > >
> > > The Structure Test Suite is the worst offender there, though there
> > > are aspects of the Process Test Suite that are equally bad. I'm not
> > > sure what a test suite will look like offhand, but I think we'll
> > > need to think harder about the types of test we write to take care
> > > that they are not bound too closely to the "TinkerGraph" way of
> > > doing things.
> > >
> > > > Regarding the implementation.
> > > >
> > > > Something that has always concerned me about TinkerPop's
> > > > implementation is that it (embedded java db's being the
> > > > exception) is generally too far away from the data. Massive
> > > > latency and endless copying of the data occurs.
> > >
> > > I guess Remote Graph Providers (DSG, Neptune, etc) have mitigated
> > > that by putting their implementations close to the data, thus
> > > executing the traversal on the server near the data and then just
> > > returning the result. I think that we need to keep that model in
> > > mind for TP4 as it was really only emergent in TP3 and our designs
> > > supporting that model basically were shoehorned in.
> > >
> > > > Further it has no real understanding of memory. Any step might
> > > > for whatever reason have a ReducingBarrierStep and load the full
> > > > traversal data set into the JVM's memory.
> > >
> > > I'm not sure that I follow what you're looking for TP to do here.
> > > If you want to outline that further, perhaps start a different
> > > thread as it doesn't sound quite related to this thread on the
> > > Schema API.
> > >
> > > > Perhaps a reference implementation written in C/C++/Go/Rust...
> > > > might be more useful to database vendors.
> > >
> > > All languages I don't know ;) Short of some major new contributions
> > > from someone, I'd expect us to be heading down the road of the JVM
> > > again as our starting point.
> > >
> > > > All that said, thanks for all the work you are putting into this.
> > >
> > > Appreciate your thoughts. Take care.
> > >
> > > On Sun, Jan 5, 2020 at 2:14 PM pieter martin <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > Here are some thoughts/concerns that I have.
> > > >
> > > > Regarding the structure api and query specification.
> > > >
> > > > Can it be specified in `formal` English rather than in Category
> > > > Theory? I think having the specification in Category Theory
> > > > simply makes the barrier to entry too high for many of us to
> > > > partake in the conversation.
> > > >
> > > > I get that having a formal mathematical spec is useful and
> > > > interesting but perhaps it can remain just below the surface
> > > > rather than being the primary source.
> > > >
> > > > In TinkerPop 3 the specification was pretty much the reference
> > > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > > implementation is in fact far less important than the
> > > > specification itself. I.e. the specification must be in English
> > > > and not refer to api calls in the reference implementation.
> > > >
> > > > Regarding the implementation.
> > > >
> > > > Something that has always concerned me about TinkerPop's
> > > > implementation is that it (embedded java db's being the
> > > > exception) is generally too far away from the data. Massive
> > > > latency and endless copying of the data occurs.
> > > > Further it has no real understanding of memory. Any step might
> > > > for whatever reason have a ReducingBarrierStep and load the full
> > > > traversal data set into the JVM's memory.
> > > > Perhaps a reference implementation written in C/C++/Go/Rust...
> > > > might be more useful to database vendors.
> > > >
> > > > All that said, thanks for all the work you are putting into this.
> > > >
> > > > Cheers
> > > > Pieter
> > > >
> > > > On Sat, 2020-01-04 at 10:51 -0800, Joshua Shinavier wrote:
> > > > > Thanks for the detailed response, Stephen. Good points made.
> > > > > Let's dig a little deeper to get to a common understanding of a
> > > > > "structure API" for TP4. I agree that Graph is a relic of the
> > > > > Blueprints days, and would not be missed. Graph.Features would
> > > > > then need to be renamed at the very least. However, Vertex,
> > > > > Edge, Property etc. are also part of the structure API, and
> > > > > they are fundamental. We need them in TP4, but there is also an
> > > > > opportunity to generalize them slightly to give us a strong
> > > > > notion of schema. Graph.Features, whatever we call it, would
> > > > > not be so much a stand-alone collection of flags describing the
> > > > > graph back-end, as it is now, but a set of constraints on the
> > > > > schemas you can define. It would "have teeth" because you could
> > > > > actually validate your schema against it, assuming you have
> > > > > chosen to define one. If we do want a handy Graph interface in
> > > > > TP4, we could consider deriving the implementation rather than
> > > > > allowing developers to define it themselves.
> > > > >
> > > > > W.r.t. Haskell vs. Scala -- if you / enough of us are
> > > > > interested in Haskell, we could start with a Haskell-based
> > > > > reference implementation before we proceed to Scala.
> > > > > The schema API I have in mind is essentially already written,
> > > > > and will be publicly available soon. It might not be a bad idea
> > > > > to explore true monadic traversals, as I have talked about
> > > > > before, in functionally pure Haskell first. The Gremlin-Scala
> > > > > [1] and Greskell [2] projects have already dug into some of the
> > > > > finer details and could be used for reference. To that, I would
> > > > > add monadic encapsulation of transactions, graph side-effects,
> > > > > and exceptions. The universality of a monadic approach to graph
> > > > > traversal might help us to address some of the language
> > > > > variation you mention, because it will be easier to describe
> > > > > exactly what basic steps do and how their effects are composed
> > > > > together. Although most of the languages of interest for
> > > > > TinkerPop back-ends are not purely functional, you can usually
> > > > > create APIs that are. Formal specifications of TinkerPop
> > > > > structure and process ought to be possible.
> > > > >
> > > > > For project structure, I say we follow your instincts, as you
> > > > > are the most intimately familiar with the code base(s) and the
> > > > > issues. I think it makes sense to continue to have a master
> > > > > repo for reference implementations, but yes we might want
> > > > > separate build systems. That will certainly be the case if we
> > > > > want to include a Haskell implementation alongside a JVM one.
> > > > > We might be able to make use of code generation for a one-time
> > > > > translation of core structure API into various target
> > > > > languages.
> > > > >
> > > > > To my mind, your emphasis on consistency across GLVs in TP4
> > > > > goes well with an emphasis on a stronger type system and
> > > > > better-defined operational semantics for traversals.
> > > > >
> > > > > Josh
> > > > >
> > > > > [1] https://github.com/mpollmeier/gremlin-scala
> > > > > [2] https://github.com/debug-ito/greskell
> > > > >
> > > > > On Fri, Jan 3, 2020 at 5:21 AM Stephen Mallette <[email protected]> wrote:
> > > > >
> > > > > > Sorry it took me a bit to get to this...
> > > > > >
> > > > > > > Graph.Features will carry over into TP4
> > > > > >
> > > > > > Having Graph.Features implies having Graph which is part of
> > > > > > the Structure API. Marko and I have questioned the necessity
> > > > > > for the Graph and Structure API in recent years. Major graph
> > > > > > providers who use TinkerPop don't even implement it I don't
> > > > > > think - they just process Gremlin. This "secondary" API
> > > > > > (formerly a first class citizen) also creates confusion for
> > > > > > users who try to use it directly and have mixed results
> > > > > > depending on the graph they choose. Worse still, they end up
> > > > > > writing Structure API code in scripts embedded as strings in
> > > > > > their code (despite advice to not do so) and end up creating
> > > > > > non-portable code. Furthermore, GLV users end up wondering
> > > > > > why they can't do graph.addVertex() and other similar
> > > > > > Structure API calls. Mixed advice in third-party blog posts
> > > > > > compounds these issues.
> > > > > >
> > > > > > So, when you talk about the Structure API, I wonder if you
> > > > > > mean to keep all of it or just the notion of Graph.Features
> > > > > > (in some new revised form).
> > > > > > The latter is agreeable in my mind because we likely still
> > > > > > need some way to know how a graph behaves for purposes of our
> > > > > > technology test suite. Without the Structure API, I wasn't
> > > > > > sure yet what that would look like.
> > > > > >
> > > > > > > I feel we should use Scala for the API. This opinion is
> > > > > > > informed by my experiences writing tools of this kind in
> > > > > > > both Java and Haskell at Uber. While I am a huge fan of
> > > > > > > Haskell, practical considerations rule it out as an option.
> > > > > > > We need the API to be JVM-compatible
> > > > > >
> > > > > > Having followed along with your talks, writings, etc and with
> > > > > > my own reading of Category Theory and such, I realized that a
> > > > > > use of Java would probably not work. While I have interest in
> > > > > > Haskell (more so than Scala), Scala does seem like the best
> > > > > > fit for this work on the JVM. That said, there are two points
> > > > > > I'd like us to consider that have been on my mind for TP4:
> > > > > >
> > > > > > 1. The realization that TinkerPop, specifically Gremlin,
> > > > > >    would be available natively in other language ecosystems
> > > > > >    besides the JVM came way too late in TP3. As a result, we
> > > > > >    have an extraordinarily mixed set of messages with Gremlin
> > > > > >    usage. Things work one way in Java, but another way in
> > > > > >    Python. And while 3.4.x unified connection options across
> > > > > >    languages, there's still too many ways to connect to a
> > > > > >    graph and too much discrepancy in behavior. We need to
> > > > > >    think about how every single feature that we create for
> > > > > >    TP4 behaves in each language and what parity of capability
> > > > > >    we can achieve there.
> > > > > >    And if some reasonable level of parity can't be achieved
> > > > > >    for whatever reason, we should seriously consider either
> > > > > >    not implementing the feature or the story for the language
> > > > > >    ecosystems that don't have the functionality better be
> > > > > >    crystal clear and consistent with TinkerPop as a whole. We
> > > > > >    should very much consider how Graph.Features (in whatever
> > > > > >    form it takes) is accessible via Java, Python, Javascript,
> > > > > >    etc. before going too far in any particular development
> > > > > >    direction.
> > > > > > 2. What is the general structure for this project with
> > > > > >    respect to the different language environments that we
> > > > > >    have? Personally, I still like the idea of a single repo,
> > > > > >    but without a single build system ruling it all. In this
> > > > > >    way each language ecosystem can take advantage of the best
> > > > > >    parts of its particular build tool chain without having to
> > > > > >    shoehorn into a different system's approach. That said, I
> > > > > >    think each ecosystem should stick to a single build tool
> > > > > >    chain, e.g. Maven for the JVM.
> > > > > >
> > > > > > As a big picture point, I think the JVM ecosystem will be the
> > > > > > model for all other language ecosystems. I would think that
> > > > > > we would want to take care that we not turn TinkerPop into a
> > > > > > Scala-only system - I assume this work isn't laying the
> > > > > > foundation for that, but figured I'd voice the concern. I
> > > > > > think we'd largely still rely on Java for development outside
> > > > > > of this feature that has some specific demands not addressed
> > > > > > well by it.
> > > > > > I'd further assume that we would have some nice clean interop
> > > > > > back to Java for this stuff so as to keep our core users well
> > > > > > engaged.
> > > > > >
> > > > > > > to keep TinkerPop aligned with upcoming standards like RDF*
> > > > > > > and GQL.
> > > > > > > Interoperability with mm-ADT should be straightforward
> > > > > >
> > > > > > Thank you for keeping up with the developing standards.
> > > > > > That's a nice service to TinkerPop.
> > > > > >
> > > > > > Ultimately my vision for TP4 seems to have less to do with
> > > > > > specific major new features (thus glad to see that you're
> > > > > > thinking in that manner) and more to do with creating
> > > > > > consistent, coherent and easy graph usage patterns across
> > > > > > language ecosystems for users while making it even simpler
> > > > > > for providers to build their TinkerPop-enabled systems.
> > > > > > Having seen so much success with GLVs for TP3, despite their
> > > > > > drawbacks, I can't help but sense that focusing on this
> > > > > > notion as a foundational element of design for TP4 will
> > > > > > further expand TinkerPop's appeal and reach.
> > > > > >
> > > > > > On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier <[email protected]> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I would like to reboot the conversation around TinkerPop 4,
> > > > > > > specifically as it concerns the structure API. You will
> > > > > > > have seen my posts, ever since my presentation [1] last
> > > > > > > January, about an algebraic approach to property graph
> > > > > > > schemas and transformations, which Ryan and I formalized in
> > > > > > > the APG paper [2].
> > > > > > > I am now very close to releasing the Haskell implementation
> > > > > > > of this framework as open source software (to be
> > > > > > > accompanied by an Uber Engineering Blog post, in the next
> > > > > > > few weeks if all goes well).
> > > > > > >
> > > > > > > At various times and places, I have suggested that we
> > > > > > > develop a Scala-based structure API for TP4 which
> > > > > > > implements APG in an extensible way. I think it is time to
> > > > > > > proceed and start committing code, or discuss alternative
> > > > > > > plans for the structure API. There seems to be plenty of
> > > > > > > community interest, and I now have an official OK to put
> > > > > > > some engineering hours towards it at work. I would like to
> > > > > > > align with you -- the TP PMC and other TinkerPop committers
> > > > > > > and developers -- on how to proceed, who will contribute,
> > > > > > > and what the development timeline will look like.
> > > > > > >
> > > > > > > Some specifics from my side:
> > > > > > >
> > > > > > > - Graph.Features will carry over into TP4; it will just be
> > > > > > >   a bit more sophisticated than the current TP3
> > > > > > >   Graph.Features. Btw. I also proposed this idea of a graph
> > > > > > >   feature vector at the recent Dagstuhl Seminar [3], where
> > > > > > >   it caught on and will be the basis of a "dragon data
> > > > > > >   model" that might help to keep TinkerPop aligned with
> > > > > > >   upcoming standards like RDF* and GQL.
> > > > > > > - I feel we should use Scala for the API.
> > > > > > >   This opinion is informed by my experiences writing tools
> > > > > > >   of this kind in both Java and Haskell at Uber. While I am
> > > > > > >   a huge fan of Haskell, practical considerations rule it
> > > > > > >   out as an option. We need the API to be JVM-compatible.
> > > > > > >   The best Haskell-JVM bridge is Eta [4], but IMO it is not
> > > > > > >   ready to be put in the critical path on a project such as
> > > > > > >   TinkerPop; we used it at Uber for a while and found it to
> > > > > > >   be a time sink, despite the generated bytecode working
> > > > > > >   great. Likewise, I would strongly advise against
> > > > > > >   continuing with a pure Java-based API if we want to do
> > > > > > >   intelligent things with graph schemas. The language is
> > > > > > >   just not appropriate as a basis for the type system in
> > > > > > >   question. Scala, on the other hand, has all of the
> > > > > > >   advantages of Haskell in terms of type safety and
> > > > > > >   functional pattern matching, although it requires some
> > > > > > >   extra discipline to keep your code pure.
> > > > > > > - Interoperability with Ryan's CQL (categorical query
> > > > > > >   language [5]) is of interest.
> > > > > > > - Interoperability with mm-ADT should be straightforward
> > > > > > >   now that mm-ADT has support for union types. Hopefully,
> > > > > > >   mm-ADT's type system will end up as a proper superset of
> > > > > > >   TP4's.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > Josh
> > > > > > >
> > > > > > > [1] https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
> > > > > > > [2] https://arxiv.org/abs/1909.04881
> > > > > > > [3] https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
> > > > > > > [4] https://eta-lang.org
> > > > > > > [5] https://www.categoricaldata.net
