Re: structure API for TP4

Stephen Mallette Tue, 07 Jan 2020 04:06:04 -0800

Regarding code generation...

A while ago, James Thornton put me onto Idris, which is sort of what sent me
off trying to learn Haskell:

http://docs.idris-lang.org/en/latest/reference/codegen.html

I don't really have a sense of whether or not we could use that to our
advantage. Perhaps you do, Josh?

On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier wrote:

> Hi Pieter, Stephen,
>
> Pieter: Can it be specified in `formal` English rather than in Category
> Theory?
> Josh: Sure. CT is a mathematical framework that makes our definition of the
> data model rigorous, but the data model can also be described in plain
> English. We tried to do both in the paper, and naturally the reference
> documentation for TinkerPop will be extended for any new APIs. You will be
> able to get pretty far in understanding the data model just by looking at
> the code. For example, even if you don't know Haskell, you might be able to
> tell what is going on here:
>
> data DataType
>   = PrimitiveType PrimitiveType
>   | NamedType TypeReference
>   | ProductType
>       { productFields  :: [Field] }
>   | SumType
>       { sumCases       :: [Field] }
>   | EnumType
>       { enumValues     :: [Field] }
>   | OptionalType
>       { optionalType   :: DataType }
>   | ListType
>       { elementType    :: DataType }
>   | SetType
>       { setElementType :: DataType }
>   | MapType
>       { keyType        :: DataType
>       , valueType      :: DataType }
>
>
> A data type is either a primitive type:
>
> data PrimitiveType
>   = BinaryType
>   | BooleanType
>   | FloatType
>     { floatTypePrecision   :: BitPrecision }
>   | IntegerType
>     { integerTypePrecision :: BitPrecision
>     , integerTypeSigned    :: Bool }
>   | StringType
>
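The two declarations above cannot be loaded on their own, because `Field`, `TypeReference` and `BitPrecision` are never defined in the thread. Below is a minimal sketch that fills them in with guessed, hypothetical definitions (the actual APG code may differ) and shows how a "Person" type and a "knows" type might be written as values. The `namedReferences` helper, which lists the named types a schema points to, is likewise only an illustration and not part of any published API. The module has no `main`; load it in GHCi to experiment.

```haskell
-- Hypothetical supporting types: the thread does not define these,
-- so the shapes below are guesses for illustration only.
newtype TypeReference = TypeReference String
  deriving (Eq, Show)

data BitPrecision = Bits32 | Bits64
  deriving (Eq, Show)

data Field = Field
  { fieldName :: String
  , fieldType :: DataType }
  deriving (Eq, Show)

-- The two types from the message, with Eq/Show added.
data PrimitiveType
  = BinaryType
  | BooleanType
  | FloatType
    { floatTypePrecision   :: BitPrecision }
  | IntegerType
    { integerTypePrecision :: BitPrecision
    , integerTypeSigned    :: Bool }
  | StringType
  deriving (Eq, Show)

data DataType
  = PrimitiveType PrimitiveType
  | NamedType TypeReference
  | ProductType  { productFields  :: [Field] }
  | SumType      { sumCases       :: [Field] }
  | EnumType     { enumValues     :: [Field] }
  | OptionalType { optionalType   :: DataType }
  | ListType     { elementType    :: DataType }
  | SetType      { setElementType :: DataType }
  | MapType      { keyType        :: DataType
                 , valueType      :: DataType }
  deriving (Eq, Show)

-- Example: a "Person" record with a name and an optional signed 32-bit age.
personType :: DataType
personType = ProductType
  [ Field "name" (PrimitiveType StringType)
  , Field "age"  (OptionalType (PrimitiveType (IntegerType Bits32 True)))
  ]

-- Example: a "knows" record pointing at two Persons by name.
knowsType :: DataType
knowsType = ProductType
  [ Field "out"   (NamedType (TypeReference "Person"))
  , Field "in"    (NamedType (TypeReference "Person"))
  , Field "since" (OptionalType (PrimitiveType (IntegerType Bits64 True)))
  ]

-- List every named type a schema refers to, e.g. to check that all
-- references resolve to a defined type.
namedReferences :: DataType -> [String]
namedReferences t = case t of
  PrimitiveType _             -> []
  NamedType (TypeReference n) -> [n]
  ProductType fs              -> concatMap (namedReferences . fieldType) fs
  SumType fs                  -> concatMap (namedReferences . fieldType) fs
  EnumType fs                 -> concatMap (namedReferences . fieldType) fs
  OptionalType u              -> namedReferences u
  ListType u                  -> namedReferences u
  SetType u                   -> namedReferences u
  MapType k v                 -> namedReferences k ++ namedReferences v
```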
>
> ...or it's a named ("labeled") type like "Person" or "knows", or a sum or
> product type, or one of a few other things depending on what we choose to
> support in TinkerPop. To this, we will probably add VertexType, EdgeType,
> and PropertyType. Yes, logically they are product types, but they are
> fairly special in TinkerPop, and deserve their own constructors, like the
> OptionalType and EnumType constructors you see above (optionals and enums
> being special sum types). When we get down into the actual code and
> documentation, I don't think users are going to need to worry about
> category theory.
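To picture dedicated element constructors that are nonetheless "logically" product types, here is a sketch with hypothetical `VertexType`, `EdgeType` and `PropertyType` constructors on a trimmed-down `DataType`, plus a `desugar` function that rewrites them into plain products. The field layout (an edge becomes `out`, `in` and its properties; the label is dropped on the assumption that it serves as the type's registered name) is an assumption for illustration only.

```haskell
-- Trimmed, hypothetical DataType extended with element constructors.
newtype TypeReference = TypeReference String
  deriving (Eq, Show)

data Field = Field { fieldName :: String, fieldType :: DataType }
  deriving (Eq, Show)

data DataType
  = StringType
  | IntegerType
  | NamedType TypeReference
  | ProductType [Field]
  | OptionalType DataType
  -- Hypothetical element constructors (not an agreed TP4 design):
  | VertexType   { vertexLabel   :: String
                 , vertexProps   :: [Field] }
  | EdgeType     { edgeLabel     :: String
                 , edgeOut       :: TypeReference   -- tail vertex type
                 , edgeIn        :: TypeReference   -- head vertex type
                 , edgeProps     :: [Field] }
  | PropertyType { propertyKey   :: String
                 , propertyValue :: DataType }
  deriving (Eq, Show)

-- Rewrite element types into plain products. Labels are dropped here on
-- the assumption that they become the names under which types are registered.
desugar :: DataType -> DataType
desugar t = case t of
  VertexType _ props   -> ProductType (map desugarField props)
  EdgeType _ o i props ->
    ProductType ( Field "out" (NamedType o)
                : Field "in"  (NamedType i)
                : map desugarField props )
  PropertyType k v     -> ProductType [Field k (desugar v)]
  ProductType fs       -> ProductType (map desugarField fs)
  OptionalType u       -> OptionalType (desugar u)
  other                -> other
  where
    desugarField (Field n ft) = Field n (desugar ft)
```

Keeping the special constructors in the schema while still being able to desugar them is one way to offer TinkerPop-specific vocabulary without giving up a uniform algebraic treatment.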
>
>
> Pieter: "I'd prefer if the reference implementation is in fact far less
> important than the specification itself"
> Josh: I think the reason we have never had a real specification is that
> neither the property graph data model nor the operational semantics of
> Gremlin had been formalized. We're halfway there now with the formal PG
> data model. The extent to which Gremlin can be formalized for TP4 is TBD,
> though I would like to see things move in the direction of a monadic
> formalism as I say. The further we go in that direction, I'd say the easier
> it will be to write a spec.
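As a rough illustration of the monadic direction (not a proposed TP4 design), the sketch below treats a step as a function from one traverser to a list of traversers, so steps compose with the list monad's Kleisli composition `>=>` or with `do` notation, and no nested iterators appear. The edge list is a small subset of TinkerPop's "modern" toy graph; `out`, `in_` and `coCreators` are hypothetical helpers.

```haskell
import Control.Monad (guard, (>=>))

-- Toy data: a subset of TinkerPop's "modern" graph as (out, label, in) triples.
type Vertex = String
type Label  = String

edges :: [(Vertex, Label, Vertex)]
edges =
  [ ("marko", "knows",   "vadas")
  , ("marko", "knows",   "josh")
  , ("josh",  "created", "ripple")
  , ("josh",  "created", "lop")
  , ("marko", "created", "lop")
  ]

-- A step maps one traverser to zero or more traversers.
type Step a b = a -> [b]

-- Follow outgoing edges with the given label.
out :: Label -> Step Vertex Vertex
out l v = [ w | (u, l', w) <- edges, u == v, l' == l ]

-- Follow incoming edges with the given label.
in_ :: Label -> Step Vertex Vertex
in_ l v = [ u | (u, l', w) <- edges, w == v, l' == l ]

-- "Who else created what v created?" in do-notation over the list monad;
-- guard drops the starting vertex.
coCreators :: Step Vertex Vertex
coCreators v = do
  project <- out "created" v
  person  <- in_ "created" project
  guard (person /= v)
  pure person
```

Here `(out "knows" >=> out "created") "marko"` yields `["ripple","lop"]` and `coCreators "marko"` yields `["josh"]`. Because each step is an ordinary function into a monad, the rules for composing steps are just the monad laws, which is part of what could make such semantics easier to specify.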
>
> W.r.t. making implementations more efficient, that's somewhat orthogonal to
> what I'm trying to do, but at least in Scala (and Haskell if we decide to
> pursue a full implementation there) I do see a lot of the nested iterator
> messiness and other intermediate abstractions going away.
>
> Stephen: "I think the idea is more about the notion that the Structure API
> which is a provider API is something that can go away as a concept."
> Josh: OK, yes, I can see edge and vertex implementations going away, as
> well, if the basic data access operations for outV, inV, etc. etc. are
> implemented by the provider on the process side instead.
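One way to picture providers implementing data access on the process side, instead of supplying Vertex and Edge classes: the provider hands the engine a few primitive functions over its own identifiers, and steps like `out()` and `in()` are derived once. The `Provider` record, its field names, and the edge-list backend are hypothetical, chosen only to mirror the outV/inV operations mentioned above.

```haskell
-- Hypothetical provider interface: instead of Vertex and Edge classes, the
-- provider supplies primitive access functions over its own vertex type v
-- and edge type e.
data Provider v e = Provider
  { outE :: v -> [e]  -- edges leaving a vertex
  , inE  :: v -> [e]  -- edges entering a vertex
  , outV :: e -> v    -- tail (source) vertex of an edge
  , inV  :: e -> v    -- head (target) vertex of an edge
  }

-- Gremlin-style out() and in(), derived once by the engine from the
-- provider's primitives.
outStep :: Provider v e -> v -> [v]
outStep p = map (inV p) . outE p

inStep :: Provider v e -> v -> [v]
inStep p = map (outV p) . inE p

-- A toy backend: vertex ids are Ints, edges are (edgeId, tail, head) rows.
type EdgeRow = (Int, Int, Int)

listProvider :: [EdgeRow] -> Provider Int EdgeRow
listProvider rows = Provider
  { outE = \v -> [ r | r@(_, t, _) <- rows, t == v ]
  , inE  = \v -> [ r | r@(_, _, h) <- rows, h == v ]
  , outV = \(_, t, _) -> t
  , inV  = \(_, _, h) -> h
  }
```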
>
> Stephen: "I think I'm interested in "practical" so Scala seems right."
> Josh: Well, now I think I might take a stab at a basic Haskell
> implementation just for the sake of prototyping in my favorite programming
> language. May or may not become part of TinkerPop proper.
>
> Stephen: "That would be great. We currently do that for GLVs and it's
> pretty ugly and was mostly useful in the very initial bit of each new
> language ecosystem implementation as it just saved a ton of typing for the
> creation of GraphTraversal, GraphTraversalSource and __."
> Josh: Let's see exactly what we want to generate in each target language. I
> was thinking of generating code for basic structural classes like vertices
> and edges, which would be easy enough to do right now just by defining a
> schema for the objects, translating that schema to Thrift IDL, generating
> code in each of the target languages, and then gutting the generated code
> to remove all Thrift-specific logic. For Java and Python, that seems to
> result in a pretty good starting point for an API.
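The schema-to-Thrift step of that pipeline might look roughly like the following sketch. The schema types and `toThrift` are hypothetical, and the mapping (optional fields to Thrift's `optional` qualifier, everything else `required`, field ids numbered from 1) is an assumption; a real generator would also need enums, unions, sets and maps, and would have to avoid Thrift's reserved words as field names.

```haskell
import Data.List (intercalate)

-- Hypothetical, minimal schema types (a small subset of DataType).
data FieldType
  = TBool | TInt32 | TInt64 | TDouble | TString | TBinary
  | TList FieldType
  | TNamed String      -- reference to another struct by name

data SchemaField = SchemaField
  { sfName     :: String
  , sfType     :: FieldType
  , sfOptional :: Bool
  }

data Struct = Struct
  { structName   :: String
  , structFields :: [SchemaField]
  }

-- Map schema types onto Thrift base and container types.
thriftType :: FieldType -> String
thriftType t = case t of
  TBool    -> "bool"
  TInt32   -> "i32"
  TInt64   -> "i64"
  TDouble  -> "double"
  TString  -> "string"
  TBinary  -> "binary"
  TList u  -> "list<" ++ thriftType u ++ ">"
  TNamed n -> n

-- Render one struct as Thrift IDL, numbering field ids from 1.
toThrift :: Struct -> String
toThrift (Struct name fields) =
  "struct " ++ name ++ " {\n"
    ++ intercalate ",\n" (zipWith renderField [(1 :: Int) ..] fields)
    ++ "\n}\n"
  where
    renderField i f =
      "  " ++ show i ++ ": "
        ++ (if sfOptional f then "optional " else "required ")
        ++ thriftType (sfType f) ++ " " ++ sfName f
```

The emitted text could then be passed to the Thrift compiler (for example `thrift --gen java person.thrift`) before stripping out the Thrift-specific logic, as described above.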
>
>
> Josh
>
>
> On Mon, Jan 6, 2020 at 4:50 AM Stephen Mallette wrote:
>
> > Hi Pieter - my thoughts are inline:
> >
> >
> > > Regarding the structure api and query specification.
> > >
> > > Can it be specified in `formal` English rather than in Category Theory?
> > > I think having the specification in Category Theory simply makes the
> > > barrier to entry too high for many of us to partake in the conversation.
> > >
> > > I get that having a formal mathematical spec is useful and interesting
> > > but perhaps it can remain just below the surface rather than being the
> > > primary source.
> > >
> >
> > I agree with this. I like the underpinnings and formalism that CT is
> > bringing here, but if TinkerPop becomes harder and more abstract to use
> > as a result, I don't think we're doing anything helpful. It seems
> > important that we have some higher-level language above the mathematical
> > rigor so that the average user has a shot at using this stuff.
> >
> >
> > > In TinkerPop 3 the specification was pretty much the reference
> > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > implementation is in fact far less important than the specification
> > > itself. I.e. the specification must be in English and not refer to api
> > > calls in the reference implementation.
> > >
> >
> > The Structure Test Suite is the worst offender there, though there are
> > aspects of the Process Test Suite that are equally bad. I'm not sure what
> > a test suite will look like offhand, but I think we'll need to think
> > harder about the types of tests we write to take care that they are not
> > bound too closely to the "TinkerGraph" way of doing things.
> >
> >
> > > Regarding the implementation.
> > >
> > > Something that has always concerned me about TinkerPop's implementation
> > > is that it (embedded Java DBs being the exception) is generally too
> > > far away from the data. Massive latency and endless copying of the data
> > > occurs.
> >
> >
> > I guess Remote Graph Providers (DSG, Neptune, etc.) have mitigated that
> > by putting their implementations close to the data, thus executing the
> > traversal on the server near the data and then just returning the
> > result. I think that we need to keep that model in mind for TP4, as it
> > was really only emergent in TP3 and our designs supporting that model
> > basically were shoehorned in.
> >
> >
> > > Further, it has no real understanding of memory. Any step might for
> > > whatever reason have a ReducingBarrierStep and load the full traversal
> > > data set into the JVM's memory.
> > >
> >
> > I'm not sure that I follow what you're looking for TP to do here. If you
> > want to outline that further, perhaps start a different thread as it
> > doesn't sound quite related to this thread on the Schema API.
> >
> >
> > > Perhaps a reference implementation written in C/C++/Go/Rust... might be
> > > more useful to database vendors.
> > >
> >
> > All languages I don't know ;) Short of some major new contributions from
> > someone, I'd expect us to be heading down the road of the JVM again as
> > our starting point.
> >
> >
> > > All that said, thanks for all the work you are putting into this.
> >
> >
> > Appreciate your thoughts. Take care.
> >
> >
> > On Sun, Jan 5, 2020 at 2:14 PM pieter martin wrote:
> >
> > > Hi,
> > >
> > > Here are some thoughts/concerns that I have.
> > >
> > > Regarding the structure api and query specification.
> > >
> > > Can it be specified in `formal` English rather than in Category Theory?
> > > I think having the specification in Category Theory simply makes the
> > > barrier to entry too high for many of us to partake in the conversation.
> > >
> > > I get that having a formal mathematical spec is useful and interesting
> > > but perhaps it can remain just below the surface rather than being the
> > > primary source.
> > >
> > > In TinkerPop 3 the specification was pretty much the reference
> > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > implementation is in fact far less important than the specification
> > > itself. I.e. the specification must be in English and not refer to api
> > > calls in the reference implementation.
> > >
> > > Regarding the implementation.
> > >
> > > Something that has always concerned me about TinkerPop's implementation
> > > is that it (embedded Java DBs being the exception) is generally too
> > > far away from the data. Massive latency and endless copying of the data
> > > occurs.
> > > Further, it has no real understanding of memory. Any step might for
> > > whatever reason have a ReducingBarrierStep and load the full traversal
> > > data set into the JVM's memory.
> > > Perhaps a reference implementation written in C/C++/Go/Rust... might be
> > > more useful to database vendors.
> > >
> > > All that said, thanks for all the work you are putting into this.
> > >
> > > Cheers
> > > Pieter
> > >
> > >
> > >
> > >
> > > On Sat, 2020-01-04 at 10:51 -0800, Joshua Shinavier wrote:
> > > > Thanks for the detailed response, Stephen. Good points made. Let's dig
> > > > a little deeper to get to a common understanding of a "structure API"
> > > > for TP4. I agree that Graph is a relic of the Blueprints days, and would
> > > > not be missed. Graph.Features would then need to be renamed at the very
> > > > least. However, Vertex, Edge, Property, etc. are also part of the
> > > > structure API, and they are fundamental. We need them in TP4, but there
> > > > is also an opportunity to generalize them slightly to give us a strong
> > > > notion of schema. Graph.Features, whatever we call it, would not be so
> > > > much a stand-alone collection of flags describing the graph back-end, as
> > > > it is now, but a set of constraints on the schemas you can define. It
> > > > would "have teeth" because you could actually validate your schema
> > > > against it, assuming you have chosen to define one. If we do want a
> > > > handy Graph interface in TP4, we could consider deriving the
> > > > implementation rather than allowing developers to define it themselves.
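To make the "have teeth" idea concrete, here is a minimal sketch, assuming a feature record of a few boolean flags plus a list of supported value types, and a validator that reports every place a vertex schema asks for something the back-end cannot provide. All type, field and message names are hypothetical; they are not TP3's `Graph.Features` API.

```haskell
-- Hypothetical value types a back-end might support.
data ValueType = VString | VInt | VDouble | VBool
  deriving (Eq, Show)

-- Hypothetical feature record: constraints on the schemas you may define.
data Features = Features
  { supportsMultiProperties :: Bool
  , supportsMetaProperties  :: Bool
  , supportedValueTypes     :: [ValueType]
  }

data PropertySchema = PropertySchema
  { propKey   :: String
  , propType  :: ValueType
  , propMulti :: Bool              -- may one element carry several values?
  , propMeta  :: [PropertySchema]  -- properties on this property
  }

data VertexSchema = VertexSchema
  { vertexLabel :: String
  , vertexProps :: [PropertySchema]
  }

-- Collect every violation instead of stopping at the first one.
validate :: Features -> VertexSchema -> [String]
validate fs (VertexSchema label props) = concatMap (checkProp label) props
  where
    checkProp ctx p =
      let here = ctx ++ "." ++ propKey p
      in  [ here ++ ": unsupported type " ++ show (propType p)
          | propType p `notElem` supportedValueTypes fs ]
       ++ [ here ++ ": multi-properties not supported"
          | propMulti p, not (supportsMultiProperties fs) ]
       ++ [ here ++ ": meta-properties not supported"
          | not (null (propMeta p)), not (supportsMetaProperties fs) ]
       ++ concatMap (checkProp here) (propMeta p)
```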
> > > >
> > > > W.r.t. Haskell vs. Scala -- if you / enough of us are interested in
> > > > Haskell, we could start with a Haskell-based reference implementation
> > > > before we proceed to Scala. The schema API I have in mind is essentially
> > > > already written, and will be publicly available soon. It might not be a
> > > > bad idea to explore true monadic traversals, as I have talked about
> > > > before, in functionally pure Haskell first. The Gremlin-Scala [1] and
> > > > Greskell [2] projects have already dug into some of the finer details
> > > > and could be used for reference. To that, I would add monadic
> > > > encapsulation of transactions, graph side-effects, and exceptions. The
> > > > universality of a monadic approach to graph traversal might help us to
> > > > address some of the language variation you mention, because it will be
> > > > easier to describe exactly what basic steps do and how their effects are
> > > > composed together. Although most of the languages of interest for
> > > > TinkerPop back-ends are not purely functional, you can usually create
> > > > APIs that are. Formal specifications of TinkerPop structure and process
> > > > ought to be possible.
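For the exception part of "monadic encapsulation", one possible shape is a traversal monad in which each traverser carries either a value or an error, so a failing branch is recorded rather than aborting the whole traversal. The sketch below writes this out by hand (it is structurally `ExceptT String []` from the transformers package); `Trav`, `knows`, `age` and the toy data are hypothetical, and transactions and other side effects are not modelled.

```haskell
import Control.Monad (ap)

-- Hypothetical traversal monad: every traverser holds either a value or an
-- error message. Structurally this is ExceptT String [] (transformers),
-- written out by hand to stay dependency-free.
newtype Trav a = Trav { runTrav :: [Either String a] }

instance Functor Trav where
  fmap f (Trav xs) = Trav (map (fmap f) xs)

instance Applicative Trav where
  pure x = Trav [Right x]
  (<*>)  = ap

instance Monad Trav where
  -- Failed traversers pass through unchanged; live ones continue.
  Trav xs >>= k = Trav (concatMap step xs)
    where
      step (Left e)  = [Left e]
      step (Right a) = runTrav (k a)

-- Lift an ordinary multi-valued result into Trav.
fromList :: [a] -> Trav a
fromList = Trav . map Right

-- Fail the current traverser only.
throwT :: String -> Trav a
throwT e = Trav [Left e]

-- Toy, hypothetical data.
ages :: [(String, Int)]
ages = [("marko", 29), ("josh", 32)]

knows :: String -> Trav String
knows "marko" = fromList ["vadas", "josh"]
knows _       = fromList []

-- Property lookup that reports a missing value as an error.
age :: String -> Trav Int
age v = maybe (throwT ("no age for " ++ v)) pure (lookup v ages)
```

Here `runTrav (knows "marko" >>= age)` evaluates to `[Left "no age for vadas", Right 32]`: the missing age is recorded as an error while josh's traverser still produces a value.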
> > > >
> > > > For project structure, I say we follow your instincts, as you are the
> > > > most intimately familiar with the code base(s) and the issues. I think
> > > > it makes sense to continue to have a master repo for reference
> > > > implementations, but yes, we might want separate build systems. That
> > > > will certainly be the case if we want to include a Haskell
> > > > implementation alongside a JVM one. We might be able to make use of
> > > > code generation for a one-time translation of the core structure API
> > > > into various target languages.
> > > >
> > > > To my mind, your emphasis on consistency across GLVs in TP4 goes well
> > > > with an emphasis on a stronger type system and better-defined
> > > > operational semantics for traversals.
> > > >
> > > > Josh
> > > >
> > > >
> > > > [1] https://github.com/mpollmeier/gremlin-scala
> > > >
> > > > [2] https://github.com/debug-ito/greskell
> > > >
> > > >
> > > >
> > > > On Fri, Jan 3, 2020 at 5:21 AM Stephen Mallette wrote:
> > > >
> > > > > Sorry it took me a bit to get to this...
> > > > >
> > > > > > Graph.Features will carry over into TP4
> > > > >
> > > > > Having Graph.Features implies having Graph, which is part of the
> > > > > Structure API. Marko and I have questioned the necessity for the Graph
> > > > > and Structure API in recent years. I don't think major graph providers
> > > > > who use TinkerPop even implement it; they just process Gremlin. This
> > > > > "secondary" API (formerly a first-class citizen) also creates
> > > > > confusion for users who try to use it directly and have mixed results
> > > > > depending on the graph they choose. Worse still, they end up writing
> > > > > Structure API code in scripts embedded as strings in their code
> > > > > (despite advice to not do so) and end up creating non-portable code.
> > > > > Furthermore, GLV users end up wondering why they can't do
> > > > > graph.addVertex() and other similar Structure API calls. Mixed advice
> > > > > in third-party blog posts compounds these issues.
> > > > >
> > > > > So, when you talk about the Structure API, I wonder if you mean to keep
> > > > > all of it or just the notion of Graph.Features (in some new revised
> > > > > form). The latter is agreeable in my mind because we likely still need
> > > > > some way to know how a graph behaves for purposes of our technology
> > > > > test suite. Without the Structure API, I wasn't sure yet what that
> > > > > would look like.
> > > > >
> > > > > > I feel we should use Scala for the API. This opinion is informed by
> > > > > > my experiences writing tools of this kind in both Java and Haskell
> > > > > > at Uber. While I am a huge fan of Haskell, practical considerations
> > > > > > rule it out as an option. We need the API to be JVM-compatible
> > > > >
> > > > > Having followed along with your talks, writings, etc., and with my own
> > > > > reading of Category Theory and such, I realized that using Java would
> > > > > probably not work. While I have interest in Haskell (more so than
> > > > > Scala), Scala does seem like the best fit for this work on the JVM.
> > > > > That said, there are two points I'd like us to consider that have been
> > > > > on my mind for TP4:
> > > > >
> > > > > 1. The realization that TinkerPop, specifically Gremlin, would be
> > > > > available natively in other language ecosystems besides the JVM came
> > > > > way too late in TP3. As a result, we have an extraordinarily mixed set
> > > > > of messages with Gremlin usage. Things work one way in Java, but
> > > > > another way in Python. And while 3.4.x unified connection options
> > > > > across languages, there are still too many ways to connect to a graph
> > > > > and too much discrepancy in behavior. We need to think about how every
> > > > > single feature that we create for TP4 behaves in each language and
> > > > > what parity of capability we can achieve there. And if some reasonable
> > > > > level of parity can't be achieved for whatever reason, we should
> > > > > seriously consider not implementing the feature; failing that, the
> > > > > story for the language ecosystems that don't have the functionality
> > > > > had better be crystal clear and consistent with TinkerPop as a whole.
> > > > > We should very much consider how Graph.Features (in whatever form it
> > > > > takes) is accessible via Java, Python, JavaScript, etc. before going
> > > > > too far in any particular development direction.
> > > > > 2. What is the general structure for this project with respect to the
> > > > > different language environments that we have? Personally, I still like
> > > > > the idea of a single repo, but without a single build system ruling it
> > > > > all. In this way, each language ecosystem can take advantage of the
> > > > > best parts of its particular build tool chain without having to
> > > > > shoehorn into a different system's approach. That said, I think each
> > > > > ecosystem should stick to a single build tool chain, e.g. Maven for
> > > > > the JVM.
> > > > >
> > > > > As a big-picture point, I think the JVM ecosystem will be the model for
> > > > > all other language ecosystems. I would think that we would want to take
> > > > > care not to turn TinkerPop into a Scala-only system. I assume this work
> > > > > isn't laying the foundation for that, but figured I'd voice the
> > > > > concern. I think we'd largely still rely on Java for development
> > > > > outside of this feature, which has some specific demands that Java
> > > > > doesn't address well. I'd further assume that we would have some nice,
> > > > > clean interop back to Java for this stuff so as to keep our core users
> > > > > well engaged.
> > > > >
> > > > > > to keep TinkerPop aligned with upcoming standards like RDF* and GQL.
> > > > > > Interoperability with mm-ADT should be straightforward
> > > > >
> > > > > Thank you for keeping up with the developing standards. That's a nice
> > > > > service to TinkerPop.
> > > > >
> > > > > Ultimately, my vision for TP4 seems to have less to do with specific
> > > > > major new features (so I'm glad to see that you're thinking in that
> > > > > manner) and more to do with creating consistent, coherent, and easy
> > > > > graph usage patterns across language ecosystems for users while making
> > > > > it even simpler for providers to build their TinkerPop-enabled
> > > > > systems. Having seen so much success with GLVs for TP3, despite their
> > > > > drawbacks, I can't help but sense that focusing on this notion as a
> > > > > foundational element of design for TP4 will further expand
> > > > > TinkerPop's appeal and reach.
> > > > >
> > > > > On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I would like to reboot the conversation around TinkerPop 4,
> > > > > > specifically as it concerns the structure API. You will have seen my
> > > > > > posts, ever since my presentation [1] last January, about an
> > > > > > algebraic approach to property graph schemas and transformations,
> > > > > > which Ryan and I formalized in the APG paper [2]. I am now very close
> > > > > > to releasing the Haskell implementation of this framework as open
> > > > > > source software (to be accompanied by an Uber Engineering Blog post,
> > > > > > in the next few weeks if all goes well).
> > > > > >
> > > > > > At various times and places, I have suggested that we develop a
> > > > > > Scala-based structure API for TP4 which implements APG in an
> > > > > > extensible way. I think it is time to proceed and start committing
> > > > > > code, or discuss alternative plans for the structure API. There seems
> > > > > > to be plenty of community interest, and I now have an official OK to
> > > > > > put some engineering hours towards it at work. I would like to align
> > > > > > with you -- the TP PMC and other TinkerPop committers and developers
> > > > > > -- on how to proceed, who will contribute, and what the development
> > > > > > timeline will look like.
> > > > > >
> > > > > > Some specifics from my side:
> > > > > >
> > > > > >    - Graph.Features will carry over into TP4; it will just be a bit
> > > > > >    more sophisticated than the current TP3 Graph.Features. BTW, I also
> > > > > >    proposed this idea of a graph feature vector at the recent Dagstuhl
> > > > > >    Seminar [3], where it caught on and will be the basis of a "dragon
> > > > > >    data model" that might help to keep TinkerPop aligned with upcoming
> > > > > >    standards like RDF* and GQL.
> > > > > >    - I feel we should use Scala for the API. This opinion is informed
> > > > > >    by my experiences writing tools of this kind in both Java and
> > > > > >    Haskell at Uber. While I am a huge fan of Haskell, practical
> > > > > >    considerations rule it out as an option. We need the API to be
> > > > > >    JVM-compatible. The best Haskell-JVM bridge is Eta [4], but IMO it
> > > > > >    is not ready to be put in the critical path on a project such as
> > > > > >    TinkerPop; we used it at Uber for a while and found it to be a time
> > > > > >    sink, despite the generated bytecode working great. Likewise, I
> > > > > >    would strongly advise against continuing with a pure Java-based API
> > > > > >    if we want to do intelligent things with graph schemas. The
> > > > > >    language is just not appropriate as a basis for the type system in
> > > > > >    question. Scala, on the other hand, has all of the advantages of
> > > > > >    Haskell in terms of type safety and functional pattern matching,
> > > > > >    although it requires some extra discipline to keep your code pure.
> > > > > >    - Interoperability with Ryan's CQL (categorical query language [5])
> > > > > >    is of interest.
> > > > > >    - Interoperability with mm-ADT should be straightforward now that
> > > > > >    mm-ADT has support for union types. Hopefully, mm-ADT's type system
> > > > > >    will end up as a proper superset of TP4's.
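A graph feature vector can be pictured as one flag per data-model capability, which turns questions such as "is one type system a superset of another?" into a component-wise comparison. The feature names and example vectors below are hypothetical placeholders; they do not describe the Dagstuhl proposal, mm-ADT, or any real model.

```haskell
-- Hypothetical feature flags; a real feature vector would be larger.
data Feature
  = EdgeProperties
  | MetaProperties
  | MultiEdges
  | EdgesOnEdges
  | UnionTypes
  deriving (Eq, Ord, Show, Enum, Bounded)

-- A feature vector: one Bool per feature, in Enum order.
type FeatureVector = [Bool]

toVector :: [Feature] -> FeatureVector
toVector fs = [ f `elem` fs | f <- [minBound .. maxBound] ]

-- Model a can host every schema of model b if a supports every feature b does.
subsumes :: FeatureVector -> FeatureVector -> Bool
subsumes a b = and (zipWith (\x y -> x || not y) a b)

-- Features that b uses but a lacks.
missing :: FeatureVector -> FeatureVector -> [Feature]
missing a b = [ f | (f, x, y) <- zip3 [minBound .. maxBound] a b, y, not x ]
```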
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Josh
> > > > > >
> > > > > >
> > > > > > [1] https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
> > > > > >
> > > > > > [2] https://arxiv.org/abs/1909.04881
> > > > > >
> > > > > > [3] https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
> > > > > >
> > > > > > [4] https://eta-lang.org
> > > > > >
> > > > > > [5] https://www.categoricaldata.net
> > > > > >
> > > > > >
> > >
> > >
> >
>
