Re: structure API for TP4

Stephen Mallette Mon, 06 Jan 2020 04:11:49 -0800

>  However, Vertex, Edge, Property etc. are also part of the structure API,
and they are fundamental.


I agree...we likely still need the notion of Vertex, Edge and Property as
Gremlin will need to programmatically interact with such things. In fact, we
might yet even need Graph (for subgraph use cases maybe??). I think the
idea is more that the Structure API, which is a provider API, is something
that can go away as a concept. Implementing TinkerPop shouldn't require a
"Structure API". End users of Gremlin, however, would likely still have
those constructs available to them. I'm not sure how all that will/should
work for users. We currently have the ugly and confusing notion of
"detachment" which, while slowly migrating to consistency across TinkerPop,
is still troubling for folks (providers and users alike). That discussion
likely needs its own thread at some point.

>  if you / enough of us are interested in Haskell, we could start with a
Haskell-based reference implementation before we proceed to Scala. The
schema API I have in mind is essentially already written, and will be
publicly available soon.

I think I'm interested in "practical" so Scala seems right. I'll be curious
how Java interop will look with that. Perhaps we need to see how the schema
API looks when it becomes public so that I can better visualize what is
being proposed.

> We might be able to make use of code generation for a one-time
translation of core structure API into various target languages.

That would be great. We currently do that for GLVs and it's pretty ugly. It
was mostly useful in the very initial bit of each new language ecosystem
implementation as it just saved a ton of typing for the creation of
GraphTraversal, GraphTraversalSource and __. But, I'd say it's been less
useful in doing its work in ongoing maintenance. I suppose it helps a
little bit with new steps, but the Groovy scripts + reflection that we use
to do the code generation are not pretty and only cover those three classes
and the tokens/enums. In essence, it's kind of its own body of code to
maintain which also stinks. I really haven't researched other options but
it would be nice if we had a solution up front for making all of this easy.
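
To make the reflection-driven approach above a bit more concrete, here is a
minimal sketch in Python. It is not the Groovy scripts we actually use.
ReferenceTraversal, to_camel_case, generate_stub and the JavaScript-like text
it emits are all hypothetical stand-ins, chosen only to show the shape of the
technique: reflect over a reference class, then print one method per public
step for the target language.

```python
import inspect


class ReferenceTraversal:
    """Hypothetical stand-in for a reference class such as GraphTraversal."""

    def out(self, *edge_labels): ...
    def has(self, key, value): ...
    def has_label(self, *labels): ...
    def values(self, *property_keys): ...


def to_camel_case(name):
    """Convert snake_case to camelCase for the (assumed) target language."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def generate_stub(cls, target_name):
    """Reflect over cls and emit JavaScript-like source, one method per step."""
    lines = [f"class {target_name} {{"]
    # getmembers returns (name, function) pairs sorted by name.
    for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and dunder members
        step = to_camel_case(name)
        lines.append(f"  {step}(...args) {{")
        lines.append(f"    this.bytecode.addStep('{step}', args);")
        lines.append("    return this;")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


generated = generate_stub(ReferenceTraversal, "GraphTraversal")
print(generated)
```

Even in a toy like this the generator is its own body of code: naming rules,
filtering of non-step members and per-language templates all have to be
maintained next to the classes being generated.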

On Sat, Jan 4, 2020 at 1:51 PM Joshua Shinavier <[email protected]> wrote:

> Thanks for the detailed response, Stephen. Good points made. Let's dig a
> little deeper to get to a common understanding of a "structure API" for
> TP4. I agree that Graph is a relic of the Blueprints days, and would not be
> missed. Graph.Features would then need to be renamed at the very least.
> However, Vertex, Edge, Property etc. are also part of the structure API,
> and they are fundamental. We need them in TP4, but there is also an
> opportunity to generalize them slightly to give us a strong notion of
> schema. Graph.Features, whatever we call it, would not be so much a
> stand-alone collection of flags describing the graph back-end, as it is
> now, but a set of constraints on the schemas you can define. It would "have
> teeth" because you could actually validate your schema against it, assuming
> you have chosen to define one. If we do want a handy Graph interface in
> TP4, we could consider deriving the implementation rather than allowing
> developers to define it themselves.
>
> W.r.t. Haskell vs. Scala -- if you / enough of us are interested in
> Haskell, we could start with a Haskell-based reference implementation
> before we proceed to Scala. The schema API I have in mind is essentially
> already written, and will be publicly available soon. It might not be a bad
> idea to explore true monadic traversals, as I have talked about before, in
> functionally pure Haskell first. The Gremlin-Scala [1] and Greskell [2]
> projects have already dug into some of the finer details and could be used
> for reference. To that, I would add monadic encapsulation of transactions,
> graph side-effects, and exceptions. The universality of a monadic approach
> to graph traversal might help us to address some of the language variation
> you mention, because it will be easier to describe exactly what basic steps
> do and how their effects are composed together. Although most of the
> languages of interest for TinkerPop back-ends are not purely functional,
> you can usually create APIs that are. Formal specifications of TinkerPop
> structure and process ought to be possible.
>
> For project structure, I say we follow your instincts, as you are the most
> intimately familiar with the code base(s) and the issues. I think it makes
> sense to continue to have a master repo for reference implementations, but
> yes we might want separate build systems. That will certainly be the case
> if we want to include a Haskell implementation alongside a JVM one. We
> might be able to make use of code generation for a one-time translation of
> core structure API into various target languages.
>
> To my mind, your emphasis on consistency across GLVs in TP4 goes well with
> an emphasis on a stronger type system and better-defined operational
> semantics for traversals.
>
> Josh
>
>
> [1] https://github.com/mpollmeier/gremlin-scala
> [2] https://github.com/debug-ito/greskell
>
>
> On Fri, Jan 3, 2020 at 5:21 AM Stephen Mallette <[email protected]>
> wrote:
>
> > Sorry it took me a bit to get to this...
> >
> > > Graph.Features will carry over into TP4
> >
> > Having Graph.Features implies having Graph which is part of the Structure
> > API. Marko and I have questioned the necessity for the Graph and Structure
> > API in recent years. Major graph providers who use TinkerPop don't even
> > implement it I don't think - they just process Gremlin. This "secondary"
> > API (formerly a first class citizen) also creates confusion for users who
> > try to use it directly and have mixed results depending on the graph they
> > choose. Worse still, they end up writing Structure API code in scripts
> > embedded as strings in their code (despite advice to not do so) and end up
> > creating non-portable code. Furthermore, GLV users end up wondering why
> > they can't do graph.addVertex() and other similar Structure API calls.
> > Mixed advice in third-party blog posts compounds these issues.
> >
> > So, when you talk about the Structure API, I wonder if you mean to keep all
> > of it or just the notion of Graph.Features (in some new revised form). The
> > latter is agreeable in my mind because we likely still need some way to
> > know how a graph behaves for purposes of our technology test suite. Without
> > the Structure API, I wasn't sure yet what that would look like.
> >
> > > I feel we should use Scala for the API. This opinion is informed by my
> > experiences writing tools of this kind in both Java and Haskell at Uber.
> > While I am a huge fan of Haskell, practical considerations rule it out as
> > an option. We need the API to be JVM-compatible
> >
> > Having followed along with your talks, writings, etc and with my own
> > reading of Category Theory and such, I realized that a use of Java would
> > probably not work. While I have interest in Haskell (more so than Scala),
> > Scala does seem like the best fit for this work on the JVM. That said,
> > there are two points I'd like us to consider that have been on my mind for
> > TP4:
> >
> > 1. The realization that TinkerPop, specifically Gremlin, would be available
> > natively in other language ecosystems besides the JVM came way too late in
> > TP3. As a result, we have an extraordinarily mixed set of messages with
> > Gremlin usage. Things work one way in Java, but another way in Python. And
> > while 3.4.x unified connection options across languages, there are still
> > too many ways to connect to a graph and too much discrepancy in behavior.
> > We need to think about how every single feature that we create for TP4
> > behaves in each language and what parity of capability we can achieve
> > there. And if some reasonable level of parity can't be achieved for
> > whatever reason, we should seriously consider either not implementing the
> > feature or making sure that the story for the language ecosystems that
> > don't have the functionality is crystal clear and consistent with
> > TinkerPop as a whole. We should very much consider how Graph.Features (in
> > whatever form it takes) is accessible via Java, Python, Javascript, etc.
> > before going too far in any particular development direction.
> > 2. What is the general structure for this project with respect to the
> > different language environments that we have? Personally, I still like the
> > idea of a single repo, but without a single build system ruling it all. In
> > this way each language ecosystem can take advantage of the best parts of
> > its particular build tool chain without having to shoehorn into a different
> > system's approach. That said, I think each ecosystem should stick to a
> > single build tool chain, e.g. Maven for the JVM.
> >
> > As a big picture point, I think the JVM ecosystem will be the model for all
> > other language ecosystems. I would think that we would want to take care
> > that we not turn TinkerPop into a Scala-only system - I assume this work
> > isn't laying the foundation for that, but figured I'd voice the concern. I
> > think we'd largely still rely on Java for development outside of this
> > feature that has some specific demands not addressed well by it. I'd
> > further assume that we would have some nice clean interop back to Java for
> > this stuff so as to keep our core users well engaged.
> >
> > > to keep TinkerPop aligned with upcoming standards like RDF* and GQL.
> > > Interoperability with mm-ADT should be straightforward
> >
> > Thank you for keeping up with the developing standards. That's a nice
> > service to TinkerPop.
> >
> > Ultimately my vision for TP4 seems to have less to do with specific major
> > new features (thus glad to see that you're thinking in that manner) and
> > more to do with creating consistent, coherent and easy graph usage patterns
> > across language ecosystems for users while making it even simpler for
> > providers to build their TinkerPop-enabled systems. Having seen so much
> > success with GLVs for TP3, despite their drawbacks, I can't help but sense
> > that focusing on this notion as a foundational element of design for TP4
> > will further expand TinkerPop's appeal and reach.
> >
> >
> >
> >
> >
> > On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier <[email protected]>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I would like to reboot the conversation around TinkerPop 4, specifically
> > > as it concerns the structure API. You will have seen my posts, ever since
> > > my presentation [1] last January, about an algebraic approach to property
> > > graph schemas and transformations, which Ryan and I formalized in the APG
> > > paper [2]. I am now very close to releasing the Haskell implementation of
> > > this framework as open source software (to be accompanied by an Uber
> > > Engineering Blog post, in the next few weeks if all goes well).
> > >
> > > At various times and places, I have suggested that we develop a
> > > Scala-based structure API for TP4 which implements APG in an extensible
> > > way. I think it is time to proceed and start committing code, or discuss
> > > alternative plans for the structure API. There seems to be plenty of
> > > community interest, and I now have an official OK to put some engineering
> > > hours towards it at work. I would like to align with you -- the TP PMC and
> > > other TinkerPop committers and developers -- on how to proceed, who will
> > > contribute, and what the development timeline will look like.
> > >
> > > Some specifics from my side:
> > >
> > >    - Graph.Features will carry over into TP4; it will just be a bit more
> > >    sophisticated than the current TP3 Graph.Features. Btw. I also
> > >    proposed this idea of a graph feature vector at the recent Dagstuhl
> > >    Seminar [3], where it caught on and will be the basis of a "dragon
> > >    data model" that might help to keep TinkerPop aligned with upcoming
> > >    standards like RDF* and GQL.
> > >    - I feel we should use Scala for the API. This opinion is informed by
> > >    my experiences writing tools of this kind in both Java and Haskell at
> > >    Uber. While I am a huge fan of Haskell, practical considerations rule
> > >    it out as an option. We need the API to be JVM-compatible. The best
> > >    Haskell-JVM bridge is Eta [4], but IMO it is not ready to be put in
> > >    the critical path on a project such as TinkerPop; we used it at Uber
> > >    for a while and found it to be a time sink, despite the generated
> > >    bytecode working great. Likewise, I would strongly advise against
> > >    continuing with a pure Java-based API if we want to do intelligent
> > >    things with graph schemas. The language is just not appropriate as a
> > >    basis for the type system in question. Scala, on the other hand, has
> > >    all of the advantages of Haskell in terms of type safety and
> > >    functional pattern matching, although it requires some extra
> > >    discipline to keep your code pure.
> > >    - Interoperability with Ryan's CQL (categorical query language [5])
> > >    is of interest.
> > >    - Interoperability with mm-ADT should be straightforward now that
> > >    mm-ADT has support for union types. Hopefully, mm-ADT's type system
> > >    will end up as a proper superset of TP4's.
> > >
> > > Thoughts?
> > >
> > > Josh
> > >
> > >
> > > [1] https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
> > > [2] https://arxiv.org/abs/1909.04881
> > > [3] https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
> > > [4] https://eta-lang.org
> > > [5] https://www.categoricaldata.net
> > >
> >
>
