That might be an even better option. I don't have any experience with
Idris, but the syntax for data type definitions is pretty similar to
Haskell's. I have a mapping already written (in Haskell) that takes schemas
defined in YAML to Haskell data type definitions; I imagine I could tweak
it slightly to generate Idris definitions instead, and from there we could
take advantage of Idris code generation. Come to think of it, there are
also quite a few codegen projects in Haskell that could be used. With
Idris, however, it seems that code generation was a design consideration
for the language itself.
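
A minimal sketch of what retargeting such a mapping might look like, assuming a much-simplified schema model. The Schema and Field types and the renderHaskell / renderIdris functions are hypothetical names chosen for illustration; they are not the actual mapping code, and YAML parsing is left out entirely. The point is only that the same schema value can be rendered into either syntax by swapping the rendering function:

```haskell
import Data.List (intercalate)

-- Hypothetical, simplified schema model (an assumption for illustration;
-- the real YAML-derived schema representation is not shown in this thread).
data Field = Field
  { fieldName :: String
  , fieldType :: String
  }

data Schema = Schema
  { schemaName   :: String
  , schemaFields :: [Field]
  }

-- Render a schema as a Haskell record declaration.
renderHaskell :: Schema -> String
renderHaskell (Schema name fields) =
  "data " ++ name ++ " = " ++ name ++ "\n  { "
    ++ intercalate "\n  , " (map field fields)
    ++ "\n  }"
  where
    field (Field n t) = n ++ " :: " ++ t

-- Render the same schema as an Idris record declaration.
-- The "Mk" constructor prefix is a naming convention assumed here.
renderIdris :: Schema -> String
renderIdris (Schema name fields) =
  intercalate "\n" $
    [ "record " ++ name ++ " where"
    , "  constructor Mk" ++ name
    ] ++ map field fields
  where
    field (Field n t) = "  " ++ n ++ " : " ++ t

-- Example schema used for the usage note below.
person :: Schema
person = Schema "Person" [Field "name" "String", Field "age" "Int"]
```

In GHCi, putStrLn (renderIdris person) prints a four-line Idris record ("record Person where", "  constructor MkPerson", "  name : String", "  age : Int"), while putStrLn (renderHaskell person) prints the equivalent Haskell record declaration.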

Josh



On Tue, Jan 7, 2020 at 4:05 AM Stephen Mallette <[email protected]>
wrote:

> Regarding code generation...
>
> A while ago, James Thornton put me onto Idris which is sorta what sent me
> trying to learn Haskell:
>
> http://docs.idris-lang.org/en/latest/reference/codegen.html
>
> I don't really have a sense of whether or not we could use that to our
> advantage. Perhaps you do Josh?
>
> On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier <[email protected]> wrote:
>
> > Hi Pieter, Stephen,
> >
> > Pieter: Can it be specified in `formal` English rather than in Category
> > Theory?
> > Josh: Sure. CT is a mathematical framework that makes our definition of
> > the data model rigorous, but the data model can also be described in plain
> > English. We tried to do both in the paper, and naturally the reference
> > documentation for TinkerPop will be extended for any new APIs. You will be
> > able to get pretty far in understanding the data model just by looking at
> > the code. For example, even if you don't know Haskell, you might be able
> > to tell what is going on here:
> >
> > data DataType
> >   = PrimitiveType PrimitiveType
> >   | NamedType TypeReference
> >   | ProductType
> >       { productFields  :: [Field] }
> >   | SumType
> >       { sumCases       :: [Field] }
> >   | EnumType
> >       { enumValues     :: [Field] }
> >   | OptionalType
> >       { optionalType   :: DataType }
> >   | ListType
> >       { elementType    :: DataType }
> >   | SetType
> >       { setElementType :: DataType }
> >   | MapType
> >       { keyType        :: DataType
> >       , valueType      :: DataType }
> >
> >
> > A data type is either a primitive type:
> >
> > data PrimitiveType
> >   = BinaryType
> >   | BooleanType
> >   | FloatType
> >     { floatTypePrecision   :: BitPrecision }
> >   | IntegerType
> >     { integerTypePrecision :: BitPrecision
> >     , integerTypeSigned    :: Bool }
> >   | StringType
> >
> >
> > ...or it's a named ("labeled") type like "Person" or "knows", or a sum or
> > product type, or one of a few other things depending on what we choose to
> > support in TinkerPop. To this, we will probably add VertexType, EdgeType,
> > and PropertyType. Yes, logically they are product types, but they are
> > fairly special in TinkerPop, and deserve their own constructors, like the
> > OptionalType and EnumType constructors you see above (optionals and enums
> > being special sum types). When we get down into the actual code and
> > documentation, I don't think users are going to need to worry about
> > category theory.
> >
> >
> > Pieter: "I'd prefer if the reference implementation is in fact far less
> > important than the specification itself"
> > Josh: I think the reason we have never had a real specification is that
> > neither the property graph data model nor the operational semantics of
> > Gremlin had been formalized. We're halfway there now with the formal PG
> > data model. The extent to which Gremlin can be formalized for TP4 is TBD,
> > though I would like to see things move in the direction of a monadic
> > formalism, as I say. The further we go in that direction, the easier
> > it will be to write a spec.
> >
> > W.r.t. making implementations more efficient, that's somewhat orthogonal
> > to what I'm trying to do, but at least in Scala (and Haskell if we decide
> > to pursue a full implementation there) I do see a lot of the nested
> > iterator messiness and other intermediate abstractions going away.
> >
> > Stephen: "I think the idea is more about the notion that the Structure
> > API which is a provider API is something that can go away as a concept."
> > Josh: OK, yes, I can see edge and vertex implementations going away, as
> > well, if the basic data access operations for outV, inV, etc. etc. are
> > implemented by the provider on the process side instead.
> >
> > Stephen: "I think I'm interested in "practical" so Scala seems right."
> > Josh: Well, now I think I might take a stab at a basic Haskell
> > implementation just for the sake of prototyping in my favorite
> > programming language. May or may not become part of TinkerPop proper.
> >
> > Stephen: "That would be great. We currently do that for GLVs and it's
> > pretty ugly and was mostly useful in the very initial bit of each new
> > language ecosystem implementation as it just saved a ton of typing for
> > the creation of GraphTraversal, GraphTraversalSource and __."
> > Josh: Let's see exactly what we want to generate in each target language.
> > I was thinking of generating code for basic structural classes like
> > vertices and edges, which would be easy enough to do right now just by
> > defining a schema for the objects, translating that schema to Thrift IDL,
> > generating code in each of the target languages, and then gutting the
> > generated code to remove all Thrift-specific logic. For Java and Python,
> > that seems to result in a pretty good starting point for an API.
> >
> >
> > Josh
> >
> >
> > On Mon, Jan 6, 2020 at 4:50 AM Stephen Mallette <[email protected]>
> > wrote:
> >
> > > Hi Pieter - my thoughts are inline:
> > >
> > >
> > > > Regarding the structure api and query specification.
> > > >
> > > > Can it be specified in `formal` English rather than in Category Theory?
> > > > I think having the specification in Category Theory simply makes the
> > > > barrier to entry too high for many of us to partake in the conversation.
> > > >
> > > > I get that having a formal mathematical spec is useful and
> interesting
> > > > but perhaps it can remain just below the surface rather than being
> the
> > > > primary source.
> > > >
> > >
> > > I agree with this. I like the underpinnings and formalism that CT is
> > > bringing here, but if TinkerPop becomes harder and more abstract to use
> > as
> > > a result I don't think we're doing anything helpful. It seems important
> > > that we have some higher level language above the mathematical rigor so
> > > that the average user has a shot at using this stuff.
> > >
> > >
> > > > In TinkerPop 3 the specification was pretty much the reference
> > > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > > implementation is in fact far less important than the specification
> > > > itself. I.e. the specification must be in English and not refer to
> api
> > > > calls in the reference implementation.
> > > >
> > >
> > > The Structure Test Suite is the worst offender there, though there are
> > > aspects of the Process Test Suite that are equally bad. I'm not sure
> > what a
> > > test suite will look like offhand, but I think we'll need to think
> harder
> > > about the types of test we write to take care that they are not bound
> too
> > > closely to the "TinkerGraph" way of doing things.
> > >
> > >
> > > > Regarding the implementation.
> > > >
> > > > Something that has always concerned me about TinkerPop's
> implementation
> > > > is that it (embedded java db's being the exception) is generally too
> > > > far away from the data. Massive latency and endless copying of the
> data
> > > > occurs.
> > >
> > >
> > > I guess Remote Graph Providers (DSG, Neptune, etc) have mitigated that
> by
> > > putting their implementations close to the data, thus executing the
> > > traversal on the server near the data and then just returning the
> > result. I
> > > think that we need to keep that model in mind for TP4 as it was really
> > only
> > > emergent in TP3 and our designs supporting that model basically were
> > > shoehorned in.
> > >
> > >
> > > > Further it has no real understanding of memory. Any step might for
> > > > whatever reason have a ReducingBarrierStep and load the full
> traversal
> > > > data set into the JVM's memory.
> > > >
> > >
> > > I'm not sure that I follow what you're looking for TP to do here. If
> you
> > > want to outline that further, perhaps start a different thread as it
> > > doesn't sound quite related to this thread on the Schema API.
> > >
> > >
> > > > Perhaps a reference implementation written in C/C++/Go/Rust... might
> be
> > > > more useful to database vendors.
> > > >
> > >
> > > All languages I don't know ;) Short of some major new contributions
> from
> > > someone, I'd expect us to be heading down the road of the JVM again as
> > our
> > > starting point.
> > >
> > >
> > > > All that said, thanks for all the work you are putting into this.
> > >
> > >
> > > Appreciate your thoughts. Take care.
> > >
> > >
> > > On Sun, Jan 5, 2020 at 2:14 PM pieter martin <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Here are some thoughts/concerns that I have.
> > > >
> > > > Regarding the structure api and query specification.
> > > >
> > > > Can it be specified in `formal` English rather than in Category Theory?
> > > > I think having the specification in Category Theory simply makes the
> > > > barrier to entry too high for many of us to partake in the conversation.
> > > >
> > > > I get that having a formal mathematical spec is useful and
> interesting
> > > > but perhaps it can remain just below the surface rather than being
> the
> > > > primary source.
> > > >
> > > > In TinkerPop 3 the specification was pretty much the reference
> > > > implementation itself. In TinkerPop 4 I'd prefer if the reference
> > > > implementation is in fact far less important than the specification
> > > > itself. I.e. the specification must be in English and not refer to
> api
> > > > calls in the reference implementation.
> > > >
> > > > Regarding the implementation.
> > > >
> > > > Something that has always concerned me about TinkerPop's
> implementation
> > > > is that it (embedded java db's being the exception) is generally too
> > > > far away from the data. Massive latency and endless copying of the
> data
> > > > occurs.
> > > > Further it has no real understanding of memory. Any step might for
> > > > whatever reason have a ReducingBarrierStep and load the full
> traversal
> > > > data set into the JVM's memory.
> > > > Perhaps a reference implementation written in C/C++/Go/Rust... might
> be
> > > > more useful to database vendors.
> > > >
> > > > All that said, thanks for all the work you are putting into this.
> > > >
> > > > Cheers
> > > > Pieter
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, 2020-01-04 at 10:51 -0800, Joshua Shinavier wrote:
> > > > > Thanks for the detailed response, Stephen. Good points made. Let's
> > > > > dig a
> > > > > little deeper to get to a common understanding of a "structure API"
> > > > > for
> > > > > TP4. I agree that Graph is a relic of the Blueprints days, and
> would
> > > > > not be
> > > > > missed. Graph.Features would then need to be renamed at the very
> > > > > least.
> > > > > However, Vertex, Edge, Property etc. are also part of the structure
> > > > > API,
> > > > > and they are fundamental. We need them in TP4, but there is also an
> > > > > opportunity to generalize them slightly to give us a strong notion
> of
> > > > > schema. Graph.Features, whatever we call it, would not be so much a
> > > > > stand-alone collection of flags describing the graph back-end, as
> it
> > > > > is
> > > > > now, but a set of constraints on the schemas you can define. It
> would
> > > > > "have
> > > > > teeth" because you could actually validate your schema against it,
> > > > > assuming
> > > > > you have chosen to define one. If we do want a handy Graph
> interface
> > > > > in
> > > > > TP4, we could consider deriving the implementation rather than
> > > > > allowing
> > > > > developers to define it themselves.
> > > > >
> > > > > W.r.t. Haskell vs. Scala -- if you / enough of us are interested in
> > > > > Haskell, we could start with a Haskell-based reference
> implementation
> > > > > before we proceed to Scala. The schema API I have in mind is
> > > > > essentially
> > > > > already written, and will be publicly available soon. It might not
> be
> > > > > a bad
> > > > > idea to explore true monadic traversals, as I have talked about
> > > > > before, in
> > > > > functionally pure Haskell first. The Gremlin-Scala [1] and Greskell
> > > > > [2]
> > > > > projects have already dug into some of the finer details and could
> be
> > > > > used
> > > > > for reference. To that, I would add monadic encapsulation of
> > > > > transactions,
> > > > > graph side-effects, and exceptions. The universality of a monadic
> > > > > approach
> > > > > to graph traversal might help us to address some of the language
> > > > > variation
> > > > > you mention, because it will be easier to describe exactly what
> basic
> > > > > steps
> > > > > do and how their effects are composed together. Although most of
> the
> > > > > languages of interest for TinkerPop back-ends are not purely
> > > > > functional,
> > > > > you can usually create APIs that are. Formal specifications of
> > > > > TinkerPop
> > > > > structure and process ought to be possible.
> > > > >
> > > > > For project structure, I say we follow your instincts, as you are
> the
> > > > > most
> > > > > intimately familiar with the code base(s) and the issues. I think
> it
> > > > > makes
> > > > > sense to continue to have a master repo for reference
> > > > > implementations, but
> > > > > yes we might want separate build systems. That will certainly be
> the
> > > > > case
> > > > > if we want to include a Haskell implementation alongside a JVM one.
> > > > > We
> > > > > might be able to make use of code generation for a one-time
> > > > > translation of
> > > > > core structure API into various target languages.
> > > > >
> > > > > To my mind, your emphasis on consistency across GLVs in TP4 goes
> well
> > > > > with
> > > > > an emphasis on a stronger type system and better-defined
> operational
> > > > > semantics for traversals.
> > > > >
> > > > > Josh
> > > > >
> > > > >
> > > > > [1]
> > > > > https://github.com/mpollmeier/gremlin-scala
> > > > >
> > > > > [2]
> > > > > https://github.com/debug-ito/greskell
> > > > >
> > > > >
> > > > >
> > > > > > On Fri, Jan 3, 2020 at 5:21 AM Stephen Mallette <[email protected]>
> > > > > > wrote:
> > > > >
> > > > > > Sorry it took me a bit to get to this...
> > > > > >
> > > > > > > Graph.Features will carry over into TP4
> > > > > >
> > > > > > Having Graph.Features implies having Graph which is part of the
> > > > > > Structure
> > > > > > API. Marko and I have questioned the necessity for the Graph and
> > > > > > Structure
> > > > > > API in recent years. Major graph providers who use TinkerPop
> don't
> > > > > > even
> > > > > > implement it I don't think - they just process Gremlin. This
> > > > > > "secondary"
> > > > > > API (formerly a first class citizen) also creates confusion for
> > > > > > users who
> > > > > > try to use it directly and have mixed results depending on the
> > > > > > graph they
> > > > > > choose. Worse still, they end up writing Structure API code in
> > > > > > scripts embedded as strings in their code (despite advice not to
> > > > > > do so) and end up creating non-portable code. Furthermore, GLV
> > > > > > users end up wondering why
> > > > > > they can't do graph.addVertex() and other similar Structure API
> > > > > > calls.
> > > > > > Mixed advice in third-party blog posts compounds these issues.
> > > > > >
> > > > > > So, when you talk about the Structure API, I wonder if you mean
> to
> > > > > > keep all
> > > > > > of it or just the notion of Graph.Features (in some new revised
> > > > > > form). The
> > > > > > latter is agreeable in my mind because we likely still need some
> > > > > > way to
> > > > > > know how a graph behaves for purposes of our technology test
> suite.
> > > > > > Without
> > > > > > the Structure API, I wasn't sure yet what that would look like.
> > > > > >
> > > > > > > I feel we should use Scala for the API. This opinion is informed
> > > > > > > by my experiences writing tools of this kind in both Java and
> > > > > > > Haskell at Uber. While I am a huge fan of Haskell, practical
> > > > > > > considerations rule it out as an option. We need the API to be
> > > > > > > JVM-compatible
> > > > > >
> > > > > > Having followed along with your talks, writings, etc and with my
> > > > > > own
> > > > > > reading of Category Theory and such, I realized that a use of
> Java
> > > > > > would
> > > > > > probably not work. While I have interest in Haskell (more so than
> > > > > > Scala),
> > > > > > Scala does seem like the best fit for this work on the JVM. That
> > > > > > said,
> > > > > > there are two points I'd like us to consider that have been on my
> > > > > > mind for
> > > > > > TP4:
> > > > > >
> > > > > > 1. The realization that TinkerPop, specifically Gremlin, would be
> > > > > > available
> > > > > > natively in other language ecosystems besides the JVM came way
> too
> > > > > > late in
> > > > > > TP3. As a result, we have an extraordinarily mixed set of
> messages
> > > > > > with
> > > > > > Gremlin usage. Things work one way in Java, but another way in
> > > > > > Python. And
> > > > > > while 3.4.x unified connection options across languages, there are
> > > > > > still too many ways to connect to a graph and too much discrepancy
> > > > > > in behavior. We
> > > > > > need to think about how every single feature that we create for
> TP4
> > > > > > behaves
> > > > > > in each language and what parity of capability we can achieve
> > > > > > there. And if
> > > > > > some reasonable level of parity can't be achieved for whatever
> > > > > > reason, we should seriously consider not implementing the feature;
> > > > > > otherwise, the story for the language ecosystems that lack the
> > > > > > functionality had better be crystal clear and consistent with
> > > > > > TinkerPop as a whole. We should very much
> > > > > > consider how Graph.Features (in whatever form it takes) is
> > > > > > accessible via
> > > > > > Java, Python, Javascript, etc. before going too far in any
> > > > > > particular
> > > > > > development direction.
> > > > > > 2. What is the general structure for this project with respect to
> > > > > > the
> > > > > > different language environments that we have? Personally, I still
> > > > > > like the
> > > > > > idea of a single repo, but without a single build system ruling
> it
> > > > > > all. In
> > > > > > this way each language ecosystem can take advantage of the best
> > > > > > parts of
> > > > > > its particular build tool chain without having to shoehorn into a
> > > > > > different
> > > > > > system's approach. That said, I think each ecosystem should stick
> > > > > > to a single build tool chain, e.g. Maven for the JVM.
> > > > > >
> > > > > > As a big picture point, I think the JVM ecosystem will be the
> model
> > > > > > for all
> > > > > > other language ecosystems. I would think that we would want to
> take
> > > > > > care
> > > > > > that we not turn TinkerPop into a Scala-only system - I assume
> this
> > > > > > work
> > > > > > isn't laying the foundation for that, but figured I'd voice the
> > > > > > concern. I
> > > > > > think we'd largely still rely on Java for development outside of
> > > > > > this
> > > > > > feature that has some specific demands not addressed well by it.
> > > > > > I'd
> > > > > > further assume that we would have some nice clean interop back to
> > > > > > Java for
> > > > > > this stuff so as to keep our core users well engaged.
> > > > > >
> > > > > > > to keep TinkerPop aligned with upcoming standards like RDF* and
> > > > > > > GQL.
> > > > > > > Interoperability with mm-ADT should be straightforward
> > > > > >
> > > > > > Thank you for keeping up with the developing standards. That's a
> > > > > > nice
> > > > > > service to TinkerPop.
> > > > > >
> > > > > > Ultimately my vision for TP4 seems to have less to do with
> specific
> > > > > > major
> > > > > > new features (thus glad to see that you're thinking in that
> manner)
> > > > > > and
> > > > > > more to do with creating consistent, coherent and easy graph
> usage
> > > > > > patterns
> > > > > > across language ecosystems for users while making it even simpler
> > > > > > for
> > > > > > providers to build their TinkerPop-enabled systems. Having seen
> so
> > > > > > much
> > > > > > success with GLVs for TP3, despite their drawbacks, I can't help
> > > > > > but sense
> > > > > > that focusing on this notion as a foundational element of design
> > > > > > for TP4
> > > > > > will further expand TinkerPop's appeal and reach.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I would like to reboot the conversation around TinkerPop 4,
> > > > > > > specifically
> > > > > >
> > > > > > as
> > > > > > > it concerns the structure API. You will have seen my posts,
> ever
> > > > > > > since my
> > > > > > > presentation [1] last January, about an algebraic approach to
> > > > > > > property
> > > > > > > graph schemas and transformations, which Ryan and I formalized
> in
> > > > > > > the APG
> > > > > > > paper [2]. I am now very close to releasing the Haskell
> > > > > > > implementation of
> > > > > > > this framework as open source software (to be accompanied by an
> > > > > > > Uber
> > > > > > > Engineering Blog post, in the next few weeks if all goes well).
> > > > > > >
> > > > > > > At various times and places, I have suggested that we develop a
> > > > > >
> > > > > > Scala-based
> > > > > > > structure API for TP4 which implements APG in an extensible
> way.
> > > > > > > I think
> > > > > >
> > > > > > it
> > > > > > > is time to proceed and start committing code, or discuss
> > > > > > > alternative
> > > > > >
> > > > > > plans
> > > > > > > for the structure API. There seems to be plenty of community
> > > > > > > interest,
> > > > > >
> > > > > > and
> > > > > > > I now have an official OK to put some engineering hours towards
> > > > > > > it at
> > > > > >
> > > > > > work.
> > > > > > > I would like to align with you -- the TP PMC and other
> TinkerPop
> > > > > >
> > > > > > committers
> > > > > > > and developers -- on how to proceed, who will contribute, and
> > > > > > > what the
> > > > > > > development timeline will look like.
> > > > > > >
> > > > > > > Some specifics from my side:
> > > > > > >
> > > > > > >    - Graph.Features will carry over into TP4; it will just be a
> > > > > > > bit more
> > > > > > >    sophisticated than the current TP3 Graph.Features. Btw. I
> also
> > > > > >
> > > > > > proposed
> > > > > > >    this idea of a graph feature vector at the recent Dagstuhl
> > > > > > > Seminar
> > > > > >
> > > > > > [3],
> > > > > > >    where it caught on and will be the basis of a "dragon data
> > > > > > > model" that
> > > > > > >    might help to keep TinkerPop aligned with upcoming standards
> > > > > > > like RDF*
> > > > > > > and
> > > > > > >    GQL.
> > > > > > >    - I feel we should use Scala for the API. This opinion is
> > > > > > > informed by
> > > > > >
> > > > > > my
> > > > > > >    experiences writing tools of this kind in both Java and
> > > > > > > Haskell at
> > > > > >
> > > > > > Uber.
> > > > > > >    While I am a huge fan of Haskell, practical considerations
> > > > > > > rule it out
> > > > > > > as
> > > > > > >    an option. We need the API to be JVM-compatible. The best
> > > > > > >    Haskell-JVM bridge is Eta [4], but IMO it is not ready to be
> > > > > > >    put in the critical path on a project such as TinkerPop; we
> > > > > > >    used it at Uber for a while and found it to be a time sink,
> > > > > > >    despite the generated bytecode working great.
> > > > > > >    Likewise, I would strongly advise against continuing with a
> > > > > > > pure
> > > > > > > Java-based
> > > > > > >    API if we want to do intelligent things with graph schemas.
> > > > > > > The
> > > > > > > language is
> > > > > > >    just not appropriate as a basis for the type system in
> > > > > > > question.
> > > > > >
> > > > > > Scala,
> > > > > > > on
> > > > > > >    the other hand, has all of the advantages of Haskell in
> terms
> > > > > > > of type
> > > > > > >    safety and functional pattern matching, although it requires
> > > > > > > some
> > > > > >
> > > > > > extra
> > > > > > >    discipline to keep your code pure.
> > > > > > >    - Interoperability with Ryan's CQL (categorical query
> language
> > > > > > > [5]) is
> > > > > > >    of interest.
> > > > > > >    - Interoperability with mm-ADT should be straightforward now
> > > > > > > that
> > > > > >
> > > > > > mm-ADT
> > > > > > >    has support for union types. Hopefully, mm-ADT's type system
> > > > > > > will end
> > > > > > > up as
> > > > > > >    a proper superset of TP4's.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > Josh
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > > > > https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
> > > > > > >
> > > > > > > [2]
> > > > > > > https://arxiv.org/abs/1909.04881
> > > > > > >
> > > > > > > [3]
> > > > > > > https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
> > > > > > >
> > > > > > > [4]
> > > > > > > https://eta-lang.org
> > > > > > >
> > > > > > > [5]
> > > > > > > https://www.categoricaldata.net
> > > > > > >
> > > > > > >
> > > >
> > > >
> > >
> >
>
