[DISCUSS] Process API for TP4 [Was: structure API for TP4]

Stephen Mallette Mon, 13 Jan 2020 05:49:27 -0800

On the heels of the "structure API" discussion I thought I'd just start a
thread for the process API for TP4. The discussion of serialization formats
like Thrift made me think about Gremlin Server, GLVs and their overall
relationship to the process API (i.e. Gremlin the language).


I don't think TP4 should have a specific application called "Gremlin
Server". Many graph providers don't use the component directly and it
creates a component that only exists for the JVM but not in other language
ecosystems. As a result, I've noticed that users are immediately confused
about what they need to get started with TinkerPop (despite all the
documentation and explanation we provide).

Let's forget about all the different graph systems out there and just think
about the basic TP3 one-liner for creating "g":

g = traversal()

If every language variant could support that syntax equally with no other
configurations we'd have something really easy to get started with. Perhaps
traversal() would just instantiate an empty embedded in-memory graph
(yes...that would mean having some form of TinkerGraph in each language)
but that graph would communicate over the same protocol as though it were
remote. From that simple start point we can start extending into remote
configurations to explicitly connect to specific graphs in specific ways,
in much the same manner as we do today. I think this approach implies that
graphs which are purely embedded today will need to expose Gremlin
Server-style functionality to be considered TinkerPop-enabled or perhaps we
can just wrap up their implementation inside of TinkerPop somehow to expose
that for them. Whether they do a native implementation which might afford
them some benefits based on their platform or rely on our implementation
puts the user in a position where they no longer need to reason about that
component which is essentially the goal I'd like to achieve.
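
As a rough illustration of the one-liner idea, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the names
InMemoryGraph, EmbeddedConnection and TraversalSource, the tiny list-of-steps
request format, and JSON as the wire encoding are all hypothetical and are
not existing TinkerPop APIs. The point is only that traversal() with no
arguments yields an embedded graph, while every request still passes through
the same serialize/submit/deserialize path a remote connection would use.

```python
import json


class InMemoryGraph:
    """Hypothetical stand-in for a small per-language TinkerGraph."""

    def __init__(self):
        self.vertices = {}
        self.next_id = 1

    def execute(self, steps):
        # Interpret a made-up instruction list such as [["V"], ["count"]].
        result = []
        for op, *args in steps:
            if op == "addV":
                vid = self.next_id
                self.next_id += 1
                self.vertices[vid] = {"id": vid, "label": args[0]}
                result = [self.vertices[vid]]
            elif op == "V":
                result = list(self.vertices.values())
            elif op == "count":
                result = [len(result)]
            else:
                raise ValueError(f"unsupported step: {op}")
        return result


class EmbeddedConnection:
    """Serves requests in-process, but only ever sees serialized bytes."""

    def __init__(self, graph):
        self.graph = graph

    def submit(self, request_bytes):
        steps = json.loads(request_bytes.decode("utf-8"))
        return json.dumps(self.graph.execute(steps)).encode("utf-8")


class TraversalSource:
    """Collects steps and sends them through whatever connection it was given."""

    def __init__(self, connection, steps=()):
        self.connection = connection
        self.steps = list(steps)

    def _with(self, *step):
        return TraversalSource(self.connection, self.steps + [list(step)])

    def addV(self, label):
        return self._with("addV", label)

    def V(self):
        return self._with("V")

    def count(self):
        return self._with("count")

    def to_list(self):
        payload = json.dumps(self.steps).encode("utf-8")
        return json.loads(self.connection.submit(payload).decode("utf-8"))


def traversal(connection=None):
    # No arguments: fall back to an empty embedded graph behind the same interface.
    if connection is None:
        connection = EmbeddedConnection(InMemoryGraph())
    return TraversalSource(connection)
```

With that in place, g = traversal() followed by g.addV("person").to_list()
and g.V().count().to_list() works with no configuration, and a remote
setup would only need to pass a different connection object exposing the
same submit() method.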

Note that we will no longer look to support arbitrary Groovy script
execution as part of TP4. If graph providers rely on that functionality for
some reason they will need to account for that. Providers often support
scripts to allow for their schema APIs to work. Given that TP4 will have
schema support I would think that they would piggy-back on whatever
infrastructure we supplied in support of that, but if there are other
features needed (DS Graph has some "system" functions, for example), those
will have to be dealt with in some way. I think certain providers like
visualization tools and notebooks that support Gremlin may also hit some
problems with this change. I think that the answer is pretty simple
though...providers will just need to manage their own ScriptEngine
implementations along with all the security/memory issues that come with
that. I considered the notion that we might maintain gremlin-groovy and its
ScriptEngine but not expose it as a "server" oriented feature because
Gremlin Console kept us bound to groovysh and a lot of that code overlaps,
however with Java now having its own shell, I wonder if we need to touch
Groovy at all. If we were going to support a JVM language variant I'd
probably pick a few other languages first like Clojure or Scala where the
Java interop isn't as clean as Groovy's.

I suppose that's just scratching the surface of things to consider for the
process API for TP4 but these were the things that came to mind while
thinking about the other thread.



On Mon, Jan 13, 2020 at 7:53 AM Stephen Mallette <[email protected]>
wrote:

> Thanks for trying out Idris. I had a feeling it would work the way that
> you found it to but without actually trying it out there would be no way to
> know for sure.
>
> Interesting idea to use thrift to generate process classes like steps.
> Having some foundational code could be helpful in starting up and
> maintaining a GLV. With Idris I'd hoped to get more than just some
> interfaces and some core code that could supply some working logic to every
> language ecosystem we supported but perhaps that was asking too much.
>
> I've looked at Thrift before as a possible serialization format for us to
> use with Gremlin Server but given the adherence to schema that it required
> I opted away from it. Given that we now look to have the notion of a schema
> in TP4 I suppose Thrift, Protocol Buffers and other such formats and
> protocols are back on the table for consideration. There is a whole
> separate discussion to be had about "Gremlin Server" and the methods by
> which users "connect" to a graph for TP4, but perhaps I will save that for
> a separate thread so as not to redirect this one too much.
>
> On Fri, Jan 10, 2020 at 1:43 PM Joshua Shinavier <[email protected]>
> wrote:
>
>> As an illustration of what we can get out of Thrift (or other code gen
>> frameworks, but Thrift is one I use frequently), here is the
>> FieldDefinition type in Java:
>>
>> public class FieldDefinition implements java.io.Serializable, Cloneable,
>> Comparable<FieldDefinition> {
>>
>>   public String name;
>>   public CommonMetadata meta;
>>   public DataType type;
>>   public String referenceBy;
>>   public int index;
>>   public String defaultValue;
>>   public boolean primaryKey;
>>
>> ...
>>
>>
>> and Python:
>>
>> class FieldDefinition(object):
>>
>>     def __init__(self, name=None, meta=None, type=None,
>> referenceBy=None, index=None, defaultValue=None, primaryKey=None,):
>>         self.name = name
>>         self.meta = meta
>>         self.type = type
>>         self.referenceBy = referenceBy
>>         self.index = index
>>         self.defaultValue = defaultValue
>>         self.primaryKey = primaryKey
>>
>> ...
>>
>>
>> and Go:
>>
>> type FieldDefinition struct {
>>   Name *FieldName `thrift:"name,1" db:"name" json:"name,omitempty"`
>>   Meta *CommonMetadata `thrift:"meta,2" db:"meta" json:"meta,omitempty"`
>>   Type *DataType `thrift:"type,3" db:"type" json:"type,omitempty"`
>>   ReferenceBy *FieldName `thrift:"referenceBy,4" db:"referenceBy"
>> json:"referenceBy,omitempty"`
>>   Index *FieldIndex `thrift:"index,5" db:"index" json:"index,omitempty"`
>>   DefaultValue *string `thrift:"defaultValue,6" db:"defaultValue"
>> json:"defaultValue,omitempty"`
>>   PrimaryKey *bool `thrift:"primaryKey,7" db:"primaryKey"
>> json:"primaryKey,omitempty"`
>> }
>>
>>
>> etc. We can generate similar skeleton APIs for anything we can define with
>> a schema. This will ensure that new GLVs start out with the same basic
>> structural assumptions about elements, types, etc. The same approach can
>> also be used for classes on the process side, such as for steps --
>> constraining inputs and outputs. The YAML for the examples above looks like
>> this:
>>
>> - name: FieldDefinition
>>   type:
>>     record:
>>       - name: name
>>         type: FieldName
>>       - name: meta
>>         type:
>>           optional: CommonMetadata
>>       - name: type
>>         type: DataType
>>       - name: referenceBy
>>         type:
>>           optional: FieldName
>>       - name: index
>>         type:
>>           optional: FieldIndex
>>       - name: defaultValue
>>         type:
>>           optional: string
>>       - name: primaryKey
>>         type: boolean
>>
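
To make the schema-to-skeleton step concrete, here is a rough Python sketch
of the same idea without Thrift. It is only an illustration under stated
assumptions: the dict below is a hand-written mirror of the YAML record above
(so no YAML parser is needed), and is_optional and render_python_class are
hypothetical helper names, not part of Thrift or TinkerPop. Unlike the
thrift --gen output shown earlier, which defaults every argument to None,
this sketch defaults only the fields marked optional.

```python
# Hypothetical in-memory form of the YAML record above.
FIELD_DEFINITION = {
    "name": "FieldDefinition",
    "type": {
        "record": [
            {"name": "name", "type": "FieldName"},
            {"name": "meta", "type": {"optional": "CommonMetadata"}},
            {"name": "type", "type": "DataType"},
            {"name": "referenceBy", "type": {"optional": "FieldName"}},
            {"name": "index", "type": {"optional": "FieldIndex"}},
            {"name": "defaultValue", "type": {"optional": "string"}},
            {"name": "primaryKey", "type": "boolean"},
        ]
    },
}


def is_optional(field):
    """A field is optional when its type is written as {optional: T}."""
    return isinstance(field["type"], dict) and "optional" in field["type"]


def render_python_class(schema):
    """Render a plain Python class skeleton (as source text) for one record."""
    fields = schema["type"]["record"]
    # Required fields go first so that defaulted parameters follow them.
    required = [f for f in fields if not is_optional(f)]
    optional = [f for f in fields if is_optional(f)]
    params = ", ".join(
        [f["name"] for f in required] + [f"{f['name']}=None" for f in optional]
    )
    lines = [
        f"class {schema['name']}(object):",
        f"    def __init__(self, {params}):",
    ]
    # Assign attributes in the order the schema declares them.
    lines += [f"        self.{f['name']} = {f['name']}" for f in fields]
    return "\n".join(lines) + "\n"
```

Calling render_python_class(FIELD_DEFINITION) returns source text for a
FieldDefinition class; a similar renderer per target language would give
each GLV the same structural starting point.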
>>
>> Josh
>>
>>
>>
>> On Thu, Jan 9, 2020 at 3:08 PM Joshua Shinavier <[email protected]>
>> wrote:
>>
>> > So, w.r.t. "empty promises", I wanted to give the Idris code generation
>> > idea a shot before we continue talking about it as a possibility for TP4.
>> > I enhanced my Haskell transformer to generate either Haskell or Idris data
>> > type definitions. This works well enough for product types and simple
>> > union types (enums). Haskell-style records with multiple constructors
>> > (including the example I provided to Pieter above) are not supported in
>> > Idris, and I haven't gotten deep enough into the Idris type system to be
>> > sure of the best construction for a sum of product types.
>> >
>> > However.
>> >
>> > The code I'm able to generate using idris --codegen (I have tried
>> > JavaScript and C) is not encouraging. I probably could have found this
>> > out earlier without going to the trouble of writing a transformer first.
>> > Whereas I was talking about using Thrift for generating *interface*
>> > definitions, Idris code generation does not seem to result in a friendly
>> > interface. It is optimized executable code that is generated, and this
>> > does not necessarily resemble the data type / record definitions that were
>> > used to generate it. For example, I can generate (starting from YAML) an
>> > Idris record definition like this:
>> >
>> > record FieldDefinition where
>> >   constructor MkFieldDefinition
>> >   fieldDefinitionName : FieldName
>> >   fieldDefinitionMeta : Maybe CommonMetadata
>> >   fieldDefinitionType : DataType
>> >   fieldDefinitionReferenceBy : Maybe FieldName
>> >   fieldDefinitionIndex : Maybe FieldIndex
>> >   fieldDefinitionDefaultValue : Maybe String
>> >   fieldDefinitionPrimaryKey : Bool
>> >
>> > ...and then I can use idris --codegen javascript to generate a *.js file.
>> > However, unless I use FieldDefinition in the main of my Idris program, it
>> > is omitted entirely from the *.js file. If I do use the type definition
>> > from main, I get some code which exactly implements whatever operation I
>> > performed on it in my main, but I do not get any kind of stand-alone
>> > FieldDefinition definition. This makes the generated code not-so-useful
>> > for the sake of creating a skeleton API. The situation is even worse for
>> > the C target, as executable code rather than C source code is generated.
>> > In the case of the (external) Java codegen implementation, it appears that
>> > bytecode, not Java source code, is generated.
>> >
>> > tl;dr Idris was worth looking at, but unless I am missing something, it
>> > won't help us.
>> >
>> > thrift --gen, on the other hand, does generate source code, and supports
>> > all of these languages <https://thrift.apache.org/lib/>.
>> >
>> > W.r.t. Neo4j, you are right that it doesn't have a proper schema language;
>> > the closest things right now are Neo4j indexes and constraints
>> > <https://neo4j.com/docs/cypher-manual/current/schema/>. However, schemas
>> > are definitely being thought about and worked on, and AFAIK should
>> > solidify by the time of a GQL release at the latest. The emerging notions
>> > of GQL schema have some differences from the algebraic schemas I am
>> > pushing, particularly as concerns descriptive (rather than prescriptive)
>> > and partial schemas. There will be GQL features like multiple typing which
>> > are not currently supported by APG, and there are APG features like sum
>> > types which may not be supported by GQL. This was discussed at Dagstuhl,
>> > and achieving better alignment on how to deal with differences in schema
>> > languages was one of the TODOs from the seminar. See Juan Sequeda's blog
>> > post
>> > <http://www.juansequeda.com/blog/2019/12/11/trip-report-on-big-graph-processing-systems-dagstuhl-seminar/>
>> > for details. Long story short, the "star topology" approach I presented in
>> > connection with APG can be extended to include schema languages that do
>> > not fit neatly into APG, and this is supposed to become an active research
>> > topic which I will continue to work on along with a number of others. I
>> > believe we should be able to make our API generic enough to be used with
>> > Neo4j, JanusGraph, and other backends with their own schema APIs, but some
>> > of the details are TBD. I suggest using APG as a starting point.
>> >
>> > Josh
>> >
>> >
>> >
>> >
>> > On Wed, Jan 8, 2020 at 5:16 AM Stephen Mallette <[email protected]>
>> > wrote:
>> >
>> >> It would be really nice to see a model of the schema API work using
>> Idris
>> >> so that we might see how it could apply to the Gremlin/Process side of
>> >> things. I don't quite know how it will all work but I picture our
>> defining
>> >> some core parts of TinkerPop with Idris which would hopefully allow
>> >> generation of a nice body of code in the languages we want to support
>> and
>> >> then we'd just write code dependent on that core for language ecosystem
>> >> specific functionality and usability improvements. Such an approach
>> would
>> >> go a long way to allowing TP4 to go far beyond just GLVs. We could
>> >> realistically maintain full VM implementations possibly without a ton
>> of
>> >> work. While optimistic, I guess I'm still skeptical. I've never really
>> had
>> >> much luck with code generation doing anything really deep - more often
>> >> than
>> >> not it's empty promises.
>> >>
>> >> How do you see the notion of a schema aligning with Gremlin and the
>> >> current
>> >> methods by which Gremlin is supported by graph providers? Like, how
>> will
>> >> graphs that don't natively have schema, like Neo4j, work with
>> TinkerPop's
>> >> schema API? For graphs that have a native schema language like
>> JanusGraph
>> >> and DS Graph, what will they need to do to support TinkerPop's schema
>> >> APIs?
>> >> If we're true to the design goal of native language support, I'd be
>> >> curious
>> >> as to how we will achieve schema API interop in the same way that we
>> have
>> >> Gremlin capable of being written with python but then processed on the
>> JVM
>> >> when executed. How will we make it so that we can write our schema in
>> >> python and have it apply to a JVM based graph (that might be remotely
>> >> hosted like CosmosDB with no native schema or DS Graph with a native
>> >> schema)?
>> >>
>> >> On Tue, Jan 7, 2020 at 1:00 PM Joshua Shinavier <[email protected]>
>> >> wrote:
>> >>
>> >> > That might be an even better option. I don't have any experience with
>> >> > Idris, but the syntax for data type definitions is pretty similar to
>> >> > Haskell's. I have a mapping already written (in Haskell) that takes
>> >> schemas
>> >> > defined in YAML to Haskell data type definitions; I imagine I could
>> >> tweak
>> >> > it slightly to generate Idris definitions instead, and from there we
>> >> could
>> >> > take advantage of Idris code generation. Come to think of it, there
>> are
>> >> > also quite a few codegen projects in Haskell that could be used. With
>> >> > Idris, however, it seems that code generation was a design
>> consideration
>> >> > for the language itself.
>> >> >
>> >> > Josh
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jan 7, 2020 at 4:05 AM Stephen Mallette <
>> [email protected]>
>> >> > wrote:
>> >> >
>> >> > > Regarding code generation...
>> >> > >
>> >> > > A while ago, James Thornton put me onto Idris which is sorta what
>> >> sent me
>> >> > > trying to learn Haskell:
>> >> > >
>> >> > > http://docs.idris-lang.org/en/latest/reference/codegen.html
>> >> > >
>> >> > > I don't really have a sense of whether or not we could use that to
>> our
>> >> > > advantage. Perhaps you do Josh?
>> >> > >
>> >> > > On Mon, Jan 6, 2020 at 1:08 PM Joshua Shinavier <[email protected]
>> >
>> >> > wrote:
>> >> > >
>> >> > > > Hi Pieter, Stephen,
>> >> > > >
>> >> > > > Pieter: Can it be specified in `formal` English rather than in
>> >> Category
>> >> > > > Theory?
>> >> > > > Josh: Sure. CT is a mathematical framework that makes our
>> >> definition of
>> >> > > the
>> >> > > > data model rigorous, but the data model can also be described in
>> >> plain
>> >> > > > English. We tried to do both in the paper, and naturally the
>> >> reference
>> >> > > > documentation for TinkerPop will be extended for any new APIs.
>> You
>> >> will
>> >> > > be
>> >> > > > able to get pretty far in understanding the data model just by
>> >> looking
>> >> > at
>> >> > > > the code. For example, even if you don't know Haskell, you might
>> be
>> >> > able
>> >> > > to
>> >> > > > tell what is going on here:
>> >> > > >
>> >> > > > data DataType
>> >> > > >   = PrimitiveType PrimitiveType
>> >> > > >   | NamedType TypeReference
>> >> > > >   | ProductType
>> >> > > >       { productFields  :: [Field] }
>> >> > > >   | SumType
>> >> > > >       { sumCases       :: [Field] }
>> >> > > >   | EnumType
>> >> > > >       { enumValues     :: [Field] }
>> >> > > >   | OptionalType
>> >> > > >       { optionalType   :: DataType }
>> >> > > >   | ListType
>> >> > > >       { elementType    :: DataType }
>> >> > > >   | SetType
>> >> > > >       { setElementType :: DataType }
>> >> > > >   | MapType
>> >> > > >       { keyType        :: DataType
>> >> > > >       , valueType      :: DataType }
>> >> > > >
>> >> > > >
>> >> > > > A data type is either a primitive type:
>> >> > > >
>> >> > > > data PrimitiveType
>> >> > > >   = BinaryType
>> >> > > >   | BooleanType
>> >> > > >   | FloatType
>> >> > > >     { floatTypePrecision   :: BitPrecision }
>> >> > > >   | IntegerType
>> >> > > >     { integerTypePrecision :: BitPrecision
>> >> > > >     , integerTypeSigned    :: Bool }
>> >> > > >   | StringType
>> >> > > >
>> >> > > >
>> >> > > > ...or it's a named ("labeled") type like "Person" or "knows", or
>> a
>> >> sum
>> >> > or
>> >> > > > product type, or one of a few other things depending on what we
>> >> choose
>> >> > to
>> >> > > > support in TinkerPop. To this, we will probably add VertexType,
>> >> > EdgeType,
>> >> > > > and PropertyType. Yes, logically they are product types, but they
>> >> are
>> >> > > > fairly special in TinkerPop, and deserve their own constructors,
>> >> like
>> >> > the
>> >> > > > OptionalType and EnumType constructors you see above (optionals
>> and
>> >> > enums
>> >> > > > being special sum types). When we get down into the actual code
>> and
>> >> > > > documentation, I don't think users are going to need to worry
>> about
>> >> > > > category theory.
>> >> > > >
>> >> > > >
>> >> > > > Pieter: "I'd prefer if the reference implementation is in fact
>> far
>> >> less
>> >> > > > important than the specification itself"
>> >> > > > Josh: I think the reason we have never had a real specification
>> is
>> >> that
>> >> > > > neither the property graph data model nor the operational
>> semantics
>> >> of
>> >> > > > Gremlin had been formalized. We're halfway there now with the
>> >> formal PG
>> >> > > > data model. The extent to which Gremlin can be formalized for
>> TP4 is
>> >> > TBD,
>> >> > > > though I would like to see things move in the direction
>> of a
>> >> > > monadic
>> >> > > > formalism as I say. The further we go in that direction, I'd say
>> the
>> >> > > easier
>> >> > > > it will be to write a spec.
>> >> > > >
>> >> > > > W.r.t. making implementations more efficient, that's somewhat
>> >> > orthogonal
>> >> > > to
>> >> > > > what I'm trying to do, but at least in Scala (and Haskell if we
>> >> decide
>> >> > to
>> >> > > > pursue a full implementation there) I do see a lot of the nested
>> >> > iterator
>> >> > > > messiness and other intermediate abstractions going away.
>> >> > > >
>> >> > > > Stephen: "I think the idea is more about the notion that the
>> >> Structure
>> >> > > API
>> >> > > > which is a provider API is something that can go away as a
>> concept."
>> >> > > > Josh: OK, yes, I can see edge and vertex implementations going
>> >> away, as
>> >> > > > well, if the basic data access operations for outV, inV, etc.
>> etc.
>> >> are
>> >> > > > implemented by the provider on the process side instead.
>> >> > > >
>> >> > > > Stephen: "I think I'm interested in "practical" so Scala seems
>> >> right."
>> >> > > > Josh: Well, now I think I might take a stab at a basic Haskell
>> >> > > > implementation just for the sake of prototyping in my favorite
>> >> > > programming
>> >> > > > language. May or may not become part of TinkerPop proper.
>> >> > > >
>> >> > > > Stephen: "That would be great. We currently do that for GLVs and
>> >> it's
>> >> > > > pretty ugly and was mostly useful in the very initial bit of each
>> >> new
>> >> > > > language ecosystem implementation as it just saved a ton of
>> typing
>> >> for
>> >> > > the
>> >> > > > creation of GraphTraversal, GraphTraversalSource and __."
>> >> > > > Josh: Let's see exactly what we want to generate in each target
>> >> > > language. I
>> >> > > > was thinking of generating code for basic structural classes like
>> >> > > vertices
>> >> > > > and edges, which would be easy enough to do right now just by
>> >> defining
>> >> > a
>> >> > > > schema for the objects, translating that schema to Thrift IDL,
>> >> > generating
>> >> > > > code in each of the target languages, and then gutting the
>> generated
>> >> > code
>> >> > > > to remove all Thrift-specific logic. For Java and Python, that
>> >> seems to
>> >> > > > result in a pretty good starting point for an API.
>> >> > > >
>> >> > > >
>> >> > > > Josh
>> >> > > >
>> >> > > >
>> >> > > > On Mon, Jan 6, 2020 at 4:50 AM Stephen Mallette <
>> >> [email protected]>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > Hi Pieter - my thoughts are inline:
>> >> > > > >
>> >> > > > >
>> >> > > > > > Regarding the structure api and query specification.
>> >> > > > > >
>> >> > > > > > Can it be specified in `formal` English rather than in
>> Category
>> >> > > Theory?
>> >> > > > > > I think having the specification in Category Theory simply
>> makes
>> >> > the
>> >> > > > > > barrier to entry too high for many of us to partake in the
>> >> > > conversation.
>> >> > > > > >
>> >> > > > > > I get that having a formal mathematical spec is useful and
>> >> > > interesting
>> >> > > > > > but perhaps it can remain just below the surface rather than
>> >> being
>> >> > > the
>> >> > > > > > primary source.
>> >> > > > > >
>> >> > > > >
>> >> > > > > I agree with this. I like the underpinnings and formalism that
>> CT
>> >> is
>> >> > > > > bringing here, but if TinkerPop becomes harder and more
>> abstract
>> >> to
>> >> > use
>> >> > > > as
>> >> > > > > a result I don't think we're doing anything helpful. It seems
>> >> > important
>> >> > > > > that we have some higher level language above the mathematical
>> >> rigor
>> >> > so
>> >> > > > > that the average user has a shot at using this stuff.
>> >> > > > >
>> >> > > > >
>> >> > > > > > In TinkerPop 3 the specification was pretty much the
>> reference
>> >> > > > > > implementation itself. In TinkerPop 4 I'd prefer if the
>> >> reference
>> >> > > > > > implementation is in fact far less important than the
>> >> specification
>> >> > > > > > itself. I.e. the specification must be in English and not
>> refer
>> >> to
>> >> > > api
>> >> > > > > > calls in the reference implementation.
>> >> > > > > >
>> >> > > > >
>> >> > > > > The Structure Test Suite is the worst offender there, though
>> there
>> >> > are
>> >> > > > > aspects of the Process Test Suite that are equally bad. I'm not
>> >> sure
>> >> > > > what a
>> >> > > > > test suite will look like offhand, but I think we'll need to
>> think
>> >> > > harder
>> >> > > > > about the types of test we write to take care that they are not
>> >> bound
>> >> > > too
>> >> > > > > closely to the "TinkerGraph" way of doing things.
>> >> > > > >
>> >> > > > >
>> >> > > > > > Regarding the implementation.
>> >> > > > > >
>> >> > > > > > Something that has always concerned me about TinkerPop's
>> >> > > implementation
>> >> > > > > > is that it (embedded java db's being the exception) is
>> generally
>> >> > too
>> >> > > > > > far away from the data. Massive latency and endless copying
>> of
>> >> the
>> >> > > data
>> >> > > > > > occurs.
>> >> > > > >
>> >> > > > >
>> >> > > > > I guess Remote Graph Providers (DSG, Neptune, etc) have
>> mitigated
>> >> > that
>> >> > > by
>> >> > > > > putting their implementations close to the data, thus executing
>> >> the
>> >> > > > > traversal on the server near the data and then just returning
>> the
>> >> > > > result. I
>> >> > > > > think that we need to keep that model in mind for TP4 as it was
>> >> > really
>> >> > > > only
>> >> > > > > emergent in TP3 and our designs supporting that model basically
>> >> were
>> >> > > > > shoehorned in.
>> >> > > > >
>> >> > > > >
>> >> > > > > > Further it has no real understanding of memory. Any step
>> might
>> >> for
>> >> > > > > > whatever reason have a ReducingBarrierStep and load the full
>> >> > > traversal
>> >> > > > > > data set into the JVM's memory.
>> >> > > > > >
>> >> > > > >
>> >> > > > > I'm not sure that I follow what you're looking for TP to do
>> here.
>> >> If
>> >> > > you
>> >> > > > > want to outline that further, perhaps start a different thread
>> as
>> >> it
>> >> > > > > doesn't sound quite related to this thread on the Schema API.
>> >> > > > >
>> >> > > > >
>> >> > > > > > Perhaps a reference implementation written in
>> C/C++/Go/Rust...
>> >> > might
>> >> > > be
>> >> > > > > > more useful to database vendors.
>> >> > > > > >
>> >> > > > >
>> >> > > > > All languages I don't know ;) Short of some major new
>> >> contributions
>> >> > > from
>> >> > > > > someone, I'd expect us to be heading down the road of the JVM
>> >> again
>> >> > as
>> >> > > > our
>> >> > > > > starting point.
>> >> > > > >
>> >> > > > >
>> >> > > > > > All that said, thanks for all the work you are putting into
>> >> this.
>> >> > > > >
>> >> > > > >
>> >> > > > > Appreciate your thoughts. Take care.
>> >> > > > >
>> >> > > > >
>> >> > > > > On Sun, Jan 5, 2020 at 2:14 PM pieter martin <
>> >> > [email protected]>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > > Hi,
>> >> > > > > >
>> >> > > > > > Here are some thoughts/concerns that I have.
>> >> > > > > >
>> >> > > > > > Regarding the structure api and query specification.
>> >> > > > > >
>> >> > > > > > Can it be specified in `formal` English rather than in
>> Category
>> >> > > Theory?
>> >> > > > > > I think having the specification in Category Theory simply
>> makes
>> >> > the
>> >> > > > > > barrier to entry too high for many of us to partake in the
>> >> > > conversation.
>> >> > > > > >
>> >> > > > > > I get that having a formal mathematical spec is useful and
>> >> > > interesting
>> >> > > > > > but perhaps it can remain just below the surface rather than
>> >> being
>> >> > > the
>> >> > > > > > primary source.
>> >> > > > > >
>> >> > > > > > In TinkerPop 3 the specification was pretty much the
>> reference
>> >> > > > > > implementation itself. In TinkerPop 4 I'd prefer if the
>> >> reference
>> >> > > > > > implementation is in fact far less important than the
>> >> specification
>> >> > > > > > itself. I.e. the specification must be in English and not
>> refer
>> >> to
>> >> > > api
>> >> > > > > > calls in the reference implementation.
>> >> > > > > >
>> >> > > > > > Regarding the implementation.
>> >> > > > > >
>> >> > > > > > Something that has always concerned me about TinkerPop's
>> >> > > implementation
>> >> > > > > > is that it (embedded java db's being the exception) is
>> generally
>> >> > too
>> >> > > > > > far away from the data. Massive latency and endless copying
>> of
>> >> the
>> >> > > data
>> >> > > > > > occurs.
>> >> > > > > > Further it has no real understanding of memory. Any step
>> might
>> >> for
>> >> > > > > > whatever reason have a ReducingBarrierStep and load the full
>> >> > > traversal
>> >> > > > > > data set into the JVM's memory.
>> >> > > > > > Perhaps a reference implementation written in
>> C/C++/Go/Rust...
>> >> > might
>> >> > > be
>> >> > > > > > more useful to database vendors.
>> >> > > > > >
>> >> > > > > > All that said, thanks for all the work you are putting into
>> >> this.
>> >> > > > > >
>> >> > > > > > Cheers
>> >> > > > > > Pieter
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On Sat, 2020-01-04 at 10:51 -0800, Joshua Shinavier wrote:
>> >> > > > > > > Thanks for the detailed response, Stephen. Good points
>> made.
>> >> > Let's
>> >> > > > > > > dig a
>> >> > > > > > > little deeper to get to a common understanding of a
>> "structure
>> >> > API"
>> >> > > > > > > for
>> >> > > > > > > TP4. I agree that Graph is a relic of the Blueprints days,
>> and
>> >> > > would
>> >> > > > > > > not be
>> >> > > > > > > missed. Graph.Features would then need to be renamed at the
>> >> very
>> >> > > > > > > least.
>> >> > > > > > > However, Vertex, Edge, Property etc. are also part of the
>> >> > structure
>> >> > > > > > > API,
>> >> > > > > > > and they are fundamental. We need them in TP4, but there is
>> >> also
>> >> > an
>> >> > > > > > > opportunity to generalize them slightly to give us a strong
>> >> > notion
>> >> > > of
>> >> > > > > > > schema. Graph.Features, whatever we call it, would not be
>> so
>> >> > much a
>> >> > > > > > > stand-alone collection of flags describing the graph
>> >> back-end, as
>> >> > > it
>> >> > > > > > > is
>> >> > > > > > > now, but a set of constraints on the schemas you can
>> define.
>> >> It
>> >> > > would
>> >> > > > > > > "have
>> >> > > > > > > teeth" because you could actually validate your schema
>> against
>> >> > it,
>> >> > > > > > > assuming
>> >> > > > > > > you have chosen to define one. If we do want a handy Graph
>> >> > > interface
>> >> > > > > > > in
>> >> > > > > > > TP4, we could consider deriving the implementation rather
>> than
>> >> > > > > > > allowing
>> >> > > > > > > developers to define it themselves.
>> >> > > > > > >
>> >> > > > > > > W.r.t. Haskell vs. Scala -- if you / enough of us are
>> >> interested
>> >> > in
>> >> > > > > > > Haskell, we could start with a Haskell-based reference
>> >> > > implementation
>> >> > > > > > > before we proceed to Scala. The schema API I have in mind
>> is
>> >> > > > > > > essentially
>> >> > > > > > > already written, and will be publicly available soon. It
>> might
>> >> > not
>> >> > > be
>> >> > > > > > > a bad
>> >> > > > > > > idea to explore true monadic traversals, as I have talked
>> >> about
>> >> > > > > > > before, in
>> >> > > > > > > functionally pure Haskell first. The Gremlin-Scala [1] and
>> >> > Greskell
>> >> > > > > > > [2]
>> >> > > > > > > projects have already dug into some of the finer details
>> and
>> >> > could
>> >> > > be
>> >> > > > > > > used
>> >> > > > > > > for reference. To that, I would add monadic encapsulation
>> of
>> >> > > > > > > transactions,
>> >> > > > > > > graph side-effects, and exceptions. The universality of a
>> >> monadic
>> >> > > > > > > approach
>> >> > > > > > > to graph traversal might help us to address some of the
>> >> language
>> >> > > > > > > variation
>> >> > > > > > > you mention, because it will be easier to describe exactly
>> >> what
>> >> > > basic
>> >> > > > > > > steps
>> >> > > > > > > do and how their effects are composed together. Although
>> most
>> >> of
>> >> > > the
>> >> > > > > > > languages of interest for TinkerPop back-ends are not
>> purely
>> >> > > > > > > functional,
>> >> > > > > > > you can usually create APIs that are. Formal
>> specifications of
>> >> > > > > > > TinkerPop
>> >> > > > > > > structure and process ought to be possible.
>> >> > > > > > >
>> >> > > > > > > For project structure, I say we follow your instincts, as
>> you
>> >> are
>> >> > > the
>> >> > > > > > > most
>> >> > > > > > > intimately familiar with the code base(s) and the issues. I
>> >> think
>> >> > > it
>> >> > > > > > > makes
>> >> > > > > > > sense to continue to have a master repo for reference
>> >> > > > > > > implementations, but
>> >> > > > > > > yes we might want separate build systems. That will
>> certainly
>> >> be
>> >> > > the
>> >> > > > > > > case
>> >> > > > > > > if we want to include a Haskell implementation alongside a
>> JVM
>> >> > one.
>> >> > > > > > > We
>> >> > > > > > > might be able to make use of code generation for a one-time
>> >> > > > > > > translation of
>> >> > > > > > > core structure API into various target languages.
>> >> > > > > > >
>> >> > > > > > > To my mind, your emphasis on consistency across GLVs in TP4
>> >> goes
>> >> > > well
>> >> > > > > > > with
>> >> > > > > > > an emphasis on a stronger type system and better-defined
>> >> > > operational
>> >> > > > > > > semantics for traversals.
>> >> > > > > > >
>> >> > > > > > > Josh
>> >> > > > > > >
>> >> > > > > > >
>> >> > > > > > > [1]
>> >> > > > > > > https://github.com/mpollmeier/gremlin-scala
>> >> > > > > > >
>> >> > > > > > > [2]
>> >> > > > > > > https://github.com/debug-ito/greskell
>> >> > > > > > >
>> >> > > > > > >
>> >> > > > > > >
>> >> > > > > > > On Fri, Jan 3, 2020 at 5:21 AM Stephen Mallette <
>> >> > > > > > > [email protected]
>> >> > > > > > > >
>> >> > > > > > > wrote:
>> >> > > > > > >
>> >> > > > > > > > Sorry it took me a bit to get to this...
>> >> > > > > > > >
>> >> > > > > > > > > Graph.Features will carry over into TP4
>> >> > > > > > > >
>> >> > > > > > > > Having Graph.Features implies having Graph which is part
>> of
>> >> the
>> >> > > > > > > > Structure
>> >> > > > > > > > API. Marko and I have questioned the necessity for the
>> Graph
>> >> > and
>> >> > > > > > > > Structure
>> >> > > > > > > > API in recent years. Major graph providers who use
>> TinkerPop
>> >> > > don't
>> >> > > > > > > > even
>> >> > > > > > > > implement it I don't think - they just process Gremlin.
>> This
>> >> > > > > > > > "secondary"
>> >> > > > > > > > API (formerly a first class citizen) also creates
>> confusion
>> >> for
>> >> > > > > > > > users who
>> >> > > > > > > > try to use it directly and have mixed results depending
>> on
>> >> the
>> >> > > > > > > > graph they
>> >> > > > > > > > choose. Worse still, they end up writing Structure API
>> code
>> >> in
>> >> > > > > > > > scripts
>> >> > > > > > > > embedded as strings in their code (despite advice to not
>> do
>> >> so)
>> >> > > and
>> >> > > > > > > > end up
>> >> > > > > > > > creating non-portable code. Furthermore, GLV users end
>> up
>> >> > > > > > > > wondering why
>> >> > > > > > > > they can't do graph.addVertex() and other similar
>> Structure
>> >> API
>> >> > > > > > > > calls.
>> >> > > > > > > > Mixed advice in third-party blog posts compounds these
>> >> issues.
>> >> > > > > > > >
>> >> > > > > > > > So, when you talk about the Structure API, I wonder if
>> you
>> >> mean
>> >> > > to
>> >> > > > > > > > keep all
>> >> > > > > > > > of it or just the notion of Graph.Features (in some new
>> >> revised
>> >> > > > > > > > form). The
>> >> > > > > > > > latter is agreeable in my mind because we likely still
>> need
>> >> > some
>> >> > > > > > > > way to
>> >> > > > > > > > know how a graph behaves for purposes of our technology
>> test
>> >> > > suite.
>> >> > > > > > > > Without
>> >> > > > > > > > the Structure API, I wasn't sure yet what that would look
>> >> like.
>> >> > > > > > > >
>> >> > > > > > > > > I feel we should use Scala for the API. This opinion is
>> >> > > informed
>> >> > > > > > > > > by my
>> >> > > > > > > >
>> >> > > > > > > > experiences writing tools of this kind in both Java and
>> >> Haskell
>> >> > > at
>> >> > > > > > > > Uber.
>> >> > > > > > > > While I am a huge fan of Haskell, practical
>> considerations
>> >> rule
>> >> > > it
>> >> > > > > > > > out as
>> >> > > > > > > > an option. We need the API to be JVM-compatible
>> >> > > > > > > >
>> >> > > > > > > > Having followed along with your talks, writings, etc. and
>> >> with
>> >> > my
>> >> > > > > > > > own
>> >> > > > > > > > reading of Category Theory and such, I realized that a
>> use
>> >> of
>> >> > > Java
>> >> > > > > > > > would
>> >> > > > > > > > probably not work. While I have interest in Haskell
>> (more so
>> >> > than
>> >> > > > > > > > Scala),
>> >> > > > > > > > Scala does seem like the best fit for this work on the
>> JVM.
>> >> > That
>> >> > > > > > > > said,
>> >> > > > > > > > there are two points I'd like us to consider that have
>> been
>> >> on
>> >> > my
>> >> > > > > > > > mind for
>> >> > > > > > > > TP4:
>> >> > > > > > > >
>> >> > > > > > > > 1. The realization that TinkerPop, specifically Gremlin,
>> >> would
>> >> > be
>> >> > > > > > > > available
>> >> > > > > > > > natively in other language ecosystems besides the JVM
>> came
>> >> way
>> >> > > too
>> >> > > > > > > > late in
>> >> > > > > > > > TP3. As a result, we have an extraordinarily mixed set of
>> >> > > messages
>> >> > > > > > > > with
>> >> > > > > > > > Gremlin usage. Things work one way in Java, but another
>> way
>> >> in
>> >> > > > > > > > Python. And
>> >> > > > > > > > while 3.4.x unified connection options across languages,
>> >> > there are
>> >> > > > > > > > still too
>> >> > > > > > > > many ways to connect to a graph and too much discrepancy
>> in
>> >> > > > > > > > behavior. We
>> >> > > > > > > > need to think about how every single feature that we
>> create
>> >> for
>> >> > > TP4
>> >> > > > > > > > behaves
>> >> > > > > > > > in each language and what parity of capability we can
>> >> achieve
>> >> > > > > > > > there. And if
>> >> > > > > > > > some reasonable level of parity can't be achieved for
>> >> whatever
>> >> > > > > > > > reason, we
>> >> > > > > > > > should seriously consider either not implementing the
>> >> feature
>> >> > or
>> >> > > > > > > > the story
>> >> > > > > > > > for the language ecosystems that don't have the
>> >> functionality
>> >> > > > > > > > better be
>> >> > > > > > > > crystal clear and consistent with TinkerPop as a whole. We
>> >> should
>> >> > > > > > > > very much
>> >> > > > > > > > consider how Graph.Features (in whatever form it takes)
>> is
>> >> > > > > > > > accessible via
>> >> > > > > > > > Java, Python, Javascript, etc. before going too far in
>> any
>> >> > > > > > > > particular
>> >> > > > > > > > development direction.
>> >> > > > > > > > 2. What is the general structure for this project with
>> >> respect
>> >> > to
>> >> > > > > > > > the
>> >> > > > > > > > different language environments that we have?
>> Personally, I
>> >> > still
>> >> > > > > > > > like the
>> >> > > > > > > > idea of a single repo, but without a single build system
>> >> ruling
>> >> > > it
>> >> > > > > > > > all. In
>> >> > > > > > > > this way each language ecosystem can take advantage of
>> the
>> >> best
>> >> > > > > > > > parts of
>> >> > > > > > > > its particular build tool chain without having to
>> shoehorn
>> >> > into a
>> >> > > > > > > > different
>> >> > > > > > > > system's approach. That said, I think each ecosystem
>> should
>> >> > stick
>> >> > > > > > > > to a
>> >> > > > > > > > single build tool chain, e.g. Maven for the JVM.
>> >> > > > > > > >
>> >> > > > > > > > As a big picture point, I think the JVM ecosystem will be
>> >> the
>> >> > > model
>> >> > > > > > > > for all
>> >> > > > > > > > other language ecosystems. I would think that we would
>> want
>> >> to
>> >> > > take
>> >> > > > > > > > care
>> >> > > > > > > > that we not turn TinkerPop into a Scala-only system - I
>> >> assume
>> >> > > this
>> >> > > > > > > > work
>> >> > > > > > > > isn't laying the foundation for that, but figured I'd
>> voice
>> >> the
>> >> > > > > > > > concern. I
>> >> > > > > > > > think we'd largely still rely on Java for development
>> >> outside
>> >> > of
>> >> > > > > > > > this
>> >> > > > > > > > feature that has some specific demands not addressed
>> well by
>> >> > it.
>> >> > > > > > > > I'd
>> >> > > > > > > > further assume that we would have some nice clean interop
>> >> back
>> >> > to
>> >> > > > > > > > Java for
>> >> > > > > > > > this stuff so as to keep our core users well engaged.
>> >> > > > > > > >
>> >> > > > > > > > > to keep TinkerPop aligned with upcoming standards like
>> >> RDF*
>> >> > and
>> >> > > > > > > > > GQL.
>> >> > > > > > > > > Interoperability with mm-ADT should be straightforward
>> >> > > > > > > >
>> >> > > > > > > > Thank you for keeping up with the developing standards.
>> >> That's
>> >> > a
>> >> > > > > > > > nice
>> >> > > > > > > > service to TinkerPop.
>> >> > > > > > > >
>> >> > > > > > > > Ultimately my vision for TP4 seems to have less to do
>> with
>> >> > > specific
>> >> > > > > > > > major
>> >> > > > > > > > new features (thus glad to see that you're thinking in
>> that
>> >> > > manner)
>> >> > > > > > > > and
>> >> > > > > > > > more to do with creating consistent, coherent and easy
>> graph
>> >> > > usage
>> >> > > > > > > > patterns
>> >> > > > > > > > across language ecosystems for users while making it even
>> >> > simpler
>> >> > > > > > > > for
>> >> > > > > > > > providers to build their TinkerPop-enabled systems.
>> Having
>> >> seen
>> >> > > so
>> >> > > > > > > > much
>> >> > > > > > > > success with GLVs for TP3, despite their drawbacks, I
>> can't
>> >> > help
>> >> > > > > > > > but sense
>> >> > > > > > > > that focusing on this notion as a foundational element of
>> >> > design
>> >> > > > > > > > for TP4
>> >> > > > > > > > will further expand TinkerPop's appeal and reach.
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > On Thu, Dec 26, 2019 at 11:00 AM Joshua Shinavier <
>> >> > > > > > > > [email protected]
>> >> > > > > > > > >
>> >> > > > > > > > wrote:
>> >> > > > > > > >
>> >> > > > > > > > > Hi everyone,
>> >> > > > > > > > >
>> >> > > > > > > > > I would like to reboot the conversation around
>> TinkerPop
>> >> 4,
>> >> > > > > > > > > specifically
>> >> > > > > > > >
>> >> > > > > > > > as
>> >> > > > > > > > > it concerns the structure API. You will have seen my
>> >> posts,
>> >> > > ever
>> >> > > > > > > > > since my
>> >> > > > > > > > > presentation [1] last January, about an algebraic
>> >> approach to
>> >> > > > > > > > > property
>> >> > > > > > > > > graph schemas and transformations, which Ryan and I
>> >> > formalized
>> >> > > in
>> >> > > > > > > > > the APG
>> >> > > > > > > > > paper [2]. I am now very close to releasing the Haskell
>> >> > > > > > > > > implementation of
>> >> > > > > > > > > this framework as open source software (to be
>> accompanied
>> >> by
>> >> > an
>> >> > > > > > > > > Uber
>> >> > > > > > > > > Engineering Blog post, in the next few weeks if all
>> goes
>> >> > well).
>> >> > > > > > > > >
>> >> > > > > > > > > At various times and places, I have suggested that we
>> >> > develop a
>> >> > > > > > > >
>> >> > > > > > > > Scala-based
>> >> > > > > > > > > structure API for TP4 which implements APG in an
>> >> extensible
>> >> > > way.
>> >> > > > > > > > > I think
>> >> > > > > > > >
>> >> > > > > > > > it
>> >> > > > > > > > > is time to proceed and start committing code, or
>> discuss
>> >> > > > > > > > > alternative
>> >> > > > > > > >
>> >> > > > > > > > plans
>> >> > > > > > > > > for the structure API. There seems to be plenty of
>> >> community
>> >> > > > > > > > > interest,
>> >> > > > > > > >
>> >> > > > > > > > and
>> >> > > > > > > > > I now have an official OK to put some engineering hours
>> >> > towards
>> >> > > > > > > > > it at
>> >> > > > > > > >
>> >> > > > > > > > work.
>> >> > > > > > > > > I would like to align with you -- the TP PMC and other
>> >> > > TinkerPop
>> >> > > > > > > >
>> >> > > > > > > > committers
>> >> > > > > > > > > and developers -- on how to proceed, who will
>> contribute,
>> >> and
>> >> > > > > > > > > what the
>> >> > > > > > > > > development timeline will look like.
>> >> > > > > > > > >
>> >> > > > > > > > > Some specifics from my side:
>> >> > > > > > > > >
>> >> > > > > > > > >    - Graph.Features will carry over into TP4; it will
>> just
>> >> > be a
>> >> > > > > > > > > bit more
>> >> > > > > > > > >    sophisticated than the current TP3 Graph.Features.
>> >> Btw. I
>> >> > > also
>> >> > > > > > > >
>> >> > > > > > > > proposed
>> >> > > > > > > > >    this idea of a graph feature vector at the recent
>> >> Dagstuhl
>> >> > > > > > > > > Seminar
>> >> > > > > > > >
>> >> > > > > > > > [3],
>> >> > > > > > > > >    where it caught on and will be the basis of a
>> "dragon
>> >> data
>> >> > > > > > > > > model" that
>> >> > > > > > > > >    might help to keep TinkerPop aligned with upcoming
>> >> > standards
>> >> > > > > > > > > like RDF*
>> >> > > > > > > > > and
>> >> > > > > > > > >    GQL.
>> >> > > > > > > > >    - I feel we should use Scala for the API. This
>> opinion
>> >> is
>> >> > > > > > > > > informed by
>> >> > > > > > > >
>> >> > > > > > > > my
>> >> > > > > > > > >    experiences writing tools of this kind in both Java
>> and
>> >> > > > > > > > > Haskell at
>> >> > > > > > > >
>> >> > > > > > > > Uber.
>> >> > > > > > > > >    While I am a huge fan of Haskell, practical
>> >> considerations
>> >> > > > > > > > > rule it out
>> >> > > > > > > > > as
>> >> > > > > > > > >    an option. We need the API to be JVM-compatible. The
>> >> best
>> >> > > > > > > > > Haskell-JVM
>> >> > > > > > > > >    bridge is Eta [4], but IMO it is not ready to be
>> >> put in
>> >> > > the
>> >> > > > > > > >
>> >> > > > > > > > critical
>> >> > > > > > > > >    path on a project such as TinkerPop; we used it at
>> Uber
>> >> > for
>> >> > > a
>> >> > > > > > > > > while
>> >> > > > > > > >
>> >> > > > > > > > and
>> >> > > > > > > > >    found it to be a time sink, despite the generated
>> >> bytecode
>> >> > > > > > > > > working
>> >> > > > > > > > > great.
>> >> > > > > > > > >    Likewise, I would strongly advise against continuing
>> >> with
>> >> > a
>> >> > > > > > > > > pure
>> >> > > > > > > > > Java-based
>> >> > > > > > > > >    API if we want to do intelligent things with graph
>> >> > schemas.
>> >> > > > > > > > > The
>> >> > > > > > > > > language is
>> >> > > > > > > > >    just not appropriate as a basis for the type system
>> in
>> >> > > > > > > > > question.
>> >> > > > > > > >
>> >> > > > > > > > Scala,
>> >> > > > > > > > > on
>> >> > > > > > > > >    the other hand, has all of the advantages of
>> Haskell in
>> >> > > terms
>> >> > > > > > > > > of type
>> >> > > > > > > > >    safety and functional pattern matching, although it
>> >> > requires
>> >> > > > > > > > > some
>> >> > > > > > > >
>> >> > > > > > > > extra
>> >> > > > > > > > >    discipline to keep your code pure.
>> >> > > > > > > > >    - Interoperability with Ryan's CQL (categorical
>> query
>> >> > > language
>> >> > > > > > > > > [5]) is
>> >> > > > > > > > >    of interest.
>> >> > > > > > > > >    - Interoperability with mm-ADT should be
>> >> straightforward
>> >> > now
>> >> > > > > > > > > that
>> >> > > > > > > >
>> >> > > > > > > > mm-ADT
>> >> > > > > > > > >    has support for union types. Hopefully, mm-ADT's
>> type
>> >> > system
>> >> > > > > > > > > will end
>> >> > > > > > > > > up as
>> >> > > > > > > > >    a proper superset of TP4's.
>> >> > > > > > > > >
>> >> > > > > > > > > Thoughts?
>> >> > > > > > > > >
>> >> > > > > > > > > Josh
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > [1]
>> >> > > > > > > > > https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012
>> >> > > > > > > >
>> >> > > > > > > > > [2]
>> >> > > > > > > > > https://arxiv.org/abs/1909.04881
>> >> > > > > > > > >
>> >> > > > > > > > > [3]
>> >> > > > > > > > > https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19491
>> >> > > > > > > > >
>> >> > > > > > > > > [4]
>> >> > > > > > > > > https://eta-lang.org
>> >> > > > > > > > >
>> >> > > > > > > > > [5]
>> >> > > > > > > > > https://www.categoricaldata.net
>> >> > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>>
>
