Re: [DISCUSS] New IO format for GLVs/Gremlin Server

Robert Dale Wed, 13 Jul 2016 14:35:55 -0700

On a different subject, I read on IBM's site that GraphSON 1.0
documents "are not valid JSON documents" [1].  Is this true? I looked
at one example and it did indeed look that way. It was an array of
vertices but without the array notation and not separated by ",".


Was there a reason for this?  Please tell me GraphSON 2.0 will be valid JSON!

1. https://ibm-graph-docs.ng.bluemix.net/api.html#bulk-input-apis
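
To illustrate what I mean, here is a minimal sketch in plain Java. It assumes the file holds one complete JSON object per line, which is only my reading of the example I looked at. The class name LineDelimitedJson and the sample records are made up for illustration and are not real GraphSON.

```java
import java.util.ArrayList;
import java.util.List;

final class LineDelimitedJson {

    // Wrap newline-separated JSON objects in "[" ... "]" and join them with
    // commas, so a standard JSON parser can read the result as one array.
    // Assumption: every non-blank line is already a complete JSON value.
    static String toJsonArray(String lineDelimited) {
        List<String> records = new ArrayList<>();
        for (String line : lineDelimited.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return "[" + String.join(",", records) + "]";
    }

    public static void main(String[] args) {
        // Illustrative records only; not an actual GraphSON 1.0 document.
        String input = "{\"id\":1,\"label\":\"person\"}\n{\"id\":2,\"label\":\"person\"}\n";
        System.out.println(toJsonArray(input));
    }
}
```

A file in the line-per-object form can be streamed one record at a time, which may be the reason for it, but a generic JSON parser will reject the file as a whole until it is wrapped like this.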

On Wed, Jul 13, 2016 at 3:56 PM, Jason Plurad <plur...@gmail.com> wrote:
> Unipop uses String ids. Sqlg uses Long ids.
>
> Seems fair enough that we can compare ids as numeric by checking the
> graph.features() for supportsNumericIds(). One complication would be graphs
> that allow multiple id types.
>
>
> On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette <spmalle...@gmail.com>
> wrote:
>
>> > First, is there a wiki that we can keep updated with decisions or at
>> > least decision points? I know there's an old wiki, but is there/will
>> > there be a new wiki?
>>
>> No - we don't have a wiki. Design decisions tend to get trapped in the
>> mailing list (or JIRA) which isn't so good. Maybe that's a separate
>> discussion.
>>
>> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
>> > properties. It treats all types, primitive or object, from byte to
>> > long, double, float as numbers.
>>
>> Perhaps we could take a stronger stance on this in the test cases? Does
>> anyone know what graphs this would impact besides Titan and TinkerGraph (I
>> suspect DSE Graph, but not 100% sure)?
>>
>>
>>
>> On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale <robd...@gmail.com> wrote:
>>
>> > First, is there a wiki that we can keep updated with decisions or at
>> > least decision points? I know there's an old wiki, but is there/will
>> > there be a new wiki?
>> >
>> > Stephen, IMO, that's still bad behavior. That says to me a number is
>> > not a number.  But, yes, schemaless does allow one to put crap in and
>> > get crap out. So designers should be aware of these types of pitfalls.
>> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
>> > properties. It treats all types, primitive or object, from byte to
>> > long, double, float as numbers.  This is pretty standard behavior in
>> > SQL, JDBC drivers, and other NoSQL technologies.
>> >
>> >
>> >
>> > On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette <spmalle...@gmail.com>
>> > wrote:
>> > > Marko, the namespacing idea seems smart.
>> > >
>> > > Robert, I think other graphs have similar behavior to TinkerGraph's
>> > > default. In Titan, the absence of a schema (default, obviously)
>> > > produces this:
>> > >
>> > > gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
>> > > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
>> > > gremlin> graph.addVertex("n",100D)
>> > > ==>v[4288]
>> > > gremlin> graph.traversal().V().has('n',100f)
>> > > gremlin> graph.traversal().V().has('n',100d)
>> > > ==>v[4288]
>> > >
>> > > This kind of problem has caused trouble for years and years in
>> > > TinkerPop and allowing the type to be embedded seemed like a good
>> > > solution. Of course, you bring up a good point about JavaScript - to
>> > > this point we've relied on JS devs to conform to Java/Groovy types by
>> > > forcing conversion in their gremlin scripts or configuring their
>> > > graphs to avoid use of types that would produce these kinds of
>> > > ambiguous results.
>> > >
>> > >
>> > >
>> > > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale <robd...@gmail.com>
>> > > wrote:
>> > >
>> > >> And just to be clear, I'm not necessarily disagreeing. But I think
>> > >> it's important to understand where and why it's necessary.
>> > >>
>> > >> For example, if I'm writing a gremlin script (string), I don't type my
>> > >> input numbers.  It's rightly converted by the underlying architecture.
>> > >> (I'm guessing groovy which has enhanced number support).  Also, if a
>> > >> GLV is submitting typed numbers, how would that work? For example, in
>> > >> JavaScript?
>> > >>
>> > >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale <robd...@gmail.com>
>> > >> wrote:
>> > >> > Hi, Stephen.  I think that's a bad example. You may recall I brought
>> > >> > up that issue in the forum.  However, it's actually attributed to the
>> > >> > default ID manager of ANY (for historical reasons) which I think is a
>> > >> > really bad default (and reason) because it only leads to confusion.
>> > >> > Java is one of the few, if not only, brain-damaged languages where
>> > >> > 5 != 5 != 5.  In Java, number objects must be coerced into like form
>> > >> > for comparison. The other ID managers do this coercion.  Saner
>> > >> > languages do this under the covers.
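
To make the coercion point above concrete, here is a minimal sketch in plain Java. The names NumberEquality and sameValue are invented for illustration and are not TinkerPop code. Going through BigDecimal is just one possible way to coerce, and it is not what the ID managers necessarily do.

```java
import java.math.BigDecimal;

final class NumberEquality {

    // Compare two Numbers by value instead of by boxed type.
    // Assumption: both arguments are standard boxed types with a finite value
    // (NaN and infinity would make the BigDecimal constructor throw).
    // Note that this compares the printed decimal form, so 0.1f and 0.1d
    // count as equal here even though their binary values differ.
    static boolean sameValue(Number a, Number b) {
        BigDecimal left = new BigDecimal(a.toString());
        BigDecimal right = new BigDecimal(b.toString());
        return left.compareTo(right) == 0;
    }

    public static void main(String[] args) {
        Integer i = 5;
        Long l = 5L;
        Double d = 5.0;
        System.out.println(i.equals(l));      // false: Integer vs Long
        System.out.println(l.equals(d));      // false: Long vs Double
        System.out.println(sameValue(i, l));  // true
        System.out.println(sameValue(l, d));  // true
    }
}
```

compareTo is used rather than BigDecimal.equals because equals also compares scale, so 5 and 5.0 would still come out unequal.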
>> > >> >
>> > >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette <spmalle...@gmail.com>
>> > >> > wrote:
>> > >> >> Robert, thanks for joining this discussion.
>> > >> >>
>> > >> >>> I wonder if it even makes sense to type numbers according to their
>> > >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
>> > >> >>> space. Long isn't much more.  So in Java we're not saving much space.
>> > >> >>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
>> > >> >>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
>> > >> >>> have this concept.  Does anything in gremlin actually require this?
>> > >> >>
>> > >> >> If the intended numeric type isn't preserved, weird things can happen
>> > >> >> with graphs that have a schema (like Titan/DSE). Even TinkerGraph
>> > >> >> using the default ID manager will not be happy if you try to do a
>> > >> >> lookup of Integer identifiers with a Long:
>> > >> >>
>> > >> >> gremlin> graph = TinkerFactory.createModern()
>> > >> >> ==>tinkergraph[vertices:6 edges:6]
>> > >> >> gremlin> graph.vertices(1)
>> > >> >> ==>v[1]
>> > >> >> gremlin> graph.vertices(1L)
>> > >> >> gremlin>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Wed, Jul 13, 2016 at 8:17 AM, Robert Dale <robd...@gmail.com>
>> > >> >> wrote:
>> > >> >>
>> > >> >>> Marko, I agree that empty object properties should not be
>> > >> >>> represented. I think if you saw that in an example then it was
>> > >> >>> probably for demonstration purposes.
>> > >> >>>
>> > >> >>> Kevin, can you expand on this comment:
>> > >> >>>
>> > >> >>> > the format you suggest would lead to the same inconsistencies as in
>> > >> >>> > GraphSON 1.0.
>> > >> >>> > Since the type is at the same level than the data itself, whether the
>> > >> >>> > container is an Array or an Object
>> > >> >>> >
>> > >> >>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
>> > >> >>>
>> > >> >>> What exactly are the inconsistencies?  What is the problem in
>> > >> >>> determining an array or object?
>> > >> >>> This is a natural JSON array (or list): []
>> > >> >>> This is a natural JSON object: {}
>> > >> >>>
>> > >> >>> Type at the object level is a common pattern and supported feature of
>> > >> >>> Jackson.  Also, GeoJSON would be a natural fit as it also stores
>> > >> >>> 'type' at the object level. Titan supports GeoJSON currently.  I
>> > >> >>> wonder if it would make sense to promote geometry to gremlin.
>> > >> >>>
>> > >> >>> We should probably start documenting a table of supported types. (If
>> > >> >>> there is one, please provide link)
>> > >> >>>
>> > >> >>> I wonder if it even makes sense to type numbers according to their
>> > >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
>> > >> >>> space. Long isn't much more.  So in Java we're not saving much space.
>> > >> >>> Jackson will attempt to parse in order: int, long, BigInt, BigDecimal.
>> > >> >>> The JSON JSR uses only BigDecimal. Some non-jvm languages don't even
>> > >> >>> have this concept.  Does anything in gremlin actually require this?
>> > >> >>> I'm thinking that this is only going to be relevant at the domain
>> > >> >>> model level. This way json native numbers can be used and not need
>> > >> >>> typing.
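
The parse order described above can be sketched roughly as follows. This is a hypothetical helper that only mimics the stated order (int, long, BigInt, BigDecimal). It is not Jackson's actual implementation, and NarrowestNumber is an invented name.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class NarrowestNumber {

    // Return the first type, in the order Integer, Long, BigInteger,
    // BigDecimal, that can hold the given numeric literal.
    static Number parse(String literal) {
        try {
            return Integer.valueOf(literal);
        } catch (NumberFormatException tooBigOrNotIntegral) {
            // fall through to the next wider type
        }
        try {
            return Long.valueOf(literal);
        } catch (NumberFormatException tooBigOrNotIntegral) {
            // fall through to the next wider type
        }
        try {
            return new BigInteger(literal);
        } catch (NumberFormatException notIntegral) {
            // fall through to the decimal type
        }
        // Throws NumberFormatException if the literal is not numeric at all.
        return new BigDecimal(literal);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"100", "3000000000", "99999999999999999999", "1.5"}) {
            System.out.println(s + " -> " + parse(s).getClass().getSimpleName());
        }
    }
}
```

The point is that the resulting Java type depends only on the magnitude of the literal, so the same property can come back as Integer for one vertex and Long for another unless something else pins the type down.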
>> > >> >>>
>> > >> >>> Additionally, I think that all things that will be typed should always
>> > >> >>> be typed. For the use case of ingesting a saved graph from a file, it
>> > >> >>> can probably be assumed that the top-level objects are vertices since
>> > >> >>> the graph is vertex-centric and everything else follows naturally.
>> > >> >>> I'm not entirely sure what is required for submitting traversals to
>> > >> >>> gremlin server from GLV.  However, if this is used for the results
>> > >> >>> from gremlin server then the results could start with any one of path,
>> > >> >>> vertex, edge, property, vertex property, etc. So you'll need that type
>> > >> >>> data there.
>> > >> >>>
>> > >> >>> --
>> > >> >>> Robert Dale
>> > >> >>>
>> > >> >>> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez <okramma...@gmail.com>
>> > >> >>> wrote:
>> > >> >>> > Hi,
>> > >> >>> >
>> > >> >>> > I’m not following this PR too closely so what I might be saying is
>> > >> >>> > already known/argued against/etc.
>> > >> >>> >
>> > >> >>> >         1. I think we should go with Robert Dale’s proposal of int32, int64, Vertex, uuid, etc. instead of Java class names.
>> > >> >>> >         2. In Java we then have a Map<String,Class> for typecasting accordingly.
>> > >> >>> >         3. This would make GraphSON 2.0 perfect for Bytecode serialization in TINKERPOP-1278.
>> > >> >>> >         4. I think that if a Vertex, Edge, etc. doesn’t have properties, outV, etc. then don’t even have those fields in the representation.
>> > >> >>> >         5. Most of the serialization back and forth will be ReferenceXXX elements and thus, don’t create more Maps/lists for no reason. — less chars.
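
Points 1 and 2 above could look something like this. It is a sketch only: TypeRegistry, its read method, and the string-valued input are assumptions for illustration. Only the names int32, int64 and uuid come from the proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class TypeRegistry {

    // Language-neutral type name -> Java class, as in point 2 of the list.
    private static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("int32", Integer.class);
        TYPES.put("int64", Long.class);
        TYPES.put("uuid", UUID.class);
    }

    // Convert a raw textual value into the Java type registered for typeName.
    static Object read(String typeName, String rawValue) {
        Class<?> target = TYPES.get(typeName);
        if (target == null) {
            throw new IllegalArgumentException("Unknown type: " + typeName);
        }
        if (target == Integer.class) {
            return Integer.valueOf(rawValue);
        }
        if (target == Long.class) {
            return Long.valueOf(rawValue);
        }
        if (target == UUID.class) {
            return UUID.fromString(rawValue);
        }
        throw new IllegalStateException("No converter for " + target.getName());
    }

    public static void main(String[] args) {
        System.out.println(read("int32", "1").getClass().getSimpleName()); // Integer
        System.out.println(read("int64", "1").getClass().getSimpleName()); // Long
    }
}
```

With the type name carried next to the value, the same literal 1 can be read back as either Integer or Long, which is exactly the distinction the TinkerGraph ID lookup earlier in the thread depends on.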
>> > >> >>> >
>> > >> >>> > For me, my interest in this work is all about a language agnostic way
>> > >> >>> > of sending Gremlin traversal bytecode between different languages. This
>> > >> >>> > work is exactly what I am looking for.
>> > >> >>> >
>> > >> >>> > Thanks,
>> > >> >>> > Marko.
>> > >> >>> >
>> > >> >>> > http://markorodriguez.com
>> > >> >>> >
>> > >> >>> >
>> > >> >>> >
>> > >> >>> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette <spmalle...@gmail.com>
>> > >> >>> >> wrote:
>> > >> >>> >>
>> > >> >>> >> With all the work on GLVs and the recent work on GraphSON 2.0, I think it's
>> > >> >>> >> important that we have a solid, efficient, programming language neutral,
>> > >> >>> >> lossless serialization format. Right now that format is GraphSON and it
>> > >> >>> >> works for that purpose (ever more so with 2.0). Given some discussion on
>> > >> >>> >> the GraphSON 2.0 PR driven a bit by Robert Dale:
>> > >> >>> >>
>> > >> >>> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
>> > >> >>> >>
>> > >> >>> >> I wonder if we shouldn't consider another IO format that has Gremlin
>> > >> >>> >> Server/GLVs in mind. At this point I'm not suggesting anything specific -
>> > >> >>> >> I'm just hanging the idea out for further discussion and brain storming.
>> > >> >>> >> Thoughts?
>> > >> >>> >
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> --
>> > >> >>> Robert Dale
>> > >> >>>
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> > Robert Dale
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Robert Dale
>> > >>
>> >
>> >
>> >
>> > --
>> > Robert Dale
>> >
>>



-- 
Robert Dale
