Re: [DISCUSS] New IO format for GLVs/Gremlin Server

Stephen Mallette Wed, 13 Jul 2016 14:44:33 -0700

I'll answer your second email first about GraphSON because it's shorter and
I know the answer without too much thought (I'll need to take some time to
think on the other).


So the answer to "is this true?" is yes and no. The "yes" part is related
to the fact that I believe that by default writeGraph() will generate the
"array of vertices" that document is referring to. The reason for this is
that the generated file needs to be arbitrarily splittable for processing
in Hadoop/Spark, so each line is an individual, valid JSON document. The
"no" part is that if you want a single valid JSON document you can get it
by configuring the GraphSONWriter with wrapAdjacencyList(true). I don't
think that will change for GraphSON 2.0 unless there is an idea for dealing
with the Hadoop/Spark issue.
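A minimal sketch of the two layouts described above, in plain Java with no TinkerPop dependency, assuming each vertex has already been serialized to a one-line JSON string. The line-delimited form can be cut at any newline and every piece is still a complete object; the wrapped form is one JSON document. The `{"vertices":[...]}` wrapper shape and the helper names are assumptions for illustration, not the GraphSONWriter API.

```java
import java.util.Arrays;
import java.util.List;

class AdjacencyListLayouts {
    // Line-delimited layout: one complete JSON object per line, so a
    // Hadoop/Spark-style splitter can cut the file at any newline.
    static String lineDelimited(List<String> vertexJson) {
        return String.join("\n", vertexJson);
    }

    // Wrapped layout (assumed shape): the same objects inside one JSON
    // array, comma-separated, which makes the whole file valid JSON.
    static String wrapped(List<String> vertexJson) {
        return "{\"vertices\":[" + String.join(",", vertexJson) + "]}";
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList(
                "{\"id\":1,\"label\":\"person\"}",
                "{\"id\":2,\"label\":\"software\"}");

        // Every chunk produced by splitting on newlines is still a complete object.
        for (String chunk : lineDelimited(vertices).split("\n")) {
            System.out.println(chunk);
        }
        System.out.println(wrapped(vertices));
    }
}
```

The trade-off is the one described above: the wrapped form cannot be cut at arbitrary newlines without breaking the surrounding array.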

On Wed, Jul 13, 2016 at 5:35 PM, Robert Dale wrote:

> On a different subject, I read on IBM's site that GraphSON 1.0
> documents "are not valid JSON documents" [1].  Is this true? I looked
> at one example and it did indeed look that way. It was an array of
> vertices but without the array notation and not separated by ","
>
> Was there a reason for this?  Please tell me GraphSON 2.0 will be valid
> JSON!
>
> 1. https://ibm-graph-docs.ng.bluemix.net/api.html#bulk-input-apis
>
> On Wed, Jul 13, 2016 at 3:56 PM, Jason Plurad wrote:
> > Unipop uses String ids. Sqlg uses Long ids.
> >
> > Seems fair enough that we can compare ids as numeric by checking
> > graph.features().vertex().supportsNumericIds(). One complication would
> > be graphs that allow multiple id types.
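A minimal sketch of the id comparison Jason describes, in plain Java. The `idsEqual` helper is hypothetical, and the numeric-id flag is passed in as a boolean here; in a real graph it would come from the graph's features. Ids are compared by value only when both are numeric, which is one way to cope with graphs that allow multiple id types (e.g. String ids next to numeric ones).

```java
import java.math.BigDecimal;
import java.util.Objects;

class NumericIdComparison {
    // Hypothetical helper: compare ids by numeric value when the graph
    // supports numeric ids and both ids are numbers; otherwise use equals().
    static boolean idsEqual(Object a, Object b, boolean supportsNumericIds) {
        if (supportsNumericIds && a instanceof Number && b instanceof Number) {
            // Assumes finite numeric ids (no NaN/Infinity), which BigDecimal rejects.
            BigDecimal x = new BigDecimal(a.toString());
            BigDecimal y = new BigDecimal(b.toString());
            return x.compareTo(y) == 0;
        }
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(idsEqual(1, 1L, true));    // true: same value
        System.out.println(idsEqual(1, 1L, false));   // false: Integer vs Long
        System.out.println(idsEqual("1", 1L, true));  // false: String id vs numeric id
    }
}
```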
> >
> >
> > On Wed, Jul 13, 2016 at 2:07 PM, Stephen Mallette wrote:
> >
> >> > First, is there a wiki that we can keep updated with decisions or at
> >> > least decision points? I know there's an old wiki, but is there/will
> >> > there be a new wiki?
> >>
> >> No - we don't have a wiki. Design decisions tend to get trapped in the
> >> mailing list (or JIRA), which isn't so good. Maybe that's a separate
> >> discussion.
> >>
> >> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> >> > properties. It treats all types, primitive or object, from byte to
> >> > long, double, float as numbers.
> >>
> >> Perhaps we could take a stronger stance on this in the test cases? Does
> >> anyone know what graphs this would impact besides Titan and TinkerGraph
> >> (I suspect DSE Graph, but I'm not 100% sure)?
> >>
> >>
> >>
> >> On Wed, Jul 13, 2016 at 1:49 PM, Robert Dale wrote:
> >>
> >> > First, is there a wiki that we can keep updated with decisions or at
> >> > least decision points? I know there's an old wiki, but is there/will
> >> > there be a new wiki?
> >> >
> >> > Stephen, IMO, that's still bad behavior. That says to me a number is
> >> > not a number.  But, yes, schemaless does allow one to put crap in and
> >> > get crap out. So designers should be aware of these types of pitfalls.
> >> > Neo4j via NeoGraph appears to do the right thing for vertex IDs and
> >> > properties. It treats all types, primitive or object, from byte to
> >> > long, double, float as numbers.  This is pretty standard behavior in
> >> > SQL, JDBC drivers, and other NoSQL technologies.
> >> >
> >> >
> >> >
> >> > On Wed, Jul 13, 2016 at 11:30 AM, Stephen Mallette wrote:
> >> > > Marko, the namespacing idea seems smart.
> >> > >
> >> > > Robert, I think other graphs have similar behavior to TinkerGraph's
> >> > > default. In Titan, the absence of a schema (the default, obviously)
> >> > > produces this:
> >> > >
> >> > > gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
> >> > > ==>standardtitangraph[cassandrathrift:[127.0.0.1]]
> >> > > gremlin> graph.addVertex("n",100D)
> >> > > ==>v[4288]
> >> > > gremlin> graph.traversal().V().has('n',100f)
> >> > > gremlin> graph.traversal().V().has('n',100d)
> >> > > ==>v[4288]
> >> > >
> >> > > This kind of problem has caused trouble for years and years in
> >> > > TinkerPop, and allowing the type to be embedded seemed like a good
> >> > > solution. Of course, you bring up a good point about JavaScript: to
> >> > > this point we've relied on JS devs to conform to Java/Groovy types by
> >> > > forcing conversion in their Gremlin scripts or by configuring their
> >> > > graphs to avoid types that would produce these kinds of ambiguous
> >> > > results.
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Jul 13, 2016 at 9:51 AM, Robert Dale wrote:
> >> > >
> >> > >> And just to be clear, I'm not necessarily disagreeing. But I think
> >> > >> it's important to understand where and why it's necessary.
> >> > >>
> >> > >> For example, if I'm writing a Gremlin script (string), I don't type
> >> > >> my input numbers. They're rightly converted by the underlying
> >> > >> architecture (I'm guessing Groovy, which has enhanced number
> >> > >> support). Also, if a GLV is submitting typed numbers, how would that
> >> > >> work? For example, in JavaScript?
> >> > >>
> >> > >> On Wed, Jul 13, 2016 at 9:16 AM, Robert Dale wrote:
> >> > >> > Hi, Stephen. I think that's a bad example. You may recall I brought
> >> > >> > up that issue in the forum. However, it's actually attributable to
> >> > >> > the default ID manager of ANY (kept for historical reasons), which
> >> > >> > I think is a really bad default (and reason) because it only leads
> >> > >> > to confusion. Java is one of the few, if not the only,
> >> > >> > brain-damaged languages where 5 != 5 != 5. In Java, number objects
> >> > >> > must be coerced into like form for comparison. The other ID
> >> > >> > managers do this coercion. Saner languages do this under the covers.
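To make the "5 != 5 != 5" point concrete, here is a small Java sketch (the helper names are hypothetical). Primitive comparison widens `int` to `long`, but boxed `equals()` also compares classes, so an `Integer` never equals a `Long` and a `Float` never equals a `Double`. That is the same mismatch behind the TinkerGraph `vertices(1L)` and Titan `has('n',100f)` examples in this thread.

```java
class BoxedNumberEquality {
    // Boxed equals(): Integer, Long, Float and Double only equal
    // instances of their own class, so equal values can still differ.
    static boolean sameBoxedValue(Object a, Object b) {
        return a.equals(b);
    }

    // Coerced comparison for whole-number ids: widen both to long first.
    // (Fractional values would be truncated, so this is for integral ids only.)
    static boolean sameWholeNumber(Number a, Number b) {
        return a.longValue() == b.longValue();
    }

    public static void main(String[] args) {
        System.out.println(5 == 5L);                     // true: int widened to long
        System.out.println(sameBoxedValue(5, 5L));       // false: Integer vs Long
        System.out.println(sameBoxedValue(100f, 100d));  // false: Float vs Double
        System.out.println(sameWholeNumber(5, 5L));      // true after coercion
    }
}
```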
> >> > >> >
> >> > >> > On Wed, Jul 13, 2016 at 8:56 AM, Stephen Mallette wrote:
> >> > >> >> Robert, thanks for joining this discussion.
> >> > >> >>
> >> > >> >>> I wonder if it even makes sense to type numbers according to their
> >> > >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
> >> > >> >>> space. Long isn't much more. So in Java we're not saving much space.
> >> > >> >>> Jackson will attempt to parse in order: int, long, BigInt,
> >> > >> >>> BigDecimal. The JSON JSR uses only BigDecimal. Some non-JVM
> >> > >> >>> languages don't even have this concept. Does anything in Gremlin
> >> > >> >>> actually require this?
> >> > >> >>
> >> > >> >> If the intended numeric type isn't preserved, weird things can
> >> > >> >> happen with graphs that have a schema (like Titan/DSE). Even
> >> > >> >> TinkerGraph using the default ID manager will not be happy if you
> >> > >> >> try to look up Integer identifiers with a Long:
> >> > >> >>
> >> > >> >> gremlin> graph = TinkerFactory.createModern()
> >> > >> >> ==>tinkergraph[vertices:6 edges:6]
> >> > >> >> gremlin> graph.vertices(1)
> >> > >> >> ==>v[1]
> >> > >> >> gremlin> graph.vertices(1L)
> >> > >> >> gremlin>
> >> > >> >>
> >> > >> >> On Wed, Jul 13, 2016 at 8:17 AM, Robert Dale wrote:
> >> > >> >>
> >> > >> >>> Marko, I agree that empty object properties should not be
> >> > >> >>> represented. I think if you saw that in an example then it was
> >> > >> >>> probably for demonstration purposes.
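A minimal sketch of the "don't represent empty properties" rule, assuming a plain `Map`-based representation with hypothetical `id`/`label`/`properties` keys rather than any actual GraphSON serializer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompactVertex {
    // Hypothetical vertex representation: "id" and "label" are always
    // written; "properties" is omitted entirely when there are none.
    static Map<String, Object> toMap(Object id, String label, Map<String, List<Object>> properties) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("id", id);
        out.put("label", label);
        if (properties != null && !properties.isEmpty()) {
            out.put("properties", properties);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toMap(1, "person", Collections.emptyMap()));
        // {id=1, label=person}
        Map<String, List<Object>> props = new LinkedHashMap<>();
        props.put("name", Arrays.<Object>asList("marko"));
        System.out.println(toMap(1, "person", props));
        // {id=1, label=person, properties={name=[marko]}}
    }
}
```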
> >> > >> >>>
> >> > >> >>> Kevin, can you expand on this comment:
> >> > >> >>>
> >> > >> >>> > the format you suggest would lead to the same inconsistencies
> >> > >> >>> > as in GraphSON 1.0. Since the type is at the same level as the
> >> > >> >>> > data itself, whether the container is an Array or an Object
> >> > >> >>> > https://github.com/apache/tinkerpop/pull/351#issuecomment-231351653
> >> > >> >>>
> >> > >> >>> What exactly are the inconsistencies? What is the problem in
> >> > >> >>> determining an array or object?
> >> > >> >>> This is a natural JSON array (or list): []
> >> > >> >>> This is a natural JSON object: {}
> >> > >> >>>
> >> > >> >>> Type at the object level is a common pattern and a supported
> >> > >> >>> feature of Jackson. Also, GeoJSON would be a natural fit as it also
> >> > >> >>> stores 'type' at the object level. Titan currently supports
> >> > >> >>> GeoJSON. I wonder if it would make sense to promote geometry to
> >> > >> >>> Gremlin.
> >> > >> >>>
> >> > >> >>> We should probably start documenting a table of supported types.
> >> > >> >>> (If there is one, please provide a link.)
> >> > >> >>>
> >> > >> >>> I wonder if it even makes sense to type numbers according to their
> >> > >> >>> memory model. As objects, Byte, Short, and Integer occupy the same
> >> > >> >>> space. Long isn't much more. So in Java we're not saving much space.
> >> > >> >>> Jackson will attempt to parse in order: int, long, BigInt,
> >> > >> >>> BigDecimal. The JSON JSR uses only BigDecimal. Some non-JVM
> >> > >> >>> languages don't even have this concept. Does anything in Gremlin
> >> > >> >>> actually require this? I'm thinking that this is only going to be
> >> > >> >>> relevant at the domain model level. This way JSON native numbers
> >> > >> >>> can be used and not need typing.
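What "JSON native numbers" means for the id lookups discussed in this thread can be sketched in Java. The `parse` helper below is hypothetical and follows the order described above (int, then long, then BigInteger for whole numbers; BigDecimal for fractional ones); Jackson's actual defaults may differ, particularly for fractional values. The point is that an untyped literal `1` comes back as an `Integer`, which will not equal an id stored as a `Long`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class NarrowestNumberParser {
    // Hypothetical narrow-first parser for a JSON number literal.
    static Number parse(String literal) {
        if (literal.contains(".") || literal.contains("e") || literal.contains("E")) {
            return new BigDecimal(literal); // fractional or exponent form
        }
        BigInteger value = new BigInteger(literal);
        if (value.bitLength() < 32) {
            return value.intValue();  // fits in an int
        }
        if (value.bitLength() < 64) {
            return value.longValue(); // fits in a long
        }
        return value;                 // needs arbitrary precision
    }

    public static void main(String[] args) {
        Number id = parse("1");
        System.out.println(id.getClass().getSimpleName()); // Integer
        // An id stored as Long(1) does not match the parsed Integer(1).
        System.out.println(id.equals(1L));                  // false
    }
}
```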
> >> > >> >>>
> >> > >> >>> Additionally, I think that all things that will be typed should
> >> > >> >>> always be typed. For the use case of ingesting a saved graph from
> >> > >> >>> a file, it can probably be assumed that the top-level objects are
> >> > >> >>> vertices since the graph is vertex-centric and everything else
> >> > >> >>> follows naturally. I'm not entirely sure what is required for
> >> > >> >>> submitting traversals to Gremlin Server from a GLV. However, if
> >> > >> >>> this is used for the results from Gremlin Server, then the results
> >> > >> >>> could start with any one of path, vertex, edge, property, vertex
> >> > >> >>> property, etc. So you'll need that type data there.
> >> > >> >>>
> >> > >> >>> --
> >> > >> >>> Robert Dale
> >> > >> >>>
> >> > >> >>> On Tue, Jul 12, 2016 at 8:35 AM, Marko Rodriguez wrote:
> >> > >> >>> > Hi,
> >> > >> >>> >
> >> > >> >>> > I’m not following this PR too closely, so what I’m saying may
> >> > >> >>> > already be known/argued against/etc.
> >> > >> >>> >
> >> > >> >>> >         1. I think we should go with Robert Dale’s proposal of
> >> > >> >>> >            int32, int64, Vertex, uuid, etc. instead of Java
> >> > >> >>> >            class names.
> >> > >> >>> >         2. In Java we then have a Map<String,Class> for
> >> > >> >>> >            typecasting accordingly.
> >> > >> >>> >         3. This would make GraphSON 2.0 perfect for Bytecode
> >> > >> >>> >            serialization in TINKERPOP-1278.
> >> > >> >>> >         4. I think that if a Vertex, Edge, etc. doesn’t have
> >> > >> >>> >            properties, outV, etc., then those fields shouldn’t
> >> > >> >>> >            even be in the representation.
> >> > >> >>> >         5. Most of the serialization back and forth will be
> >> > >> >>> >            ReferenceXXX elements, and thus we shouldn’t create
> >> > >> >>> >            more Maps/lists for no reason: fewer chars.
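A minimal sketch of points 1 and 2, assuming hypothetical type names (`int32`, `int64`, `float`, `double`, `uuid`) and leaving out graph types such as Vertex. A single `Map<String, Class<?>>` serves both directions: naming a value's type when writing and typecasting the raw JSON value when reading.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TypeNameRegistry {
    // Hypothetical language-neutral type names mapped to Java classes.
    static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("int32", Integer.class);
        TYPES.put("int64", Long.class);
        TYPES.put("float", Float.class);
        TYPES.put("double", Double.class);
        TYPES.put("uuid", UUID.class);
    }

    // Writing: find the neutral name for a Java value (null if unmapped).
    static String typeNameOf(Object value) {
        for (Map.Entry<String, Class<?>> e : TYPES.entrySet()) {
            if (e.getValue() == value.getClass()) {
                return e.getKey();
            }
        }
        return null;
    }

    // Reading: convert a raw JSON value (a number or a string) into the
    // Java class registered for the type name.
    static Object typecast(String typeName, Object raw) {
        Class<?> target = TYPES.get(typeName);
        if (target == null) {
            throw new IllegalArgumentException("unknown type: " + typeName);
        }
        if (target == UUID.class) {
            return UUID.fromString((String) raw);
        }
        Number n = (Number) raw;
        if (target == Integer.class) return n.intValue();
        if (target == Long.class) return n.longValue();
        if (target == Float.class) return n.floatValue();
        return n.doubleValue(); // only Double remains
    }

    public static void main(String[] args) {
        System.out.println(typeNameOf(100L));                                // int64
        System.out.println(typecast("int64", 1).getClass().getSimpleName()); // Long
    }
}
```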
> >> > >> >>> >
> >> > >> >>> > My interest in this work is all about a language-agnostic way
> >> > >> >>> > of sending Gremlin traversal bytecode between different
> >> > >> >>> > languages. This work is exactly what I am looking for.
> >> > >> >>> >
> >> > >> >>> > Thanks,
> >> > >> >>> > Marko.
> >> > >> >>> >
> >> > >> >>> > http://markorodriguez.com
> >> > >> >>> >
> >> > >> >>> >
> >> > >> >>> >
> >> > >> >>> >> On Jul 9, 2016, at 9:48 AM, Stephen Mallette wrote:
> >> > >> >>> >>
> >> > >> >>> >> With all the work on GLVs and the recent work on GraphSON 2.0, I
> >> > >> >>> >> think it's important that we have a solid, efficient,
> >> > >> >>> >> programming-language-neutral, lossless serialization format.
> >> > >> >>> >> Right now that format is GraphSON, and it works for that purpose
> >> > >> >>> >> (ever more so with 2.0). Given some discussion on the GraphSON
> >> > >> >>> >> 2.0 PR driven a bit by Robert Dale:
> >> > >> >>> >>
> >> > >> >>> >> https://github.com/apache/tinkerpop/pull/351#issuecomment-231157389
> >> > >> >>> >>
> >> > >> >>> >> I wonder if we shouldn't consider another IO format that has
> >> > >> >>> >> Gremlin Server/GLVs in mind. At this point I'm not suggesting
> >> > >> >>> >> anything specific; I'm just hanging the idea out for further
> >> > >> >>> >> discussion and brainstorming. Thoughts?
> >> > >> >>> >
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> --
> >> > >> >>> Robert Dale
> >> > >> >>>
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > --
> >> > >> > Robert Dale
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Robert Dale
> >> > >>
> >> >
> >> >
> >> >
> >> > --
> >> > Robert Dale
> >> >
> >>
>
>
>
> --
> Robert Dale
>
