Re: GraphSON improvement proposal

Stephen Mallette Mon, 25 Apr 2016 12:16:45 -0700

I assume we have some consensus here as this thread has been going for a
while now and there seems to be general agreement that a change of this
type would be useful. I created a JIRA ticket to track this work here:


https://issues.apache.org/jira/browse/TINKERPOP-1274

On Thu, Apr 21, 2016 at 11:06 AM, Stephen Mallette <spmalle...@gmail.com>
wrote:

> +1 for No type / Non native types only / All types.
> +1 for stamping out a new version of GraphSON (though i forget what i had
> planned to allow that to work)
>
>
> On Thu, Apr 21, 2016 at 10:56 AM, Kevin Gallardo <
> gallardo.kev...@gmail.com> wrote:
>
>> I agree this change would imply to stamp out a new version of GraphSON
>> because it breaks compatibility for types embedded anyway, I don't see how
>> to workaround that.. Having an option on the current version to make both
>> compatible would be very prone to errors, might as well mark a clear
>> evolution with a new version, what do you think?
>>
>> I think it would be worth having only 3 settings : No type / Non native
>> types only / All types. Looks more straightforward imo.
>>
>> Cheers.
>>
>> On 2016-04-11 18:04, Stephen Mallette <s...@gmail.com> wrote:
>> > It would be nice to get to some consensus on this thread so we could
>> put>
>> > something out for review on Gremlin Users and ultimately VOTE. I've>
>> > gathered from this thread that:>
>> >
>> > 1. It seems like there is some consensus on typing JSON non-natives
>> only>
>> > (as opposed to everything). Embedding a type for existing JSON types
>> would>
>> > be more consistent, but it seems redundant. I suppose we could hang
>> that>
>> > out there for public opinion on the gremlin-users list as i'm not sure
>> what>
>> > I would like here.  I'm personally not a consumer of GraphSON with
>> embedded>
>> > types (especially off the jvm) so I'd rely on the drivers devs and the>
>> > greater user community to guide the vote i cast there.>
>> > 2. The general structure of the revised GraphSON that Kevin proposed
>> seems>
>> > to be generally accepted and I think using the non-canonical name for
>> the>
>> > types makes sense, though I suppose we leave open the chance for
>> collision>
>> > under that model. Perhaps there is a configuration we could add there.>
>> > 3. Dylan wondered: "It still feels a little too half hearted to warrant
>> a>
>> > new serializer version though?" I don't see how we can't stamp out a
>> new>
>> > version of GraphSON if we do this. The format for embedded types is>
>> > changing (untyped will stay the same). Unless there is a way to make
>> this a>
>> > configuration of the current version so that it can easily read both>
>> > formats? maybe this is all just configuration of the same GraphSON
>> format?>
>> > you have the following configuration options:>
>> >
>> > a. JSON native>
>> > b. full embedded type with canonical name>
>> > c. full embedded type with simple name>
>> > b. partial embedded type with canonical name>
>> > c. partial embedded type with simple name>
>> >
>> > ....hmm even with those options you break version because both b and c>
>> > options (which would be represent complete non-lossy approach wouldn't>
>> > match the format we have now (as the current format doesn't type json>
>> > primitives) anyway, open to ideas on how we could do this without
>> having to>
>> > version GraphSON....>
>> >
>> >
>> >
>> > On Sat, Apr 9, 2016 at 5:04 PM, Dylan Millikin <dy...@gmail.com>>
>> > wrote:>
>> >
>> > > The double map comment if I recall correctly was in the case of
>> typing>
>> > > everything (even JSON natives). So typing a map would be the actual
>> case.>
>> > >>
>> > > At this point I think it would be good to make sure whichever model
>> we pick>
>> > > prevents as much type loss as possible before moving forward.
>> Consistency +>
>> > > Java classes + JSON non-natives is definitely a convenience but we
>> might as>
>> > > well go the extra mile if we're going to add a v2 serializer (if - as
>> it>
>> > > should go through a DISCUSS/VOTE).>
>> > >>
>> > > On Thu, Apr 7, 2016 at 3:57 PM, Kevin Gallardo <ga...@gmail.com>>
>> > > wrote:>
>> > >>
>> > > > > - List, like suggested in the original thread. Pros: simple.
>> Cons: if>
>> > > > the>>
>> > > > > generate List has two maps your un-serializer will need to look
>> ahead>
>> > > > into>>
>> > > > > the maps to see if one has a "@class" key as we can't guaranty
>> List>
>> > > > order.>>
>> > > >>
>> > > > Sorry but I don't see, with the model proposed, how can the List
>> contain>
>> > > 2>
>> > > > maps with one being the map with the type, I can't manage to find
>> an>
>> > > > example, but if you have one in mind, please let me know. Moreover,
>> why>
>> > > > can't we guaranty the order? GraphSON serializers methods
>> explicitly call>
>> > > > the type serializers before serializing the values, and since it is
>> a>
>> > > list,>
>> > > > the order is conserved.>
>> > > > I am not against the Map approach, but I don't see a valid reason
>> not to>
>> > > > use the list, and it seems less verbose..>
>> > > >>
>> > > > I agree with your conclusion, seems like a reasonable decision for
>> Java>
>> > > > classes + JSON non-natives. Just to clarify, you mention overhead
>> for the>
>> > > > server serializers/GraphSONWriter with Generic names, I am not sure
>> it>
>> > > > applies though, the intention originally would be to encode only
>> the>
>> > > Class>
>> > > > name rather than the canonical name. i.e. for example from>
>> > > > "java.time.Instant" to "Instant", and this does not imply overhead
>> on the>
>> > > > serialization process.>
>> > > >>
>> > > > > > I'd ask for the same question concerning typing on natively
>> supported>
>> > > > JSON types, is the overhead worth it?>
>> > > > > I think that there have been a number of struggles for non-jvm>
>> > > languages>
>> > > > around discerning float/double and int/long with respect to
>> graphson.>
>> > > >>
>> > > > Indeed, with this proposal, since the typing would work both ways>
>> > > > (ser/de), the GraphSONReader would be able to coerce automatically
>> an>
>> > > > Element having non-JSON-native properties into the correct type,
>> from>
>> > > > reading the correctly typed JSON.>
>> > > >>
>> > > > Thanks for your answer.>
>> > > >>
>> > > > On 2016-04-05 16:03, Dylan Millikin <d....@gmail.com> wrote:>
>> > > > > At this point I'm more or less neutral here. As a recap, I guess
>> there>
>> > > > are>>
>> > > > > two points:>>
>> > > > >>
>> > > > > 1) Making the graphSON typing more homogenous across types (Map
>> vs List>
>> > > > vs>>
>> > > > > Long), no special cases. Here the values could either be
>> encapsulated>
>> > > in>
>> > > > :>>
>> > > > > - List, like suggested in the original thread. Pros: simple.
>> Cons: if>
>> > > > the>>
>> > > > > generate List has two maps your un-serializer will need to look
>> ahead>
>> > > > into>>
>> > > > > the maps to see if one has a "@class" key as we can't guaranty
>> List>
>> > > > order.>>
>> > > > > It doesn't imply much overhead though.>>
>> > > > > - Map. Pros: more thorough / theoretically a tad faster on the
>> client>
>> > > > end.>>
>> > > > > Cons: more verbose, less readable.>>
>> > > > >>
>> > > > > 2) What to type and how to type it (specific Java class vs
>> generic>
>> > > > naming,>>
>> > > > > etc.),>>
>> > > > >>
>> > > > > How to print the types:>>
>> > > > >>
>> > > > > - As java classes, Pros: Are self explanatory, native to the>
>> > > serializing>>
>> > > > > end so no overhead. Cons: implies systematic overhead on the
>> client end>
>> > > > for>>
>> > > > > non-jvm mapping.>>
>> > > > > - As generic names, Pros: non-jvm mapping can cherry pick which
>> types>
>> > > it>>
>> > > > > needs to map so less overhead. Cons: specific documentation is>
>> > > required,>
>> > > > i>>
>> > > > > guess that theoretically there's more overhead for the server
>> end>>
>> > > > > serializer as well.>>
>> > > > >>
>> > > > > What to type:>>
>> > > > >>
>> > > > > - Everything. Pros: if added to 1) it would make GraphSON very>
>> > > > homogenous>>
>> > > > > and fail proof. Cons: overhead overhead overhead (need to figure
>> out>
>> > > if>>
>> > > > > it's significant or not)>>
>> > > > > - JSON non-natives only. Pros: less overhead, simplifies the
>> current>>
>> > > > > implementation. Cons: potentially not as safe. Though I suspect
>> that>
>> > > any>>
>> > > > > language supporting JSON should have workarounds for some of the>
>> > > > pitfalls.>>
>> > > > > So probably a non-issue. Doesn't help in solving any of the
>> int/long>
>> > > and>>
>> > > > > float/double issues though.>>
>> > > > >>
>> > > > >>
>> > > > > Conclusion:>>
>> > > > >>
>> > > > > Looking at the above I guess that implementing 1) (either choice)
>> and>
>> > > 2)>>
>> > > > > Java classes + JSON non-natives. Provides a simpler and easier to
>> use>>
>> > > > > GraphSON representation with equal usability to the current>
>> > > > implementation.>>
>> > > > > It still feels a little too half hearted to warrant a new
>> serializer>>
>> > > > > version though?>>
>> > > > >>
>> > > > > On Tue, Apr 5, 2016 at 4:07 PM, Stephen Mallette <sp...@gmail.com>>>
>>
>> > > > > wrote:>>
>> > > > >>
>> > > > > > > I'd ask for the same question concerning typing on natively>
>> > > > supported>>
>> > > > > > JSON types, is the overhead worth it?>>
>> > > > > >>>
>> > > > > > I think that there have been a number of struggles for non-jvm>
>> > > > languages>>
>> > > > > > around discerning float/double and int/long with respect to>
>> > > graphson.>>
>> > > > > > That's been true of TinkerPop 2.x and 3.x.  We did a better
>> job in>
>> > > 3.x>
>> > > > by>>
>> > > > > > pushing some of the responsibility for id parsing (for graphs
>> that>
>> > > use>>
>> > > > > > numeric types for ids) down to the Graph implementation level
>> so that>
>> > > > has>>
>> > > > > > been helpful, but there's nothing for situations like:>>
>> > > > > >>>
>> > > > > > v.property("someFloatField", x)>>
>> > > > > >>>
>> > > > > > we typically recommend that users coerce values as:>>
>> > > > > >>>
>> > > > > > v.property("someFloatField", Float.valueOf(x))>>
>> > > > > >>>
>> > > > > > or just don't use float in your schema.  I'm not saying the
>> overhead>
>> > > > of>>
>> > > > > > this is worth what we'd get, but just pointing out that there
>> is a>
>> > > > commonly>>
>> > > > > > seen problem to solve here. It might be fine to stick with our>
>> > > current>>
>> > > > > > advice on implementations patterns for this. We might want to
>> raise>
>> > > > this>>
>> > > > > > issue on gremlin-users for more feedback once we have better>
>> > > consensus>>
>> > > > > > here.>>
>> > > > > >>>
>> > > > > >>>
>> > > > > >>>
>> > > > > > On Mon, Apr 4, 2016 at 12:03 PM, Kevin Gallardo <>>
>> > > > > > kevin.galla...@datastax.com>>
>> > > > > > > wrote:>>
>> > > > > >>>
>> > > > > > > Thank you for your answers.>>
>> > > > > > >>>
>> > > > > > > I agree with the reasoning of needing the full types names
>> when it>
>> > > > comes>>
>> > > > > > > to check the specification, however after you saw the specs
>> and you>
>> > > > know>>
>> > > > > > > how to deserialize it, is it worth it to still get the full
>> types>
>> > > in>
>> > > > all>>
>> > > > > > > the server's responses afterwards?>>
>> > > > > > >>>
>> > > > > > > I'd ask for the same question concerning typing on natively>
>> > > > supported>>
>> > > > > > JSON>>
>> > > > > > > types, is the overhead worth it? If you only want to be able
>> to>>
>> > > > > > distinguish>>
>> > > > > > > non native types from the native ones.>>
>> > > > > > > It could also maybe be possible to have some kinds of "level"
>> for>
>> > > > typing>>
>> > > > > > > then, "no type", "only custom types", "all types". What do
>> you>
>> > > > think? (I>>
>> > > > > > am>>
>> > > > > > > not sure how typing native JSON elements is supported by
>> Jackson>
>> > > > though).>>
>> > > > > > >>>
>> > > > > > > Cheers.>>
>> > > > > > >>>
>> > > > > > > On 2016-04-01 14:56, Dylan Millikin <d....@gmail.com>
>> wrote:>>
>> > > > > > > > Ok I see. That might be a little redundant but it
>> definitely has>
>> > > > it's>>>
>> > > > > > > > advantages.>>>
>> > > > > > > > My initial thought was mostly like Kevin's regarding JSON
>> native>
>> > > > types.>>
>> > > > > > > I>>>
>> > > > > > > > had a few reservations regarding the type naming and I've
>> given>
>> > > > this a>>
>> > > > > > > bit>>>
>> > > > > > > > more thought :>>>
>> > > > > > > >>>
>> > > > > > > > I think the java specific typing is a good thing because
>> it>
>> > > > directly>>>
>> > > > > > > > references those types as being java types (with those
>> specs).>
>> > > > Typing>>
>> > > > > > in>>>
>> > > > > > > > other languages do not follow any standard, "int" can mean>
>> > > > anything>>
>> > > > > > > ranging>>>
>> > > > > > > > from "short" to "long" to nothing depending on your
>> language and>>
>> > > > > > system.>>>
>> > > > > > > > PHP is horrible for this, int = int on 32b systems and int
>> = long>
>> > > > on>>
>> > > > > > 64b>>>
>> > > > > > > > systems, and I believe other languages also have similar
>> issues>
>> > > > (the>>
>> > > > > > > Ariane>>>
>> > > > > > > > space program notoriously lost one of their rockets to this
>> kind>
>> > > > of>>
>> > > > > > > typing>>>
>> > > > > > > > issue). So drivers/clients -should- map the types anyways
>> and>
>> > > the>>
>> > > > > > > current>>>
>> > > > > > > > implementation makes it easy to understand where to go for>
>> > > specs.>>>
>> > > > > > > >>>
>> > > > > > > > If anything, considering the above, you may actually want
>> to type>
>> > > > JSON>>>
>> > > > > > > > native types as well to make this completely fail-proof.>>>
>> > > > > > > >>>
>> > > > > > > > This goes in the completely opposite direction of what the
>> OP>>
>> > > > > > suggested>>>
>> > > > > > > > though. Thoughts?>>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > > On Fri, Apr 1, 2016 at 3:17 PM, Stephen Mallette <>
>> > > sp...@gmail.com>
>> > > > >>>>
>> > > > > > > > wrote:>>>
>> > > > > > > >>>
>> > > > > > > > > I haven't yet had a chance to fully think this change
>> through,>
>> > > > but I>>
>> > > > > > > think>>>
>> > > > > > > > > the idea would be to produce a new version of GraphSON
>> and keep>
>> > > > the>>
>> > > > > > > old>>>
>> > > > > > > > > version around for backward compatibility. It would also
>> only>
>> > > be>
>> > > > a>>
>> > > > > > > breaking>>>
>> > > > > > > > > change for type embedded GraphSON, which i would guess
>> the>
>> > > > minority>>
>> > > > > > > are>>>
>> > > > > > > > > using right now as its verbosity and dependence on java
>> types>
>> > > > makes>>
>> > > > > > it>>>
>> > > > > > > > > borderline unusable  off the JVM.  I think this is a
>> nice>
>> > > > proposal>>
>> > > > > > > because>>>
>> > > > > > > > > it tries to make embedded types something that could
>> actually>
>> > > be>>
>> > > > > > > usable in>>>
>> > > > > > > > > a more general fashion across different programming
>> languages.>>
>> > > > > > Anyway,>>
>> > > > > > > the>>>
>> > > > > > > > > key point is that we would want to try to still support
>> the>
>> > > old>>
>> > > > > > > version of>>>
>> > > > > > > > > type-embedded graphson in addition to what kevin has>
>> > > proposed.>>>
>> > > > > > > > >>>>
>> > > > > > > > > On Fri, Apr 1, 2016 at 9:11 AM, Dylan Millikin <>
>> > > dy...@gmail.com>
>> > > > >>>>
>> > > > > > > > > wrote:>>>
>> > > > > > > > >>>>
>> > > > > > > > > > I also thought about this when I tried to set this
>> serializer>
>> > > > up>>
>> > > > > > > with the>>>
>> > > > > > > > > > php driver. This is quite a breaking change though.>>>
>> > > > > > > > > >>>>
>> > > > > > > > > > On Thu, Mar 31, 2016 at 1:33 PM, Kevin Gallardo <>>>
>> > > > > > > > > > kevin.galla...@datastax.com>>>
>> > > > > > > > > > > wrote:>>>
>> > > > > > > > > >>>>
>> > > > > > > > > > >  Hi,>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > > By working closely with Tinkerpop I have been
>> interacting>
>> > > > quite a>>
>> > > > > > > bit>>>
>> > > > > > > > > > with>>>
>> > > > > > > > > > > the GraphSON format and I have come up with the need
>> of>
>> > > > using>>
>> > > > > > the>>>
>> > > > > > > > > > > "embedTypes" option of the GraphSON Mapper to get
>> more>
>> > > > insight of>>
>> > > > > > > the>>>
>> > > > > > > > > > types>>>
>> > > > > > > > > > > of the data I was transferring through the GraphSON>
>> > > > payload.>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > > By default vertices, edges, having properties that
>> have>
>> > > > another>>
>> > > > > > > type>>>
>> > > > > > > > > than>>>
>> > > > > > > > > > > the natively supported types in the JSON format are
>> encoded>
>> > > > as>>
>> > > > > > > their>>>
>> > > > > > > > > > String>>>
>> > > > > > > > > > > representation. Meaning that a Vertex property being
>> a UUID>
>> > > > will>>
>> > > > > > > look>>>
>> > > > > > > > > > like>>>
>> > > > > > > > > > > "24B7DED2-F72A-11E5-815D-79DF61FECC63" which, for the
>> user>>
>> > > > > > > consuming>>>
>> > > > > > > > > the>>>
>> > > > > > > > > > > JSON produced by the GraphSONWriter, will not be>
>> > > > distinguishable>>
>> > > > > > > from>>>
>> > > > > > > > > > being>>>
>> > > > > > > > > > > a String or a UUID. That is usually why you need to>
>> > > activate>
>> > > > the>>
>> > > > > > > type>>>
>> > > > > > > > > > > embedding of GraphSON. Documentation about the
>> current>
>> > > > output of>>>
>> > > > > > > > > GraphSON>>>
>> > > > > > > > > > > typing :>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > >>>>
>> > > > > > > > >>>
>> > > > > > >>>
>> > > > > >>
>> > > >>
>> > >
>> http://tinkerpop.apache.org/docs/3.1.1-incubating/reference/#graphson-reader-writer>
>>
>> > > > >>
>> > > > > > > >>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > > However, right now, embedding types with GraphSON
>> comes>
>> > > with>
>> > > > a>>
>> > > > > > few>>>
>> > > > > > > > > trade>>>
>> > > > > > > > > > > offs :>>>
>> > > > > > > > > > >>>>
>> > > > > > > > > > >    -
>> [message truncated...]
>>
>>
>> —
>> Kevin Gallardo
>> kgdo.me
>>
>>
>

Re: GraphSON improvement proposal

Reply via email to