To try to go back to the original discussion:

> Identify types that are problematic and that we therefore don't want
to support across all GLVs.

I made a proposal above. It's based on the type representation rationale, it's
important to think about possible/common in-memory or disk representation
of a type (ie: two compliment for signed integers, IEEE 754 for doubles,
...) when you decide whether you support it or not (ie: is it useful for a
TinkerPop implementation?), I didn't want to bring storage in the mix as
it's a responsibility of the vendor/implementor of the storage engine.
We can go for a minimal set of types or a more broader set, but I agree
with Florian that it should be defined and documented.

Is anyone proposing anything else? Should we only support Core types on the
GLV?


2018-05-29 23:30 GMT+02:00 Robert Dale <robd...@gmail.com>:

> Florian wrote:
> > I don't really see why we should drop them when they make the storage
> more efficient and reduce the need for type castings in user code.
>
> Where is it more efficient?  I think that statement a specialization that's
> not generally applicable.
> a.  At the client?  This is language dependent. Many high-level languages
> actually have only 2 number types: integer, float. My guess is that they
> occupy the same space for the most part. In any case, they would ignore any
> size hints.
> b. At GraphSON?  The text size is dependent on actuall length of characters
> and not the memory storage size. e.g. a '0' byte is the same size as a '0'
> long.
> c. In the TinkerPop Java implementation? In Java, all of the numbers are
> stored as Objects. Byte, Short, Integer all occupy 16 bytes. Long is 24
> bytes. There's really not a substantial difference.
> d. In the database?  Yes, but I started out with Gremlin should be
> schema-agnostic. And all modern databases I've worked with will handle the
> conversion to storage size for you.  Are there any that don't? Should we be
> concerned about it?
>
> I've pulled some select phrases from Python's rationale to unify integers:
>
> "Having the machine word size exposed to the language hinders portability."
>
> "There is also the general desire to hide unnecessary details from the
> [...] user when they are irrelevant for most applications. An example is
> memory allocation, [...] giving us the convenience of unlimited sizes on
> strings, lists, etc. It makes sense to extend this convenience to numbers."
>
>  - https://www.python.org/dev/peps/pep-0237/
>
> So, take the inverse problem of the client where in python 3, for example,
> there is only one integer type.  In the Gremlin client, it has undone what
> python 3 did.  That is, the Gremlin python client adds cast functions for
> int, long, float.  And so it goes for every client out there where you want
> to expose memory size. This goes back to client/server symmetry.  And why
> does GraphSON arbitrarily stop at Int32,64? Why are 1, 8, 16 extended
> types?  Let's just do away with memory size. Besides, all the cool kids are
> doing it. What more do you need than that?
>
> Florian wrote:
> > For Date, you say that it's just a long. Sure, but how does the receiver
> know that the long should be deserialized to a Date in this case? As a user
> I want to work with a Date object and not just with a long. Also, we
> nevertheless need a convention of what this long represents: Milliseconds
> since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since December
> 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is
> basically just this convention which is why I would keep it.
>
> I think you're conflating types with intent. How does the receiver know
> what anything is really?  That's an application detail.  Aside from that,
> your argument falls apart exactly for the reasons given.  g:Date is not
> well-defined.  You are implicitly using Java's definition of Date.  If we
> take your example , what if you were also interacting with a Gremlin Server
> written C++ 7.0?  Now all your g:Dates are off by 71 years.
>
> From a Gremlin perspective, currently there is no first-class support for
> Date , it's just a pass-through type. In order to do anything with it, you
> would have to use an other-language lambda.  I'm not against temporal
> types, just IMO, this is not sufficiently supported in Gremlin to be a
> supported type.  So if the long-term goal for Gremlin is to get away from
> lambdas and be language agnostic, then it needs the date operators and
> functions in order to support a date type.
>
> Again, provide the use cases. Demonstrate why it's required. Provide the
> operators/functions.
>
>
> Stephen wrote:
> >  I just wonder
> exactly how graphs that don't ' have schemas like neo4j/tinkergraph will
> deal with someone sending a "Number". What happens in that case?
>
> Neo4j actually supports only two number types, integer and float, just as
> proposed here.
> https://neo4j.com/docs/developer-manual/current/drivers/cypher-values/
>
>
> Robert Dale
>
>
> On Tue, May 29, 2018 at 12:45 PM, Stephen Mallette <spmalle...@gmail.com>
> wrote:
>
> > >  Regarding serialization and deserialization asymmetry on GLVs (for
> Core
> > and Extended types), I think we should avoid it as it could lead to
> > obscure error
> > messages on the user side.
> >
> > In the past, I think TinkerPop (going back to 2.x) has been ok with it
> and
> > I'm not so sure that I recall any specific problems that were every
> voiced
> > by users on the subject. As it stands, I think we already have some
> > asymmetry in gremlin-python so there is some precedent for it. There are
> > also some cases where it logically makes some sense to safely have one
> but
> > not the other. A GLV likely doesn't need to support a Bytecode
> deserializer
> > because it doesn't receive bytecode from the server. It only needs to
> send
> > bytecode and thus only has a serializer - at least until we have GVMs
> > instead of GLVs :)  Does that change your thinking at all Jorge?
> >
> > >   First would be: Gremlin should not concern itself with storage
> > schemas.....
> >
> > I like all of Robert's first paragraph because it makes Jorge's binary
> > format proposal that much easier to get right. JanusGraph, DSE Graph and
> > others won't have any trouble with this approach because the backend will
> > simply know that the particular property that this number is going into
> > will be a float and will coerce it as such on storage. I just wonder
> > exactly how graphs that don't ' have schemas like neo4j/tinkergraph will
> > deal with someone sending a "Number". What happens in that case?
> >
> > On Mon, May 28, 2018 at 4:20 AM, Florian Hockmann <
> f...@florian-hockmann.de>
> > wrote:
> >
> > > > these should be dropped: Class (unless this is used for something
> > > important? Too many results on 'Class'
> > > in the codebase.
> > >
> > > 'Class' is for example used for 'withoutStrategies' but I agree that
> this
> > > would probably better handled just as a string. 'Class' is
> Java-specific
> > > which doesn't make much sense when graph providers want to implement
> > > TinkerPop in another language than Java.
> > >
> > > Apart from that, I'm not sure I get your reasoning behind dropping
> types
> > > like Date, Int32, and float. It's really trivial in most languages to
> add
> > > serializers for more numerical types so I don't really see why we
> should
> > > drop them when they make the storage more efficient and reduce the need
> > for
> > > type castings in user code.
> > > For Date, you say that it's just a long. Sure, but how does the
> receiver
> > > know that the long should be deserialized to a Date in this case? As a
> > user
> > > I want to work with a Date object and not just with a long. Also, we
> > > nevertheless need a convention of what this long represents:
> Milliseconds
> > > since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since
> December
> > > 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is
> > > basically just this convention which is why I would keep it.
> > >
> > > > There should be a boolean (which seems to be completely missing??).
> > >
> > > Yeah, boolean and string are both just serialized without type
> > information
> > > right now. Maybe we want to change that if we ever introduce GraphSON
> 4.
> > >
> > >
> > > Jorge's suggestion to drop all extended types except for the five he
> > > listed sounds like a good idea to me. I would only add dropping of
> either
> > > Timestamp or Date from Core and probably also Class, like Robert
> > suggested.
> > >
> > > [1] https://en.wikipedia.org/wiki/Epoch_%28reference_date%29#
> > > Notable_epoch_dates_in_computing
> > >
> > > -----Ursprüngliche Nachricht-----
> > > Von: Robert Dale <robd...@gmail.com>
> > > Gesendet: Freitag, 25. Mai 2018 15:43
> > > An: dev@tinkerpop.apache.org
> > > Betreff: Re: [DISCUSS] Handling of problematic GraphSON types
> > >
> > > There should be a guiding principle on this to make these decisions
> > > clearer.  First would be: Gremlin should not concern itself with
> storage
> > > schemas. As an extension of that, Gremlin should not concern itself
> with
> > > storage size. Next would be: Gremlin should not be Java-specific.
> > Finally,
> > > it should be hard to add a new type, i.e. it's demonstratively
> difficult
> > to
> > > do a real world traversal without this type, how GLVs would map it,
> what
> > > functions on that type should be a part of Gremlin, and n>1 people
> > > positively affirm this direction.
> > >
> > > Thus, there should be a minimal Core on which most else can be built.
> > All
> > > extended types should be dropped. From Core, these should be dropped:
> > Class
> > > (unless this is used for something important? Too many results on
> 'Class'
> > > in the codebase. Otherwise, it's just a string), Date (is a long),
> > > Timestamp (is a long, what's the diff to Date anyway?).  There should
> be
> > > one floating point type which is 64-bit. There should be one integer
> type
> > > which is 64-bit. There should be a boolean (which seems to be
> completely
> > > missing??).
> > >
> > >
> > > Robert Dale
> > >
> > > On Fri, May 25, 2018 at 3:37 AM, Jorge Bay Gondra <
> > > jorgebaygon...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Florian for starting the discussion on this topic!
> > > >
> > > > I think its a good exercise to evaluate which types are necessary for
> > > > a GLV to support.
> > > >
> > > > I went through a similar exercise when designing the binary
> > > > serialization format. I'll go ahead and propose:
> > > > All types that are considered "Core", "Graph Structure" and "Graph
> > > Process"
> > > > in GraphSON3
> > > > <http://tinkerpop.apache.org/docs/current/dev/io/#_core_2>
> > > > plus the following from the "Extended" list:
> > > > - Short
> > > > - Byte
> > > > - ByteBuffer
> > > > - BigInteger
> > > > - BigDecimal
> > > >
> > > > The rationale is to select types that *can't be represented and
> > > > stored* using other types.
> > > > For example:
> > > > - Short can be stored using an int backing field, but it would take
> > > > twice the space.
> > > > - BigDecimal can be stored using a ByteBuffer but ordering on a
> buffer
> > > > doesn't align with decimal ordering.
> > > >
> > > > Regarding serialization and deserialization asymmetry on GLVs (for
> > > > Core and Extended types), I think we should avoid it as it could lead
> > > > to obscure error messages on the user side.
> > > >
> > > > I think we should provide a comprehensive type representation but it
> > > > doesn't have to be contain any type imaginable. The Gremlin Server
> and
> > > > the GLVs provide extension mechanisms that vendors and users can use
> > > > to support other types.
> > > >
> > > > 2018-05-24 14:31 GMT+02:00 Florian Hockmann <f...@florian-hockmann.de
> >:
> > > >
> > > > > As part of the discussion for the pull request by Daniel C. Weber
> > > > > that
> > > > adds
> > > > > support for more extended GraphSON types to Gremlin.Net [1] we
> > > > > identified several of those types to be problematic for non-Java
> > > > > languages (or at least for .NET in this case) as they don't really
> > > > > have counterparts in other languages and for some it was even
> > > > > difficult to say where they differ
> > > > from
> > > > > each other.
> > > > >
> > > > >
> > > > >
> > > > > Now the question is basically what we want to do with those
> > > > > problematic types.
> > > > >
> > > > >
> > > > >
> > > > > My suggestion would be an approach like this:
> > > > >
> > > > > 1.      Identify types that are problematic and that we therefore
> > don't
> > > > > want
> > > > > to support across all GLVs.
> > > > > 2.      Communicate to users somehow which types are problematic
> > > > (something
> > > > > like a deprecation) as we won't support them in all GLVs and maybe
> > > > > even stop supporting them at all at some point in the future.
> > > > > 3.      Support the remaining types in all GLVs.
> > > > >
> > > > >
> > > > >
> > > > > Does that sound like a good plan? Are there any good ideas for the
> > > > > deprecation of those problematic types? My first idea would be to
> > > > > put
> > > > them
> > > > > in a different section in the I/O docs [2] that explains at the
> > > > > beginning that and why they are deprecated, but maybe someone here
> > > > > has a better
> > > > idea.
> > > > >
> > > > >
> > > > >
> > > > > Another question that was brought up during the review of the
> > > > > mentioned
> > > > PR
> > > > > by Jorge was whether types should only be supported symmetrically
> or
> > > > > whether GLVs should try to support types as good as they can. If
> > > > > someone has good arguments or a strong opinion for either side then
> > > > > it would of course
> > > > also
> > > > > be good to hear them.
> > > > >
> > > > > To give a concrete example of what is meant by symmetric support:
> > > > >
> > > > > In its current form the PR deserializes both GraphSON types
> > > > > gx:Duration
> > > > and
> > > > > gx:Period to the .NET type TimeSpan and it serializes TimeSpan back
> > > > > to gx:Duration. This means that gx:Duration is supported
> > > > > symmetrically, but gx:Period is not as there exists no .NET
> > > > > serializer that create a gx:Period.
> > > > >
> > > > >
> > > > >
> > > > > [1] https://github.com/apache/tinkerpop/pull/842
> > > > >
> > > > > [2] http://tinkerpop.apache.org/docs/current/dev/io/#_extended_2
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>

Reply via email to