Florian wrote: > I don't really see why we should drop them when they make the storage more efficient and reduce the need for type castings in user code.
Where is it more efficient? I think that statement a specialization that's not generally applicable. a. At the client? This is language dependent. Many high-level languages actually have only 2 number types: integer, float. My guess is that they occupy the same space for the most part. In any case, they would ignore any size hints. b. At GraphSON? The text size is dependent on actuall length of characters and not the memory storage size. e.g. a '0' byte is the same size as a '0' long. c. In the TinkerPop Java implementation? In Java, all of the numbers are stored as Objects. Byte, Short, Integer all occupy 16 bytes. Long is 24 bytes. There's really not a substantial difference. d. In the database? Yes, but I started out with Gremlin should be schema-agnostic. And all modern databases I've worked with will handle the conversion to storage size for you. Are there any that don't? Should we be concerned about it? I've pulled some select phrases from Python's rationale to unify integers: "Having the machine word size exposed to the language hinders portability." "There is also the general desire to hide unnecessary details from the [...] user when they are irrelevant for most applications. An example is memory allocation, [...] giving us the convenience of unlimited sizes on strings, lists, etc. It makes sense to extend this convenience to numbers." - https://www.python.org/dev/peps/pep-0237/ So, take the inverse problem of the client where in python 3, for example, there is only one integer type. In the Gremlin client, it has undone what python 3 did. That is, the Gremlin python client adds cast functions for int, long, float. And so it goes for every client out there where you want to expose memory size. This goes back to client/server symmetry. And why does GraphSON arbitrarily stop at Int32,64? Why are 1, 8, 16 extended types? Let's just do away with memory size. Besides, all the cool kids are doing it. What more do you need than that? Florian wrote: > For Date, you say that it's just a long. Sure, but how does the receiver know that the long should be deserialized to a Date in this case? As a user I want to work with a Date object and not just with a long. Also, we nevertheless need a convention of what this long represents: Milliseconds since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since December 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is basically just this convention which is why I would keep it. I think you're conflating types with intent. How does the receiver know what anything is really? That's an application detail. Aside from that, your argument falls apart exactly for the reasons given. g:Date is not well-defined. You are implicitly using Java's definition of Date. If we take your example , what if you were also interacting with a Gremlin Server written C++ 7.0? Now all your g:Dates are off by 71 years. >From a Gremlin perspective, currently there is no first-class support for Date , it's just a pass-through type. In order to do anything with it, you would have to use an other-language lambda. I'm not against temporal types, just IMO, this is not sufficiently supported in Gremlin to be a supported type. So if the long-term goal for Gremlin is to get away from lambdas and be language agnostic, then it needs the date operators and functions in order to support a date type. Again, provide the use cases. Demonstrate why it's required. Provide the operators/functions. Stephen wrote: > I just wonder exactly how graphs that don't ' have schemas like neo4j/tinkergraph will deal with someone sending a "Number". What happens in that case? Neo4j actually supports only two number types, integer and float, just as proposed here. https://neo4j.com/docs/developer-manual/current/drivers/cypher-values/ Robert Dale On Tue, May 29, 2018 at 12:45 PM, Stephen Mallette <spmalle...@gmail.com> wrote: > > Regarding serialization and deserialization asymmetry on GLVs (for Core > and Extended types), I think we should avoid it as it could lead to > obscure error > messages on the user side. > > In the past, I think TinkerPop (going back to 2.x) has been ok with it and > I'm not so sure that I recall any specific problems that were every voiced > by users on the subject. As it stands, I think we already have some > asymmetry in gremlin-python so there is some precedent for it. There are > also some cases where it logically makes some sense to safely have one but > not the other. A GLV likely doesn't need to support a Bytecode deserializer > because it doesn't receive bytecode from the server. It only needs to send > bytecode and thus only has a serializer - at least until we have GVMs > instead of GLVs :) Does that change your thinking at all Jorge? > > > First would be: Gremlin should not concern itself with storage > schemas..... > > I like all of Robert's first paragraph because it makes Jorge's binary > format proposal that much easier to get right. JanusGraph, DSE Graph and > others won't have any trouble with this approach because the backend will > simply know that the particular property that this number is going into > will be a float and will coerce it as such on storage. I just wonder > exactly how graphs that don't ' have schemas like neo4j/tinkergraph will > deal with someone sending a "Number". What happens in that case? > > On Mon, May 28, 2018 at 4:20 AM, Florian Hockmann <f...@florian-hockmann.de> > wrote: > > > > these should be dropped: Class (unless this is used for something > > important? Too many results on 'Class' > > in the codebase. > > > > 'Class' is for example used for 'withoutStrategies' but I agree that this > > would probably better handled just as a string. 'Class' is Java-specific > > which doesn't make much sense when graph providers want to implement > > TinkerPop in another language than Java. > > > > Apart from that, I'm not sure I get your reasoning behind dropping types > > like Date, Int32, and float. It's really trivial in most languages to add > > serializers for more numerical types so I don't really see why we should > > drop them when they make the storage more efficient and reduce the need > for > > type castings in user code. > > For Date, you say that it's just a long. Sure, but how does the receiver > > know that the long should be deserialized to a Date in this case? As a > user > > I want to work with a Date object and not just with a long. Also, we > > nevertheless need a convention of what this long represents: Milliseconds > > since January 1, 1970 (POSIX)? Since January 1, 1 (.NET)? Since December > > 31, 1899 (C++ 7.0)? (There are a lot more epoch dates [1].) g:Date is > > basically just this convention which is why I would keep it. > > > > > There should be a boolean (which seems to be completely missing??). > > > > Yeah, boolean and string are both just serialized without type > information > > right now. Maybe we want to change that if we ever introduce GraphSON 4. > > > > > > Jorge's suggestion to drop all extended types except for the five he > > listed sounds like a good idea to me. I would only add dropping of either > > Timestamp or Date from Core and probably also Class, like Robert > suggested. > > > > [1] https://en.wikipedia.org/wiki/Epoch_%28reference_date%29# > > Notable_epoch_dates_in_computing > > > > -----Ursprüngliche Nachricht----- > > Von: Robert Dale <robd...@gmail.com> > > Gesendet: Freitag, 25. Mai 2018 15:43 > > An: dev@tinkerpop.apache.org > > Betreff: Re: [DISCUSS] Handling of problematic GraphSON types > > > > There should be a guiding principle on this to make these decisions > > clearer. First would be: Gremlin should not concern itself with storage > > schemas. As an extension of that, Gremlin should not concern itself with > > storage size. Next would be: Gremlin should not be Java-specific. > Finally, > > it should be hard to add a new type, i.e. it's demonstratively difficult > to > > do a real world traversal without this type, how GLVs would map it, what > > functions on that type should be a part of Gremlin, and n>1 people > > positively affirm this direction. > > > > Thus, there should be a minimal Core on which most else can be built. > All > > extended types should be dropped. From Core, these should be dropped: > Class > > (unless this is used for something important? Too many results on 'Class' > > in the codebase. Otherwise, it's just a string), Date (is a long), > > Timestamp (is a long, what's the diff to Date anyway?). There should be > > one floating point type which is 64-bit. There should be one integer type > > which is 64-bit. There should be a boolean (which seems to be completely > > missing??). > > > > > > Robert Dale > > > > On Fri, May 25, 2018 at 3:37 AM, Jorge Bay Gondra < > > jorgebaygon...@gmail.com> > > wrote: > > > > > Thanks Florian for starting the discussion on this topic! > > > > > > I think its a good exercise to evaluate which types are necessary for > > > a GLV to support. > > > > > > I went through a similar exercise when designing the binary > > > serialization format. I'll go ahead and propose: > > > All types that are considered "Core", "Graph Structure" and "Graph > > Process" > > > in GraphSON3 > > > <http://tinkerpop.apache.org/docs/current/dev/io/#_core_2> > > > plus the following from the "Extended" list: > > > - Short > > > - Byte > > > - ByteBuffer > > > - BigInteger > > > - BigDecimal > > > > > > The rationale is to select types that *can't be represented and > > > stored* using other types. > > > For example: > > > - Short can be stored using an int backing field, but it would take > > > twice the space. > > > - BigDecimal can be stored using a ByteBuffer but ordering on a buffer > > > doesn't align with decimal ordering. > > > > > > Regarding serialization and deserialization asymmetry on GLVs (for > > > Core and Extended types), I think we should avoid it as it could lead > > > to obscure error messages on the user side. > > > > > > I think we should provide a comprehensive type representation but it > > > doesn't have to be contain any type imaginable. The Gremlin Server and > > > the GLVs provide extension mechanisms that vendors and users can use > > > to support other types. > > > > > > 2018-05-24 14:31 GMT+02:00 Florian Hockmann <f...@florian-hockmann.de>: > > > > > > > As part of the discussion for the pull request by Daniel C. Weber > > > > that > > > adds > > > > support for more extended GraphSON types to Gremlin.Net [1] we > > > > identified several of those types to be problematic for non-Java > > > > languages (or at least for .NET in this case) as they don't really > > > > have counterparts in other languages and for some it was even > > > > difficult to say where they differ > > > from > > > > each other. > > > > > > > > > > > > > > > > Now the question is basically what we want to do with those > > > > problematic types. > > > > > > > > > > > > > > > > My suggestion would be an approach like this: > > > > > > > > 1. Identify types that are problematic and that we therefore > don't > > > > want > > > > to support across all GLVs. > > > > 2. Communicate to users somehow which types are problematic > > > (something > > > > like a deprecation) as we won't support them in all GLVs and maybe > > > > even stop supporting them at all at some point in the future. > > > > 3. Support the remaining types in all GLVs. > > > > > > > > > > > > > > > > Does that sound like a good plan? Are there any good ideas for the > > > > deprecation of those problematic types? My first idea would be to > > > > put > > > them > > > > in a different section in the I/O docs [2] that explains at the > > > > beginning that and why they are deprecated, but maybe someone here > > > > has a better > > > idea. > > > > > > > > > > > > > > > > Another question that was brought up during the review of the > > > > mentioned > > > PR > > > > by Jorge was whether types should only be supported symmetrically or > > > > whether GLVs should try to support types as good as they can. If > > > > someone has good arguments or a strong opinion for either side then > > > > it would of course > > > also > > > > be good to hear them. > > > > > > > > To give a concrete example of what is meant by symmetric support: > > > > > > > > In its current form the PR deserializes both GraphSON types > > > > gx:Duration > > > and > > > > gx:Period to the .NET type TimeSpan and it serializes TimeSpan back > > > > to gx:Duration. This means that gx:Duration is supported > > > > symmetrically, but gx:Period is not as there exists no .NET > > > > serializer that create a gx:Period. > > > > > > > > > > > > > > > > [1] https://github.com/apache/tinkerpop/pull/842 > > > > > > > > [2] http://tinkerpop.apache.org/docs/current/dev/io/#_extended_2 > > > > > > > > > > > > > > > > > > > > > > > >