Re: [DISCUSS] string formatting

Stephen Mallette Tue, 28 Jan 2020 05:42:54 -0800

I think I like the named variable approach in format() - something Kuppitz
seemed to like as well. Sorta works a bit like math() in this way:

gremlin> g.V().hasLabel('person').format("%{n} is %{a} years old.").by('n', 'name').by('a', 'age')
==>marko is 29 years old.
==>vadas is 27 years old.
==>josh is 32 years old.
==>peter is 35 years old.
gremlin> g.V().hasLabel('person').format("%{name} is %{age} years old.")
==>marko is 29 years old.
==>vadas is 27 years old.
==>josh is 32 years old.
==>peter is 35 years old.

Note that the above required some new by() overloads. Ultimately, math()
should probably be modified to use this syntax as it is easier to follow.
Kuppitz did point out to me on the side that the new by() overloads also
have relevance with where():

...where(eq('x')).by(...).by('x', ...)

where it's hard to remember which of the 2 by() steps is applied to the
incoming object. It might also make for a more readily understandable
syntax for project() where the following would be equivalent:

project('x').by('name')
project().by('x','name')

That second syntax has been proposed to me many times by users over the
years who prefer it over trying to map the order of by() to the keys given
to project() as arguments. Also, it's surprising how many people end up
not following the notion of round-robin application of by(). It's
convenient in a number of cases, but least useful with project() (not sure
what use case you would satisfy by doing that other than just defaulting
all the keys in the Map to the same value??).
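
To make the round-robin point concrete, here is a small sketch in plain
Java. It is only a simplified model of how positional by() arguments pair
up with project() keys, not TinkerPop code: the class name ByAssignment and
the method roundRobin are made up for illustration, and it assumes each
by() is just a property key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not TinkerPop code.
class ByAssignment {
    // Positional form: the i-th key takes the (i mod n)-th by() argument,
    // wrapping around when there are fewer by() modulators than keys.
    static Map<String, String> roundRobin(List<String> keys, List<String> byArgs) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            assigned.put(keys.get(i), byArgs.get(i % byArgs.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Models project('x','y').by('name'): the single by() is reused for every key.
        System.out.println(roundRobin(Arrays.asList("x", "y"), Arrays.asList("name")));
        // Models project('x','y').by('name').by('age'): by() order must match key order.
        System.out.println(roundRobin(Arrays.asList("x", "y"), Arrays.asList("name", "age")));
    }
}
```

With the keyed form, by('x','name'), the pairing is written out directly,
so there is no ordering to keep track of.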

I used the StrSubstitutor class included with Apache Commons so it didn't
require a new dependency or extensive code to implement. Though we may need
to introduce commons-text as the current class is deprecated in
commons-lang3 (moved to commons-text). I thought about other templating
engines, but I suppose that just places more pressure on providers to
support that complexity. Maybe this simple formatting is good enough from
TinkerPop. I also think that we probably need to tackle other string
manipulation functions more as first class citizens rather than secondary
functions of templating.
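
For a sense of how little is involved in this kind of substitution, here
is a rough sketch in plain Java with no Commons dependency. It is not the
format() step implementation: the class name NamedFormat and its format
method are hypothetical, and leaving unknown keys untouched is an
assumption made here rather than decided behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of TinkerPop or Apache Commons.
class NamedFormat {
    // Matches %{key} and captures the key between the braces.
    private static final Pattern VAR = Pattern.compile("%\\{([^}]+)\\}");

    // Replaces each %{key} with the value mapped to key.
    // Keys with no value are left in place (an assumption for this sketch).
    static String format(String template, Map<String, ?> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object v = values.get(m.group(1));
            String replacement = (v == null) ? m.group() : String.valueOf(v);
            // quoteReplacement keeps '$' and '\' in values from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> marko = new LinkedHashMap<>();
        marko.put("name", "marko");
        marko.put("age", 29);
        System.out.println(format("%{name} is %{age} years old.", marko));
    }
}
```

In the step itself, the values would come from the by() modulators or
from the element's properties rather than from a hand-built Map.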

Still thinking about formatting to JSON (and other forms) given Josh's
comments. I guess we'd need to figure out what we would do with a JSON
engine without having a schema to rely on for now.

On Fri, Jan 24, 2020 at 12:50 AM Joshua Shinavier <[email protected]> wrote:

> Just a quick note, but I don't think we would go far wrong using Formatter
> for now. It is after all an "interpreter for printf-style format strings",
> where printf <https://en.wikipedia.org/wiki/Printf_format_string> has a
> POSIX specification <https://pubs.opengroup.org/onlinepubs/9699919799/>
> that is implemented in many programming languages. The Formatter docs go
> on to say that while Formatter is "inspired by" printf, it departs from
> printf in ways that are idiosyncratic to Java. My googling did not turn up
> a handy list of features for which Formatter differs from printf, nor did
> it turn up a POSIXly-correct printf library for Java. Likely that both of
> these things exist somewhere. Otherwise if we really wanted to be picky
> about portability, we might have to write that custom printf in each
> target language, including Java.
>
> W.r.t. outputting elements as JSON, IMO this is another area where a formal
> data model is going to help, and the output need not be limited to JSON.
> Thrift, Protobuf, Avro, JSON, GraphQL... any data model we have an
> appropriate API for and can describe in terms of primitive types, sums, and
> products, we can map schemas and data into. The target of format() would
> not be "JSON" but (for example) "JSON conforming to the JSON Schema
> equivalent of your graph schema". In the default graph schema, I think
> there will be one kind of vertex, and one kind of edge, with labels more
> like properties than types. The generated JSON for a graph with this flat
> schema could be made to look something like GraphSON, though I wouldn't
> expect the default representation of a vertex to contain "inE" or "outE"
> because a vertex doesn't own/contain its incident edges. You could output
> something that *contains* a vertex and also contains the incident edges.
>
> Josh
>
> On Wed, Jan 22, 2020 at 12:11 PM Stephen Mallette <[email protected]>
> wrote:
>
> > We've long had the issue of how to deal with strings better. Typically the
> > concern lies with concatenation, but there are other use cases that have
> > come up along the way as well. I started playing around with a format()
> > step to try to capture all the odds and ends I have notes on in relation
> > to this:
> >
> > Quickly hacked together I have something that allows:
> >
> > gremlin> g.V().hasLabel('person').format("%s is %s years old").by('name').by('age')
> > ==>marko is 29 years old
> > ==>vadas is 27 years old
> > ==>josh is 32 years old
> > ==>peter is 35 years old
> >
> > The engine behind the string formatting is the standard Java Formatter. I
> > just wanted to see what it could look like so Formatter was an easy
> > choice. Of course, Formatter might not be best - part of me would prefer
> > a more non-JVM centric sort of templating, perhaps something like:
> >
> > g.V().hasLabel('person').format("{} is {} years old").by('name').by('age')
> >
> > which is fairly commonplace across languages (even used in Java in
> > libraries like slf4j). That of course made me realize that it wouldn't be
> > hard to overload format() to take a formatting engine as an argument so
> > that it's extensible:
> >
> > g.V().hasLabel('person').format("{} is {} years old").by('name').by('age')
> >
> > g.V().hasLabel('person').format(JAVA, "%s is %s years old").by('name').by('age')
> >
> > The notion of a formatting engine argument made me think about another
> > thing folks tend to want in relation to strings - clean output to JSON
> > (not GraphSON exactly with all the embedded types - like think back to
> > GraphSON 1 format) and other string formats:
> >
> > g.V().hasLabel('person').format(JSON)
> >
> > or perhaps it is just GraphSON??
> >
> > g.V().hasLabel('person').format(GraphSON_1)
> >
> > Providers who require special serializers could easily just override the
> > FormatStep to configure the engines as necessary.
> >
> > I think format() helps solve a lot of the common issues with strings and
> > Gremlin. Even with the basic Formatter you can do a poor man's sort of
> > substring:
> >
> > gremlin> g.V().hasLabel('person').format("%1.1s").by('name')
> > ==>m
> > ==>v
> > ==>j
> > ==>p
> >
> > I'd imagine that with a more advanced engine we could get something more
> > full featured if we wanted to cover even wider general function use
> > cases. Not sure if things like substring should be more like first class
> > citizens in Gremlin or not though. Anyway, happy to hear any thoughts on
> > the idea of format() and what it might mean to Gremlin.
> >
>
