Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

Marko Rodriguez Mon, 29 Apr 2019 13:07:53 -0700

Hey,

Check this out:


############################
Machine machine = LocalMachine.open();
TraversalSource jdbc =
                Gremlin.traversal(machine).
                        withProcessor(PipesProcessor.class).
                        withStructure(JDBCStructure.class, 
Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
  
System.out.println(jdbc.db().values("people").as("x”).
                        db().values("addresses").as("y").has("name", 
__.path("x").by("name")).
                          path("x", "y").toList());
System.out.println(“\n\n”)
System.out.println(jdbc.db().values("people").as("x”).
                        db().values("addresses").as("y").has("name", 
__.path("x").by("name")).
                          path("x", "y").explain().toList());
############################

[[{NAME=marko, AGE=29}, {CITY=santa fe, NAME=marko}], [{NAME=josh, AGE=32}, 
{CITY=san jose, NAME=josh}]]


[Original                               [db, values(people)@x, db, 
values(addresses)@y, hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCStrategy                            [db(<database#conn9: 
url=jdbc:h2:/tmp/test user=>), values(people)@x, db(<database#conn10: 
url=jdbc:h2:/tmp/test user=>), values(addresses)@y, 
hasKeyValue(name,[path(x,[value(name)])]), path(x,y,|)]
JDBCQueryStrategy                       [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
PipesStrategy                           [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientStrategy                     [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
CoefficientVerificationStrategy         [jdbc:sql(conn9: url=jdbc:h2:/tmp/test 
user=,x,y,SELECT x.*, y.* FROM people AS x, addresses AS y WHERE x.name=y.name)]
-------------------------------
Compilation                             [FlatMapInitial]
Execution Plan [PipesProcessor]         [InitialStep[FlatMapInitial]]]





I basically look for a db.values.db.values.has-pattern in the bytecode and if I 
find it, I try and roll it into a single provider-specific instruction that 
does a SELECT query.

Here is JDBCQueryStrategy (its ghetto and error prone, but I just wanted to get 
the basic concept working):
        
https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java
 
<https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/strategy/JDBCQueryStrategy.java>
Here is SqlFlatMapStep (hyper-ghetto… but whateva’):
        
https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java
 
<https://github.com/apache/tinkerpop/blob/7142dc16d8fc81ad8bd4090096b42e5b9b1744f4/java/machine/structure/jdbc/src/main/java/org/apache/tinkerpop/machine/structure/jdbc/function/flatmap/SqlFlatMap.java>

Na na!,
Marko.

http://rredux.com <http://rredux.com/>




> On Apr 29, 2019, at 11:50 AM, Marko Rodriguez <okramma...@gmail.com> wrote:
> 
> Hello Kuppitz,
> 
>> I don't think it's a good idea to keep this mindset for TP4; NULLs are too
>> important in RDBMS. I don't know, maybe you can convince SQL people that
>> dropping a value is the same as setting its value to NULL. It would work
>> for you and me and everybody else who's familiar with Gremlin, but SQL
>> people really love their NULLs….
> 
> Hmm……. I don’t like nulls. Perhaps with time a clever solution will emerge. 
> ????
> 
>> I'd prefer to just have special accessors for these. E.g. g.V().meta("id").
>> At least valueMaps would then only have String-keys.
>> I see the issue with that (naming collisions), but it's still better than
>> the enums in my opinion (which became a pain when started to implement
>> GLVs).
> 
> So, TSymbols are not Java enums. They are simply a “primitive”-type that will 
> have a serialization like:
> 
>       symbol[id]
> 
> Meaning, that people can make up Symbols all day long without having to 
> update serializers. How I see them working is that they are Strings prefixed 
> with #.
> 
> g.V().outE()             <=>   g.V().values(“#outE”)
> g.V().id()               <=>   g.V().value(“#id”)
> g.V().hasLabel(“person") <=>   g.V().has(“#label”,”person”)
> 
> Now that I type this out, perhaps we don’t even have a TSymbol-class. 
> Instead, any String that starts with # is considered a symbol. Now watch this:
> 
> g.V().label()  <=>   g.V().value(“#label”)
> g.V().labels() <=>   g.V().values(“#label”)
> 
> In this way, we can support Neo4j multi-labels as a Neo4jVertex’s #label-Key 
> references a TSequence<String>.
> 
> g.V(1).label() => TSequence<String>
> g.V(1).labels() => String, String, String, …
> g.V(1).label().add(“programmer”)
> g.V(1).label().drop(“person”)
> 
> So we could do “meta()”, but then you need respective “hasXXX”-meta() 
> methods. I think #symbol is easiest .. ?
> 
>> Also, what I'm wondering about now: Have you thought about Stored
>> Procedures and Views in RDBMS? Views can be treated as tables, easy, but
>> what about stored procedures? SPs can be found in many more DBMS, would be
>> bad to not support them (or hack something ugly together later in the
>> development process).
> 
> I’m not super versed in RDBMS technology. Can you please explain to me how to 
> create a StoreProcedure and the range of outputs a StoredProcedure produces? 
> From there, I can try and “Bytecode-ize” it.
> 
> Thanks Kuppitz,
> Marko.
> 
> http://rredux.com <http://rredux.com/>
> 
> 
> 
> 
>> On Mon, Apr 29, 2019 at 7:34 AM Marko Rodriguez <okramma...@gmail.com 
>> <mailto:okramma...@gmail.com>>
>> wrote:
>> 
>>> Hi,
>>> 
>>> *** This email is primarily for Josh (and Kuppitz). However, if others are
>>> interested… ***
>>> 
>>> So I did a lot of thinking this weekend about structure/ and this morning,
>>> I prototyped both graph/ and rdbms/.
>>> 
>>> This is the way I’m currently thinking of things:
>>> 
>>>        1. There are 4 base types in structure/.
>>>                - Primitive: string, long, float, int, … (will constrain
>>> these at some point).
>>>                - TTuple<K,V>: key/value map.
>>>                - TSequence<V>: an iterable of v objects.
>>>                - TSymbol: like Ruby, I think we need “enum-like” symbols
>>> (e.g., #id, #label).
>>> 
>>>        2. Every structure has a “root.”
>>>                - for graph its TGraph implements TSequence<TVertex>
>>>                - for rdbms its a TDatabase implements
>>> TTuple<String,TTable>
>>> 
>>>        3. Roots implement Structure and thus, are what is generated by
>>> StructureFactory.mint().
>>>                - defined using withStructure().
>>>                - For graph, its accessible via V().
>>>                - For rdbms, its accessible via db().
>>> 
>>>        4. There is a list of core instructions for dealing with these
>>> base objects.
>>>                - value(K key): gets the TTuple value for the provided key.
>>>                - values(K key): gets an iterator of the value for the
>>> provided key.
>>>                - entries(): gets an iterator of T2Tuple objects for the
>>> incoming TTuple.
>>>                - hasXXX(A,B): various has()-based filters for looking
>>> into a TTuple and a TSequence
>>>                - db()/V()/etc.: jump to the “root” of the withStructure()
>>> structure.
>>>                - drop()/add(): behave as one would expect and thus.
>>> 
>>> ————
>>> 
>>> For RDBMS, we have three interfaces in rdbms/.
>>> (machine/machine-core/structure/rdbms)
>>> 
>>>        1. TDatabase implements TTuple<String,TTable> // the root
>>> structure that indexes the tables.
>>>        2. TTable implements TSequence<TRow<?>> // a table is a sequence
>>> of rows
>>>        3. TRow<V> implements TTuple<String,V>> // a row has string column
>>> names
>>> 
>>> I then created a new project at machine/structure/jdbc). The classes in
>>> here implement the above rdbms/ interfaces/
>>> 
>>> Here is an RDBMS session:
>>> 
>>> final Machine machine = LocalMachine.open();
>>> final TraversalSource jdbc =
>>>        Gremlin.traversal(machine).
>>>                        withProcessor(PipesProcessor.class).
>>>                        withStructure(JDBCStructure.class,
>>> Map.of(JDBCStructure.JDBC_CONNECTION, "jdbc:h2:/tmp/test"));
>>> 
>>> System.out.println(jdbc.db().toList());
>>> System.out.println(jdbc.db().entries().toList());
>>> System.out.println(jdbc.db().value("people").toList());
>>> System.out.println(jdbc.db().values("people").toList());
>>> System.out.println(jdbc.db().values("people").value("name").toList());
>>> System.out.println(jdbc.db().values("people").entries().toList());
>>> 
>>> This yields:
>>> 
>>> [<database#conn1: url=jdbc:h2:/tmp/test user=>]
>>> [PEOPLE:<table#PEOPLE>]
>>> [<table#people>]
>>> [<row#PEOPLE:1>, <row#PEOPLE:2>]
>>> [marko, josh]
>>> [NAME:marko, AGE:29, NAME:josh, AGE:32]
>>> 
>>> The bytecode of the last query is:
>>> 
>>> [db(<database#conn1: url=jdbc:h2:/tmp/test user=>), values(people),
>>> entries]
>>> 
>>> JDBCDatabase implements TDatabase, Structure.
>>>        *** JDBCDatabase is the root structure and is referenced by db()
>>> *** (CRUCIAL POINT)
>>> 
>>> Assume another table called ADDRESSES with two columns: name and city.
>>> 
>>> 
>>> jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).value(“city”)
>>> 
>>> The above is equivalent to:
>>> 
>>> SELECT city FROM people,addresses WHERE people.name=addresses.name
>>> 
>>> If you want to do an inner join (a product), you do this:
>>> 
>>> 
>>> jdbc.db().values(“people”).as(“x”).db().values(“addresses”).has(“name”,eq(path(“x”).by(“name”))).as(“y”).path(“x”,”y")
>>> 
>>> The above is equivalent to:
>>> 
>>> SELECT * FROM addresses INNER JOIN people ON people.name=addresses.name
>>> 
>>> NOTES:
>>>        1. Instead of select(), we simply jump to the root via db() (or
>>> V() for graph).
>>>        2. Instead of project(), we simply use value() or values().
>>>        3. Instead of select() being overloaded with by() join syntax, we
>>> use has() and path().
>>>                - like TP3 we will be smart about dropping path() data
>>> once its no longer referenced.
>>>        4. We can also do LEFT and RIGHT JOINs (haven’t thought through
>>> FULL OUTER JOIN yet).
>>>                - however, we don’t support ‘null' in TP so I don’t know
>>> if we want to support these null-producing joins. ?
>>> 
>>> LEFT JOIN:
>>>        * If an address doesn’t exist for the person, emit a “null”-filled
>>> path.
>>> 
>>> jdbc.db().values(“people”).as(“x”).
>>>  db().values(“addresses”).as(“y”).
>>>    choose(has(“name”,eq(path(“x”).by(“name”))),
>>>      identity(),
>>>      path(“y”).by(null).as(“y”)).
>>>  path(“x”,”y")
>>> 
>>> SELECT * FROM addresses LEFT JOIN people ON people.name=addresses.name
>>> 
>>> RIGHT JOIN:
>>> 
>>> jdbc.db().values(“people”).as(“x”).
>>>  db().values(“addresses”).as(“y”).
>>>    choose(has(“name”,eq(path(“x”).by(“name”))),
>>>      identity(),
>>>      path(“x”).by(null).as(“x”)).
>>>  path(“x”,”y")
>>> 
>>> 
>>> SUMMARY:
>>> 
>>> There are no “low level” instructions. Everything is based on the standard
>>> instructions that we know and love. Finally, if not apparent, the above
>>> bytecode chunks would ultimately get strategized into a single SQL query
>>> (breadth-first) instead of one-off queries (depth-first) to improve
>>> performance.
>>> 
>>> Neat?,
>>> Marko.
>>> 
>>> http://rredux.com <http://rredux.com/> <http://rredux.com/ 
>>> <http://rredux.com/>>
>

Re: The Fundamental Structure Instructions Already Exist! (w/ RDBMS Example)

Reply via email to