Re: The Bytecode Pattern-Matching Model

Marko Rodriguez Fri, 17 May 2019 11:04:20 -0700

Hi,

Kuppitz makes fun of me for my constant use of the word “tuple” for anything 
that has to do with TP4 structure/.


Perhaps this is the API:
        https://gist.github.com/okram/84912722a2c00f26f07f1c4825eacd50
My response to Stephen below is still worth reading, as it's more detailed and 
the link above assumes you understand it.

What I like about the updated API:

        Are you only talking RDBMS? 
                TMap. “relations”
        Are you only talking GraphDB? 
                TMap, TPrimitive. “vertices” and “edges” and their property values.
                        @Josh: want to build a type system over graphdb -> “vertex+edge” = “relations”.
        Are you only talking DocumentDB? 
                TMap, TList, TPrimitive. “objects” containing “objects”, “lists”, “primitives”
        Are you only talking Wide-Column? 
                TMap. “relations”
        …

I’ll stop for now. I don’t want to overload y’all. And it’s the freakin’ 
weekend… oh wait, every day is the weekend for me.

Peace in the Far East (LA),
Marko.

http://rredux.com




> On May 17, 2019, at 7:58 AM, Marko Rodriguez <[email protected]> wrote:
> 
> Hi,
> 
> Thanks for your question. 
> 
> I suppose that a “limit bandwidth”-optimization could be based on the 
> provider looking at all the instructions in the submitted bytecode and 
> then using that information to constrain what bytecode patterns it exposes. A 
> simple ProviderStrategy would be the means of doing that.
> 
> Perhaps showing you what I think the Tuple API should look like would help. 
> This API would represent the primary way in which the TP VM interacts with 
> the structure/ provider. Thus, this is for all cookies in the cookie jar!
> 
> ############################################################
> 
> public interface Tuple<A> extends Iterator<Tuple<A>> {
> 
>   public boolean hasKey(Object key);
>   public boolean hasValue(Object value);
>   public <B> Tuple<B> get(Object key);
>   public A value();
>   public long count();
>   public boolean hasNext();
>   public Tuple<A> next();
> 
>   public boolean match(Instruction instruction);
>   public Tuple apply(Instruction instruction);
>   
> }
> 
> ############################################################
> 
> Structure neo4j = Neo4jStructureFactory.open(config1)
> Tuple<Map<String,String>> db = neo4j.root(); 
>   => { type:graph | [V] }#1
> 
> //////
> 
> Let a = 
> 
> { type:vertex, name:marko, age:29 | [inE] [outE] }#1
> 
> a.count()                       => 1
> a.value()                       => Map.of('type','vertex','name','marko','age',29)
> a.get('type')                   => { 'vertex' }#1
> a.get('name')                   => { 'marko' }#1
> a.hasKey('blah')                => false
> a.match(Instruction.of('outE')) => true
> 
> //////
> 
> b = a.apply(Instruction.of('outE'))
> 
> { type:edge, label:?string | [outV] [inV] }#?
> 
> b.count()                      => -1
> b.hasKey('weight')             => null            // not false because all we know is type:edge & label:?string about #? of things.
> b.hasKey('type')               => true
> b.hasKey('label')              => true
> b.get('label')                 => { ?string }#?   // ?string is something like Unknown.of(Type.string())
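
A minimal sketch of the three-valued hasKey() answer shown above, assuming a tuple keeps a map of the key/values it is certain of plus a flag saying whether that map describes the whole object. The class PartialTupleSketch and its fullyKnown flag are hypothetical and not part of the proposed Tuple API (which declares hasKey as boolean). Here null stands in for "unknown" only because the example above prints null.

```java
import java.util.Map;

// Hypothetical illustration only; not the TP4 Tuple interface.
final class PartialTupleSketch {
    private final Map<String, Object> known; // key/values the provider is certain of
    private final boolean fullyKnown;        // true when `known` describes the whole object

    PartialTupleSketch(Map<String, Object> known, boolean fullyKnown) {
        this.known = known;
        this.fullyKnown = fullyKnown;
    }

    // TRUE:  the key is known to exist.
    // FALSE: the object is fully known and does not have the key.
    // null:  only a partial description is available, so the answer is unknown.
    Boolean hasKey(String key) {
        if (known.containsKey(key)) {
            return Boolean.TRUE;
        }
        return fullyKnown ? Boolean.FALSE : null;
    }

    public static void main(String[] args) {
        PartialTupleSketch a = new PartialTupleSketch(
                Map.of("type", "vertex", "name", "marko", "age", 29), true);
        PartialTupleSketch b = new PartialTupleSketch(
                Map.of("type", "edge", "label", "?string"), false);
        System.out.println(a.hasKey("blah"));   // false
        System.out.println(b.hasKey("type"));   // true
        System.out.println(b.hasKey("weight")); // null
    }
}
```

An Optional<Boolean> or a small enum would avoid the null, at the cost of drifting further from the notation used in the example.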
> 
> //////
> 
> c = b.apply(Instruction.of('inV'))
> 
> { type:vertex }#?
> 
> c.count()      => -1
> c.value()      => Map.of('type','vertex')
> c.hasNext()    => true
> c.next()       => { type:vertex, name:stephen, age:17 | [inE] [outE] }
> c.hasNext()    => true
> c.next()       => { type:vertex, name:kuppitz | [inE] [outE] }
> c.hasNext()    => false
> c.count()      => 0
> 
> //////
> 
> d = { type:vertex, name:kuppitz | [inE] [outE] }
> 
> e = d.get('name')
> 
> { kuppitz }#1
> 
> e.count()     => 1
> e.value()     => 'kuppitz'
> 
> //////
> 
> Let f = 
> 
> { type:edge | [outV] [inV] [has,label,eq,?0] }#10
> 
> f.count()                                            => 10
> f.get('type')                                        => { 'edge' }#10
> f.match(Instruction.of('has','label',P.eq,'knows'))  => true
> 
> //////
> 
> g = f.apply(Instruction.of('has','label',P.eq,'knows'))
> 
> { type:edge, label:knows | [outV] [inV] }#1
> 
> g.count()      => 1
> g.hasNext()    => true
> g.next()       => { type:edge, label:knows | [outV] [inV] }#1  // its iteration is itself!
> g.hasNext()    => false                                        // g lost the reference
> g.count()      => 0
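
The way count() changes as a tuple is iterated (the c and g examples above) can be sketched as a thin wrapper around an ordinary iterator. This is an assumption-laden illustration: CountingTupleSketch and its UNKNOWN constant are hypothetical names, and -1 for "unknown" is taken from the examples rather than from any specification.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not the TP4 Tuple interface.
final class CountingTupleSketch<A> implements Iterator<A> {
    static final long UNKNOWN = -1L; // mirrors count() => -1 in the examples

    private final Iterator<A> source;
    private long count;

    CountingTupleSketch(Iterator<A> source, long count) {
        this.source = source;
        this.count = count;
    }

    long count() {
        return count;
    }

    @Override
    public boolean hasNext() {
        boolean more = source.hasNext();
        if (!more) {
            count = 0; // exhausted: the tuple no longer references any objects
        }
        return more;
    }

    @Override
    public A next() {
        A value = source.next();
        if (count > 0) {
            count--; // a known count shrinks as objects are handed out
        }
        if (!source.hasNext()) {
            count = 0; // an unknown count becomes 0 once the source is drained
        }
        return value;
    }

    public static void main(String[] args) {
        CountingTupleSketch<String> c = new CountingTupleSketch<>(
                List.of("stephen", "kuppitz").iterator(), UNKNOWN);
        System.out.println(c.count()); // -1
        while (c.hasNext()) {
            System.out.println(c.next());
        }
        System.out.println(c.count()); // 0
    }
}
```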
> 
> //////
> 
> Cool? Questions?
> 
> Thanks,
> Marko.
> 
> http://rredux.com
> 
> 
> 
> 
>> On May 17, 2019, at 6:57 AM, Stephen Mallette <[email protected]> wrote:
>> 
>> This is a nicely refined representation of this concept. I think I've
>> followed this abstractly since you first started discussing it, but I've
>> struggled with the implementation of it and how it would best work (which
>> is probably the reason I keep thinking that I'm not following the
>> abstraction hehe). You nicely wrote this from the perspective of the
>> individual providers which I think connected me more to the more concrete
>> aspect of things, which leads me to this question:  Does the provider send
>> the instructions by looking at the query or do they just provide all the
>> possible instructions and TP figures it out? (i feel like i've kinda read
>> it both ways at different times).
>> 
>> On Fri, May 17, 2019 at 8:12 AM Marko Rodriguez <[email protected]>
>> wrote:
>> 
>>> Hello,
>>> 
>>> This email is primarily for Kuppitz and Josh. Kuppitz offered me his
>>> attention yesterday. I explained to him an idea I’ve been working on this
>>> week. I’ve been frustrated lately because it is so hard to express
>>> abstract ideas over email and IM. Fortunately, Kuppitz was patient with me. Then he
>>> got it. Then he innovated on it. I was elated.
>>> 
>>>        https://twitter.com/twarko/status/1129117666910674944
>>> 
>>> Josh was interested in what this was all about. I had to leave for
>>> hockey, but I gave him a fast breakdown. He sorta got the vibe, but wanted
>>> to know more…..
>>> 
>>> ########################################
>>> 
>>> There is only one type of “tuple.”
>>> 
>>> { }#?
>>> 
>>> The notation says: there are objects, but I don’t know how many of them
>>> there are…..if you want to know more, iterate.
>>> 
>>> ########################################
>>> 
>>> Let us begin…………..
>>> 
>>> 
>>> ——————TP4 WITH PROVIDER A——————
>>> 
>>> g.
>>> 
>>> { [V] }#1
>>> 
>>> There is one object. Thus, what you see is all that I know about this
>>> object. In particular, what I know is that it can be mapped via the
>>> bytecode instruction [V].
>>> 
>>> Let us apply [V].
>>> 
>>> { name:?string | [has,age,?0,?1] [has,id,eq,?0] }#?
>>> 
>>> There are some number of objects. If you want to know what they are,
>>> iterate. However, I am aware of a feature that they all share. I do know
>>> for a fact (by the way I was designed by my creator ProviderA) that every
>>> one of the objects has a name-key to some string value. Also, two has()
>>> bytecode patterns are available.
>>> 
>>> Let us apply [hasKey,name].
>>> 
>>> { name:?string | [has,age,?0,?1] [has,id,eq,?0] }#?
>>> 
>>> The instruction didn't match any of the available bytecode patterns. Thus,
>>> the instruction has to be evaluated. Did you need to iterate and filter out
>>> those that don’t have a name-key? No. As I told you, I know that every one
>>> of the objects has a name-key.
>>> 
>>> Let us apply [has,id,eq,1].
>>> 
>>> { name:marko, age:29 | [inE] [outE] }#1
>>> 
>>> There is one thing. It has primitive key/value data —  a name and an age.
>>> 
>>> Let us apply [values,name].
>>> 
>>> { marko }#1
>>> 
>>> That bytecode instruction didn't match any of the available bytecode
>>> patterns. The instruction was evaluated and there is one thing: the string
>>> “marko.”
>>> 
>>> We did:
>>> 
>>> g.V().hasKey('name').hasId(1).values('name')
>>> 
>>> The query you provided used an index on id. How do we know that? You
>>> didn’t have to iterate all the objects and filter on id. I was able to jump
>>> from all vertices to the one with id=1.
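
The matching step in the walkthrough above (an instruction either fits one of the advertised bytecode patterns or has to be evaluated the slow way) can be sketched as a slot-by-slot comparison. This is only a sketch under stated assumptions: instructions and patterns are modelled as lists of strings, slots written ?0, ?1, … are treated as wildcards, and PatternMatchSketch is a hypothetical helper, not part of any TinkerPop API.

```java
import java.util.List;

// Hypothetical illustration only; not part of any TinkerPop API.
final class PatternMatchSketch {

    // A slot such as "?0" or "?1" accepts any argument.
    static boolean isWildcard(String slot) {
        return slot.length() > 1
                && slot.charAt(0) == '?'
                && slot.substring(1).chars().allMatch(Character::isDigit);
    }

    // An instruction matches when it has the same arity as the pattern
    // and agrees with it on every non-wildcard slot.
    static boolean matches(List<String> pattern, List<String> instruction) {
        if (pattern.size() != instruction.size()) {
            return false;
        }
        for (int i = 0; i < pattern.size(); i++) {
            String slot = pattern.get(i);
            if (!isWildcard(slot) && !slot.equals(instruction.get(i))) {
                return false;
            }
        }
        return true;
    }

    // True when at least one advertised pattern matches the instruction.
    static boolean matchesAny(List<List<String>> patterns, List<String> instruction) {
        for (List<String> pattern : patterns) {
            if (matches(pattern, instruction)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The two patterns ProviderA advertises after [V].
        List<List<String>> advertised = List.of(
                List.of("has", "age", "?0", "?1"),
                List.of("has", "id", "eq", "?0"));
        System.out.println(matchesAny(advertised, List.of("has", "id", "eq", "1"))); // true
        System.out.println(matchesAny(advertised, List.of("hasKey", "name")));       // false
    }
}
```

The sketch does not check that a wildcard repeated within one pattern binds to the same value each time; none of the patterns in this thread need that.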
>>> 
>>> ——————TP4 WITH PROVIDER B——————
>>> 
>>> { type:person, name:?string, age:?int | [has,name,eq,?0] }#10
>>> 
>>> There are 10 objects. Some providers can’t determine how many objects
>>> there are without full iteration. But, by the way I was designed, I know. I
>>> also know that all the objects have a type:person key/value. I also know
>>> they all have a name-key and an age-key with known value types.
>>> 
>>> What am I?
>>> 
>>> CREATE TABLE people (
>>>  name varchar(100),
>>>  age int
>>> );
>>> CREATE INDEX people_name_idx ON people (name);
>>> 
>>> ——————TP4 WITH PROVIDER C——————
>>> 
>>> g.V().has('name','marko').has('age',gt(20)).id()
>>> 
>>> This is easy. My creator, ProviderC, provides multi-key indices. And when
>>> the database instance was created, a (name,age)-index was created. Also,
>>> because you only want the id of those vertices named marko whose age is
>>> greater than 20, I don’t have to manifest the vertices, I can simply get
>>> the id out of the index. This is what I provided for each instruction of
>>> your query...
>>> 
>>> 1. { type:graph | [V] }#1
>>> 2. { type:vertex | [has,name,eq,?0] [has,age,?0,?1] [id] }#?
>>> 3. { type:vertex, label:person, name:marko | [has,age,?0,?1] [id] }#?
>>> 4. { type:vertex, label:person, name:marko, age:gt(20) | [id] }#?
>>> 5. { type:int }#?
>>> 
>>> Unlike ProviderA, all the objects in me have a type-key. It is just
>>> something I like to do. Call it my quirk. Thus, on line #2, I know that
>>> there are some number of vertex objects. And do you see my multi-property
>>> index there? On line #3, I know for a fact that every one of those objects
>>> has a name:marko entry. Finally, by line #5, I don’t know how many
>>> id-objects there are, but I do know they are all integers. If you want to
>>> know what they are, iterate.
>>> 
>>> Below are the possible “bytecode pattern”-paths that are available off of
>>> the graph object. At any point through this pattern, you could iterate.
>>> 
>>>                        [V]
>>>                       / | \
>>>                      / [id]\
>>>                     /       \
>>>      [has,name,eq,?0]        [has,age,?0,?1]
>>>         /         \             /          \
>>>        /           \           /            \
>>> [has,age,?0,?1]    [id]    [has,name,eq,?0]  [id]
>>>       |                          |
>>>      [id]                       [id]
>>> 
>>> 
>>> *** In case the diagram above looks weird in your mail client:
>>> https://gist.github.com/okram/f7f20a3c33aa7caca7c28e85fd16be3f
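
The pattern-path diagram above can be read as a small tree keyed by pattern. The following is a rough sketch of that reading, assuming each node simply maps a pattern string to the node it leads to. PatternPathSketch, on() and matchedPrefix() are hypothetical names, and the walk compares pattern strings literally rather than binding ?0/?1 against concrete instructions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of any TinkerPop API.
final class PatternPathSketch {
    private final Map<String, PatternPathSketch> next = new LinkedHashMap<>();

    // Adds (or returns the existing) child reached by the given pattern.
    PatternPathSketch on(String pattern) {
        return next.computeIfAbsent(pattern, p -> new PatternPathSketch());
    }

    // Number of leading patterns in `path` that can be followed from this node.
    int matchedPrefix(List<String> path) {
        PatternPathSketch node = this;
        int matched = 0;
        for (String pattern : path) {
            PatternPathSketch child = node.next.get(pattern);
            if (child == null) {
                break; // from here on the instructions must be evaluated normally
            }
            node = child;
            matched++;
        }
        return matched;
    }

    // Builds the tree drawn in the diagram above for ProviderC.
    static PatternPathSketch providerC() {
        PatternPathSketch graph = new PatternPathSketch();
        PatternPathSketch v = graph.on("[V]");
        v.on("[id]");
        PatternPathSketch byName = v.on("[has,name,eq,?0]");
        byName.on("[id]");
        byName.on("[has,age,?0,?1]").on("[id]");
        PatternPathSketch byAge = v.on("[has,age,?0,?1]");
        byAge.on("[id]");
        byAge.on("[has,name,eq,?0]").on("[id]");
        return graph;
    }

    public static void main(String[] args) {
        PatternPathSketch graph = providerC();
        // All four patterns of the example query are covered by the tree.
        System.out.println(graph.matchedPrefix(
                List.of("[V]", "[has,name,eq,?0]", "[has,age,?0,?1]", "[id]"))); // 4
        // Only [V] is covered; [values,name] is not an advertised path.
        System.out.println(graph.matchedPrefix(List.of("[V]", "[values,name]"))); // 1
    }
}
```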
>>> 
>>> ——————TP4 WITH PROVIDER D——————
>>> 
>>> I support “vertex-centric indices.” For certain queries, I don’t have to
>>> manifest/iterate the incident edges of a vertex to check their key/value
>>> pairs. In particular, I have indexed all the incident knows-edges by their
>>> weight property. Wanna know who marko knows well? Do this query:
>>> 
>>> …outE('knows').has('weight',gt(0.85)).inV()
>>> 
>>> { label:person, name:marko, age:29 | [outE] [inE] }#1
>>> // [outE]
>>> { weight:?float | [has,label,eq,?0] [inV] }#20
>>> // [has,label,eq,knows]
>>> { label:knows, weight:?float | [has,weight,?0,?1] [inV] }#15
>>> // [has,weight,gt,0.85]
>>> { label:knows, weight:gt(0.85) | [inV] }#15
>>> // [inV]
>>> { label:person }#15
>>> 
>>> See. I didn’t create a single edge! I do know there are 20 outgoing edges
>>> from marko, but I didn’t manifest them. I then was able to jump to the
>>> adjacent vertices. If you want to know about those, you can iterate….
>>> 
>>> …label()
>>> 
>>> { person }#15
>>> 
>>> Haha. I don’t have to iterate to solve that. I know that all 15 adjacent
>>> vertices are labeled as ‘person’. I was able to go from v[1] to 15 person
>>> strings without manifesting any intermediate edges or vertices! I’m pretty
>>> freakin’ sweet. How do I know that you ask? I’m an in-memory graph database
>>> and my vertex-centric indices are just Java sets. It’s cheap for me to
>>> provide counts, so I do. Most other providers can’t do that. But I can.
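
A minimal sketch of how an in-memory vertex-centric index could answer has('weight',gt(0.85)) and report a count without creating edge objects. It assumes the index for one vertex is a sorted map from weight to the ids of the vertices reached by knows-edges of that weight. ProviderD above only says its indices are "just Java sets", so the TreeMap layout and every name here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; not ProviderD's actual data structure.
final class VertexCentricIndexSketch {
    // weight -> ids of adjacent vertices reached by a knows-edge with that weight
    private final NavigableMap<Double, List<Integer>> knowsByWeight = new TreeMap<>();

    void addKnows(double weight, int adjacentVertexId) {
        knowsByWeight.computeIfAbsent(weight, w -> new ArrayList<>()).add(adjacentVertexId);
    }

    // Adjacent vertex ids over knows-edges with weight strictly greater than the threshold.
    List<Integer> adjacentWithWeightGreaterThan(double threshold) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> ids : knowsByWeight.tailMap(threshold, false).values()) {
            result.addAll(ids);
        }
        return result;
    }

    // The count comes straight from the index; no edge objects are created.
    long countWithWeightGreaterThan(double threshold) {
        long count = 0;
        for (List<Integer> ids : knowsByWeight.tailMap(threshold, false).values()) {
            count += ids.size();
        }
        return count;
    }

    public static void main(String[] args) {
        VertexCentricIndexSketch marko = new VertexCentricIndexSketch();
        marko.addKnows(0.5, 2);
        marko.addKnows(0.85, 5);
        marko.addKnows(0.9, 3);
        marko.addKnows(1.0, 4);
        System.out.println(marko.countWithWeightGreaterThan(0.85));    // 2
        System.out.println(marko.adjacentWithWeightGreaterThan(0.85)); // [3, 4]
    }
}
```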
>>> 
>>> ——————TP4 WITH PROVIDER E——————
>>> 
>>> 
>>> …out('knows').values('name')
>>>     ==compiles to==>
>>> [outE][has,label,eq,knows][inV][values,name]
>>> 
>>> 
>>> { name:marko, age:29 | [outE] [inE] }#1
>>> // [outE]
>>> { [has,label,eq,?0] [inV] }#20
>>> // [has,label,eq,knows]
>>> { label:knows | [inV] }#15
>>> // [inV]
>>> { label:person | [values,name] }#15
>>> // [values,name]
>>> { type:string }#15
>>> 
>>> Did you see that? I didn’t manifest any incident edges nor adjacent
>>> vertices and I was able to give you the name of all the people that marko
>>> knows! Can you guess what features I have?
>>> 
>>>        * Incident edges are indexed by label.
>>>        * Certain properties of a vertex can be denormalized (stored
>>> locally) to their adjacent neighbors.
>>> 
>>> Thanks for reading,
>>> Marko.
>>> 
>>> http://rredux.com
>
