Re: A collection of examples that map a query language query to provider bytecode.

Marko Rodriguez Sun, 12 May 2019 09:46:12 -0700

Hi,

> Machine machine = RemoteMachine
>    .withStructure(NeptuneStructure.class, config1)
>    .withProcessor(AkkaProcessor.class, config2)
>    .withCompiler(CypherCompiler.class, config3)
>    .open(config0);



Yea, I think something like this would work well. 

I like it because it exposes the three main components that TinkerPop is gluing 
together:

        Language
        Structure
        Process

Thus, I would have it:

        withStructure()
        withProcessor()
        withLanguage()
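
A minimal Java sketch of a builder exposing those three components. Everything here is hypothetical: `MachineSketch`, `Component`, and the use of plain strings for component names are illustrative stand-ins, not what is in the tp4/ branch.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch only: these types are NOT the TinkerPop 4 API.
final class MachineSketch {

    /** The three components named above. */
    enum Component { LANGUAGE, STRUCTURE, PROCESSOR }

    /** An opened machine: a read-only record of what fills each component slot. */
    static final class Machine {
        private final Map<Component, String> components;

        private Machine(Map<Component, String> components) {
            // Defensive copy so later builder calls cannot change an opened machine.
            this.components = Collections.unmodifiableMap(
                    new EnumMap<Component, String>(components));
        }

        String get(Component component) {
            return components.get(component);
        }
    }

    /** Fluent builder mirroring withStructure()/withProcessor()/withLanguage(). */
    static final class Builder {
        private final Map<Component, String> components =
                new EnumMap<Component, String>(Component.class);

        Builder withStructure(String name) {
            components.put(Component.STRUCTURE, name);
            return this;
        }

        Builder withProcessor(String name) {
            components.put(Component.PROCESSOR, name);
            return this;
        }

        Builder withLanguage(String name) {
            components.put(Component.LANGUAGE, name);
            return this;
        }

        /** Fails fast if any of the three components was not supplied. */
        Machine open() {
            for (Component c : Component.values()) {
                if (!components.containsKey(c)) {
                    throw new IllegalStateException("missing component: " + c);
                }
            }
            return new Machine(components);
        }
    }

    static Builder builder() {
        return new Builder();
    }
}
```

In this sketch, requiring all three components in `open()` is a design choice made only for illustration; a real machine might well default some of them.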

Marko.

http://rredux.com


> On May 10, 2019, at 8:27 AM, Dmitry Novikov <dmitry.novi...@neueda.com> wrote:
> 
> Stephen, Remote Compiler - very interesting idea to explore. Just for 
> brainstorming, let me imagine what this might look like:
> 
> 
> 1. If the client supports compilation - compiles on the client side
> 2. If remote supports compilation - compiles on the server side
> 3. If neither client nor remote supports compilation, `config3` could contain 
> the path to a microservice. The microservice does the compilation and either 
> returns bytecode, or sends bytecode to the remote and proxies the response to 
> the client. The microservice could be deployed on the remote as well.
> 
> `config3` may look like respectively:
> 
> 1. `{compilation: 'embedded'}`
> 2. `{compilation: 'remote'}`
> 3. `{compilation: 'external', uri: 'localhost:3000/cypher'}`
> 
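
The three cases and their configs could be resolved roughly as in the following Java sketch. All names here (`CompilationConfigSketch`, `Mode`, `resolve`) are hypothetical and only mirror the `config3` shapes listed above; nothing in this block is an existing TinkerPop API.

```java
import java.util.Locale;

// Hypothetical sketch only: names and config shape are illustrative.
final class CompilationConfigSketch {

    /** Where compilation happens, matching the three cases above. */
    enum Mode { EMBEDDED, REMOTE, EXTERNAL }

    static final class Config {
        final Mode mode;
        final String uri; // only non-null for EXTERNAL

        Config(Mode mode, String uri) {
            this.mode = mode;
            this.uri = uri;
        }

        /** Renders the config in the same shape as the examples above. */
        @Override
        public String toString() {
            String name = mode.name().toLowerCase(Locale.ROOT);
            return uri == null
                    ? "{compilation: '" + name + "'}"
                    : "{compilation: '" + name + "', uri: '" + uri + "'}";
        }
    }

    /**
     * Prefers client-side compilation, then server-side, and falls back to
     * an external microservice only when neither side can compile.
     */
    static Config resolve(boolean clientCompiles, boolean remoteCompiles, String externalUri) {
        if (clientCompiles) {
            return new Config(Mode.EMBEDDED, null);
        }
        if (remoteCompiles) {
            return new Config(Mode.REMOTE, null);
        }
        if (externalUri == null) {
            throw new IllegalArgumentException("no compiler available and no external URI given");
        }
        return new Config(Mode.EXTERNAL, externalUri);
    }
}
```

For example, `resolve(false, false, "localhost:3000/cypher")` yields the third config shown above.
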
> On 2019/05/10 13:45:50, Stephen Mallette <spmalle...@gmail.com> wrote: 
>>> If the VM, server, or compiler is implemented in another language, there is
>>> always the possibility of using something like gRPC or even REST to call a
>>> microservice that will do the query→Universal Bytecode conversion.
>> 
>> That's an interesting way to handle it especially if it could be done in a
>> completely transparent way - a Remote Compiler of some sort. If we had such
>> a thing then the compilation could conceivably happen anywhere, client or
>> server of the host programming language.
>> 
>> On Fri, May 10, 2019 at 9:08 AM Dmitry Novikov <dmitry.novi...@neueda.com>
>> wrote:
>> 
>>> Hello,
>>> 
>>> Marko, thank you for the clear explanation.
>>> 
>>>> I don’t like that you would have to create a CypherCompiler class (even
>>>> if it’s just a wrapper) for all popular programming languages. :(
>>> 
>>> Fully agree about this. For declarative languages like SQL, Cypher, and
>>> SPARQL, complex compilation will be needed, most probably requiring an AST
>>> walk. Writing compilers for all popular languages could be possible in
>>> theory, but increases the amount of work n times (where n = language count)
>>> and complicates testing. Also, libraries necessary for the task might not
>>> be available for all languages.
>>> 
>>> In my opinion, to avoid a situation where the number of supported query
>>> languages differs depending on the client programming language, it is
>>> preferable to introduce a plugin system. The server might have multiple
>>> endpoints: one for Bytecode, one for SQL, one for Cypher, etc.
>>> 
>>> If the VM, server, or compiler is implemented in another language, there is
>>> always the possibility of using something like gRPC or even REST to call a
>>> microservice that will do the query→Universal Bytecode conversion.
>>> 
>>> Regards,
>>> Dmitry
>>> 
>>> On 2019/05/10 12:03:30, Stephen Mallette <spmalle...@gmail.com> wrote:
>>>>> I don’t like that you would have to create a CypherCompiler class (even
>>>>> if it’s just a wrapper) for all popular programming languages. :(
>>>> 
>>>> Yeah, this is the trouble I saw with sparql-gremlin and how to make it so
>>>> that GLVs can support the g.sparql() step properly. It seems like no matter
>>>> what you do, you end up with a situation where the language designer has to
>>>> do something in each programming language they want to support. The bulk of
>>>> the work seems to be in the "compiler" so if that were moved to the server
>>>> (what we did in TP3) then the language designer would only have to write
>>>> that once per VM they wanted to support and then provide a more lightweight
>>>> library for each programming language they supported on the client-side. A
>>>> programming language that had the full compiler implementation would have
>>>> the advantage that they could client-side compile or rely on the server. I
>>>> suppose that a lightweight library would then become the basis for a future
>>>> full blown compiler in that language........hard one.
>>>> 
>>>> 
>>>> 
>>>> On Thu, May 9, 2019 at 6:09 PM Marko Rodriguez <okramma...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hello Dmitry,
>>>>> 
>>>>>> In TP3 compilation to Bytecode can happen on Gremlin Client side or
>>>>>> Gremlin Server side:
>>>>>> 
>>>>>> 1. If compilation is simple, it is possible to implement it for all
>>>>>> Gremlin Clients: Java, Python, JavaScript, .NET...
>>>>>> 2. If compilation is complex, it is possible to create a plugin for
>>>>>> Gremlin Server. Clients send query string, and server does the
>>>>>> compilation.
>>>>> 
>>>>> Yes, but not for the reasons you state. Every TP3-compliant language must
>>>>> be able to compile to TP3 bytecode. That bytecode is then submitted,
>>>>> evaluated by the TP3 VM, and a traverser iterator is returned.
>>>>> 
>>>>> However, TP3’s GremlinServer also supports JSR223 ScriptEngine which can
>>>>> compile query language Strings server side and then return a traverser
>>>>> iterator. This exists so people can submit complex Groovy/Python/JS scripts
>>>>> to GremlinServer. The problem with this access point is that arbitrary code
>>>>> can be submitted and thus while(true) { } can hang the system! dar.
>>>>> 
>>>>>> For example, in Cypher for Gremlin it is possible to use compilation to
>>>>>> Bytecode in JVM client, or on the server when using [other language
>>>>>> clients][1].
>>>>> 
>>>>> I’m not too familiar with GremlinServer plugin stuff, so I don’t know. I
>>>>> would say that all TP3-compliant query languages must be able to compile to
>>>>> TP3 bytecode.
>>>>> 
>>>>>> My current understanding is that TP4 Server would serve only for I/O
>>>>>> purposes.
>>>>> 
>>>>> This is still up in the air, but I believe that we should:
>>>>> 
>>>>>        1. Only support one data access point.
>>>>>                TP4 bytecode in and traversers out.
>>>>>        2. The TP4 server should have two components.
>>>>>                (1) One (or many) bytecode input locations (IP/port) that
>>>>>                    pass the bytecode to the TP4 VM.
>>>>>                (2) Multiple traverser output locations where distributed
>>>>>                    processors can directly send halted traversers back to
>>>>>                    the client.
>>>>> 
>>>>> For me, that’s it. However, I’m not a network server-guy so I don’t have a
>>>>> clear understanding of what is absolutely necessary.
>>>>> 
>>>>>> Where do you see "Query language -> Universal Bytecode" part in TP4
>>>>>> architecture? Will it be in the VM? Or in middleware? What will clients
>>>>>> look like in TP4?
>>>>> 
>>>>> TP4 will publish a binary serialization specification.
>>>>> It will be dead simple compared to TP3’s binary specification.
>>>>> The only types of objects are: Bytecode, Instruction, Traverser, Tuple,
>>>>> and Primitive.
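
As a rough illustration of how small such an object model can be, here is a Java sketch of two of those types. It assumes, as in TP3, that bytecode is an ordered list of instructions, each an operation name plus arguments. The class names, the `add` helper, and the example operations are hypothetical; this is not the `org.apache.tinkerpop.machine` code and says nothing about the eventual wire format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: NOT the org.apache.tinkerpop.machine classes.
final class BytecodeSketch {

    /** One instruction: an operation name plus its arguments. */
    static final class Instruction {
        final String op;
        final List<Object> args;

        Instruction(String op, Object... args) {
            this.op = op;
            // Copy the varargs array so callers cannot mutate the instruction later.
            this.args = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(args)));
        }

        @Override
        public String toString() {
            return op + args;
        }
    }

    /** Bytecode: an ordered list of instructions. */
    static final class Bytecode {
        private final List<Instruction> instructions = new ArrayList<>();

        /** Appends an instruction and returns this, so calls can be chained. */
        Bytecode add(String op, Object... args) {
            instructions.add(new Instruction(op, args));
            return this;
        }

        /** Read-only view of the instructions in insertion order. */
        List<Instruction> instructions() {
            return Collections.unmodifiableList(instructions);
        }

        @Override
        public String toString() {
            return instructions.toString();
        }
    }
}
```

A compiler front end would then emit something like `new BytecodeSketch.Bytecode().add("V").add("has", "name", "marko")`, where the operation names are only examples.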
>>>>> 
>>>>> Every query language designer that wants to have their query language
>>>>> execute on the TP4 VM (and thus, against all supporting processing engines
>>>>> and data storage systems) will need to have a compiler from their language
>>>>> to TP4 bytecode.
>>>>> 
>>>>> We will provide 2 tools in all the popular programming languages (Java,
>>>>> Python, JS, …).
>>>>>        1. A TP4 serializer and deserializer.
>>>>>        2. A lightweight network client to submit serialized bytecode and
>>>>>           deserialize Iterator<Traverser> into objects in that language.
>>>>> 
>>>>> Thus, if the Cypher-TP4 compiler is written in Scala, you would:
>>>>>        1. build up an org.apache.tinkerpop.machine.bytecode.Bytecode
>>>>>           object during your compilation process.
>>>>>        2. use our org.apache.tinkerpop.machine.io.RemoteMachine object to
>>>>>           send the Bytecode and get back Iterator<Traverser> objects.
>>>>>                - RemoteMachine does the serialization and deserialization
>>>>>                  for you.
>>>>> 
>>>>> I originally wrote out how it currently looks in the tp4/ branch, but
>>>>> realized that it asks you to write one too many classes. Thus, I think we
>>>>> will probably go with something like this:
>>>>> 
>>>>> Machine machine = RemoteMachine.
>>>>>                    withStructure(NeptuneStructure.class, config1).
>>>>>                    withProcessor(AkkaProcessor.class, config2).
>>>>>                      open(config0);
>>>>> 
>>>>> Iterator<Traverser> results =
>>>>>     machine.submit(CypherCompiler.compile("MATCH (x)-[knows]->(y)"));
>>>>> 
>>>>> Thus, you would only have to provide a single CypherCompiler class.
>>>>> 
>>>>> If you have any better ideas, please say so. I don’t like that you would
>>>>> have to create a CypherCompiler class (even if it’s just a wrapper) for all
>>>>> popular programming languages. :(
>>>>> 
>>>>> Perhaps TP4 has a Compiler interface and compilation happens server
>>>>> side….? But then that requires language designers to write their compiler
>>>>> in Java … hmm…..
>>>>> 
>>>>> Hope I’m clear,
>>>>> Marko.
>>>>> 
>>>>> http://rredux.com
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>
