Re: A collection of examples that map a query language query to provider bytecode.

Dmitry Novikov Fri, 10 May 2019 07:27:57 -0700

Stephen, a Remote Compiler is a very interesting idea to explore. Just for 
brainstorming, let me imagine what this might look like:


Machine machine = RemoteMachine
    .withStructure(NeptuneStructure.class, config1)
    .withProcessor(AkkaProcessor.class, config2)
    .withCompiler(CypherCompiler.class, config3)
    .open(config0);

1. If the client supports compilation, the query compiles on the client side.
2. If the remote supports compilation, the query compiles on the server side.
3. If neither the client nor the remote supports compilation, `config3` could 
contain the path to a microservice. The microservice does the compilation and 
either returns the bytecode to the client, or sends the bytecode to the remote 
and proxies the response back to the client. The microservice could be 
deployed on the remote as well.

For these three cases, `config3` might look like this, respectively:

1. `{compilation: 'embedded'}`
2. `{compilation: 'remote'}`
3. `{compilation: 'external', uri: 'localhost:3000/cypher'}`
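
To make the three cases a bit more concrete, here is a minimal sketch of how 
such a `config3` could be interpreted. Everything in it is an assumption for 
the sake of brainstorming: `CompilerConfig`, `Mode`, `parse` and 
`compilationSite` are hypothetical names, not part of TinkerPop or the tp4/ 
branch, and the config is modelled as a plain `Map<String, String>`.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a TinkerPop class): interprets a `config3`-style map.
final class CompilerConfig {
    // The three compilation placements discussed above.
    enum Mode { EMBEDDED, REMOTE, EXTERNAL }

    final Mode mode;
    final String uri; // only set for EXTERNAL, otherwise null

    private CompilerConfig(Mode mode, String uri) {
        this.mode = mode;
        this.uri = uri;
    }

    // Reads the assumed 'compilation' and 'uri' keys and validates them.
    static CompilerConfig parse(Map<String, String> config) {
        String raw = config.get("compilation");
        if (raw == null) {
            throw new IllegalArgumentException("missing 'compilation' key");
        }
        // Mode.valueOf rejects unknown values with IllegalArgumentException.
        Mode mode = Mode.valueOf(raw.toUpperCase(Locale.ROOT));
        String uri = config.get("uri");
        if (mode == Mode.EXTERNAL && uri == null) {
            throw new IllegalArgumentException("'external' compilation requires a 'uri'");
        }
        return new CompilerConfig(mode, mode == Mode.EXTERNAL ? uri : null);
    }

    // Says where the query -> bytecode compilation would take place.
    String compilationSite() {
        switch (mode) {
            case EMBEDDED: return "client";
            case REMOTE:   return "server";
            default:       return "microservice at " + uri;
        }
    }
}
```

With this sketch, parsing the third config above and calling 
`compilationSite()` yields "microservice at localhost:3000/cypher". How the 
client would discover whether the remote supports a given compiler is left 
open here.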

On 2019/05/10 13:45:50, Stephen Mallette <spmalle...@gmail.com> wrote: 
> >  If VM, server or compiler is implemented in another language, there is
> always a possibility to use something like gRPC or even REST to call
> microservice that will do query→Universal Bytecode conversion.
> 
> That's an interesting way to handle it especially if it could be done in a
> completely transparent way - a Remote Compiler of some sort. If we had such
> a thing then the compilation could conceivably happen anywhere, client or
> server of the host programming language.
> 
> On Fri, May 10, 2019 at 9:08 AM Dmitry Novikov <dmitry.novi...@neueda.com>
> wrote:
> 
> > Hello,
> >
> > Marko, thank you for the clear explanation.
> >
> > > I don’t like that you would have to create a CypherCompiler class (even
> > > if it’s just a wrapper) for all popular programming languages. :(
> >
> > Fully agree about this. For declarative languages like SQL, Cypher and
> > SPARQL complex compilation will be needed, most probably requiring AST
> > walk. Writing compilers for all popular languages could be possible in
> > theory, but increases the amount of work n times (where n>language count)
> > and complicates testing. Also, libraries necessary for the task might not
> > be available for all languages.
> >
> > In my opinion, to avoid the situation when the number of supported query
> > languages differs depending on client programming language, it is
> > preferable to introduce a plugin system. The server might have multiple
> > endpoints, one for Bytecode, one for SQL, Cypher, etc.
> >
> > If VM, server or compiler is implemented in another language, there is
> > always a possibility to use something like gRPC or even REST to call
> > microservice that will do query→Universal Bytecode conversion.
> >
> > Regards,
> > Dmitry
> >
> > On 2019/05/10 12:03:30, Stephen Mallette <spmalle...@gmail.com> wrote:
> > > > >  I don’t like that you would have to create a CypherCompiler class
> > > > > (even if it’s just a wrapper) for all popular programming languages. :(
> > >
> > > > Yeah, this is the trouble I saw with sparql-gremlin and how to make it so
> > > > that GLVs can support the g.sparql() step properly. It seems like no matter
> > > > what you do, you end up with a situation where the language designer has to
> > > > do something in each programming language they want to support. The bulk of
> > > > the work seems to be in the "compiler" so if that were moved to the server
> > > > (what we did in TP3) then the language designer would only have to write
> > > > that once per VM they wanted to support and then provide a more lightweight
> > > > library for each programming language they supported on the client-side. A
> > > > programming language that had the full compiler implementation would have
> > > > the advantage that they could client-side compile or rely on the server. I
> > > > suppose that a lightweight library would then become the basis for a future
> > > > full blown compiler in that language........hard one.
> > >
> > >
> > >
> > > On Thu, May 9, 2019 at 6:09 PM Marko Rodriguez <okramma...@gmail.com>
> > wrote:
> > >
> > > > Hello Dmitry,
> > > >
> > > > > In TP3 compilation to Bytecode can happen on Gremlin Client side or
> > > > Gremlin Server side:
> > > > >
> > > > > 1. If compilation is simple, it is possible to implement it for all
> > > > Gremlin Clients: Java, Python, JavaScript, .NET...
> > > > > 2. If compilation is complex, it is possible to create a plugin for
> > > > Gremlin Server. Clients send query string, and server does the
> > compilation.
> > > >
> > > > Yes, but not for the reasons you state. Every TP3-compliant language
> > must
> > > > be able to compile to TP3 bytecode. That bytecode is then submitted,
> > > > evaluated by the TP3 VM, and a traverser iterator is returned.
> > > >
> > > > However, TP3’s GremlinServer also supports JSR223 ScriptEngine which
> > can
> > > > compile query language Strings server side and then return a traverser
> > > > iterator. This exists so people can submit complex Groovy/Python/JS
> > scripts
> > > > to GremlinServer. The problem with this access point is that arbitrary
> > code
> > > > can be submitted and thus while(true) { } can hang the system! dar.
> > > >
> > > > > For example, in Cypher for Gremlin it is possible to use compilation
> > to
> > > > Bytecode in JVM client, or on the server when using [other language
> > > > clients][1].
> > > >
> > > > > I’m not too familiar with GremlinServer plugin stuff, so I don’t know. I
> > > > > would say that all TP3-compliant query languages must be able to
> > > > > compile to TP3 bytecode.
> > > >
> > > > > My current understanding is that TP4 Server would serve only for I/O
> > > > purposes.
> > > >
> > > > This is still up in the air, but I believe that we should:
> > > >
> > > >         1. Only support one data access point.
> > > >                 TP4 bytecode in and traversers out.
> > > >         2. The TP4 server should have two components.
> > > >                 (1) One (or many) bytecode input locations (IP/port)
> > that
> > > > pass the bytecode to the TP4 VM.
> > > >                 (2) Multiple traverser output locations where
> > distributed
> > > > processors can directly send halted traversers back to the client.
> > > >
> > > > > For me, that’s it. However, I’m not a network server-guy so I don’t
> > > > > have a clear understanding of what is absolutely necessary.
> > > >
> > > > > Where do you see "Query language -> Universal Bytecode" part in TP4
> > > > architecture? Will it be in the VM? Or in middleware? How will clients
> > look
> > > > like in TP4?
> > > >
> > > > TP4 will publish a binary serialization specification.
> > > > It will be dead simple compared to TP3’s binary specification.
> > > > The only types of objects are: Bytecode, Instruction, Traverser, Tuple,
> > > > and Primitive.
> > > >
> > > > Every query language designer that wants to have their query language
> > > > execute on the TP4 VM (and thus, against all supporting processing
> > engines
> > > > and data storage systems) will need to have a compiler from their
> > language
> > > > to TP4 bytecode.
> > > >
> > > > We will provide 2 tools in all the popular programming languages (Java,
> > > > Python, JS, …).
> > > >         1. A TP4 serializer and deserializer.
> > > >         2. A lightweight network client to submit serialized bytecode
> > and
> > > > deserialize Iterator<Traverser> into objects in that language.
> > > >
> > > > Thus, if the Cypher-TP4 compiler is written in Scala, you would:
> > > >         1. build up a org.apache.tinkerpop.machine.bytecode.Bytecode
> > > > object during your compilation process.
> > > > >         2. use our org.apache.tinkerpop.machine.io.RemoteMachine
> > > > > object to send the Bytecode and get back Iterator<Traverser> objects.
> > > >                 - RemoteMachine does the serialization and
> > deserialization
> > > > for you.
> > > >
> > > > I originally wrote out how it currently looks in the tp4/ branch, but
> > > > realized that it asks you to write one too many classes. Thus, I think
> > we
> > > > will probably go with something like this:
> > > >
> > > > Machine machine = RemoteMachine.
> > > >                     withStructure(NeptuneStructure.class, config1).
> > > >                     withProcessor(AkkaProcessor.class, config2).
> > > >                       open(config0);
> > > >
> > > > > Iterator<Traverser> results =
> > > > >     machine.submit(CypherCompiler.compile("MATCH (x)-[knows]->(y)"));
> > > >
> > > > Thus, you would only have to provide a single CypherCompiler class.
> > > >
> > > > > If you have any better ideas, please say so. I don’t like that you
> > > > > would have to create a CypherCompiler class (even if it’s just a
> > > > > wrapper) for all popular programming languages. :(
> > > >
> > > > Perhaps TP4 has a Compiler interface and compilation happens server
> > > > side….? But then that requires language designers to write their
> > compiler
> > > > in Java … hmm…..
> > > >
> > > > Hope I’m clear,
> > > > Marko.
> > > >
> > > > > http://rredux.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
