Hi Stephen and All,
Thanks for going through the code. I address your questions below (in the same
order):
1. Yes, this file can be removed. It was just to test the traversal method.
2. Yes, I have commented out the block of tests at this moment since we do not
need to run tests at mvn clean install time. However, I kept it (in commented
out form) if there arose a need in future for the same. It can surely be
removed if you think, it won't be necessary.
3. There were two testing units (we continued them from Daniel's version), one
to check whether the prefixes are being encoded correctly, the second one is to
test whether the generated traversal is correct (in short the compiler is
functioning as it should). Since, we extended previous work supporting a
variety of SPARQL operators, more test cases can be added to validate that each
of these is functioning as expected. However, as I mentioned in point #2. we
need not do it explicitly as we (Dharmen and I) have already tested them on 3-4
different datasets and query-sets. Now, since we did not know if that was going
to be formally required in the future or not, we left them as it is, just
commented it out.
4. These resources are the graphml files that we wish to provide the users, for
(i) loading and querying famous datasets - the Berlin SPARQL Benchmark (BSBM)
(famous in the Semantic Web-RDF community) so that they do not have to look
elsewhere for the same. (ii) Also, it provides a strong use-case for
demonstrating the applicability of sparql-gremlin (creates trust in the SW
community users) and (iii) to keep the plug-in pretty much self-dependent.
5 & 6 YES, damn it. The IDE did this. I will revert these changes. It's like
when you are not looking, the IDE does things on it own :-/ apologies!
7. Regarding, Marko's thoughts -- Yes, I was waiting for you to reply to the
thread. I do have some thoughts on this. But first, I was wondering if this
(what Marko suggested) is supposed to be entirely implemented in the current
version of sparql-gremlin 0.2, i.e. including the withStrategies() and
withStrategies() and remote() features, or it is to be supported eventually
(after the sparql-gremlin 0.2.0) plugin is rolled out. Also, I am not entirely
sure I got what Marko was exactly suggesting. I bring this to light in the
in-line style reply to Marko's comment later here.
The current implementation is more of a typical compiler, the users, however,
can use it by specifying the query file and the dataset against which it is to
be executed via the command (once in the gremlin shell):
gremlin> graph = TinkerGraph.open(..)
gremlin> SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT ?a
WHERE {....} ")
==>{?x:marko, ?y:29}
==>{?x:josh, ?y:32}
i.e. load a graph using pre-defined tinkerpop methods (
graph.io(IoCore.gryo()).readGraph(graphName), TinkerGraph.open(), etc ) , then
execute the traversal as above with arguments -- (graph, queryString), where
queryString = "SPARQL query".
Now Let me quote Marko's comment and reply in-line to bring more clarity:
1. There should be a SPARQLTraversalSource which supports one spawn method â
query(String).
This is already happening inside the code. Therefore, we do not need to
mention it explicitly. Please correct me if I got it wrong here.
2. SPARQLTraversal is spawned and it only supports only the Traversal methods
â next(), toList(), iterate(), etc.
All traversal methods that are supported, available to a regular
gremlin traversal, can be used by the sparql-gremlin compiler generated
traversal as well.
3. query(String) adds a ConstantStep(String).
This is happening internally (as shown in the example above), we
can also make explicit. i.e. let the user only provide the queryString instead
of the whole "SparqlToGremlinCompiler.convertToGremlinTraversal(graph, "SELECT
?a WHERE {....} ")" command. Does this make sense? or am I missing something
here.
4. SPARQLTraversalSource has a registered SPARQLStrategy.
At this moment, we leave it to the default setting for this strategy
selection.
5. SPARQLTraversalSource should also support withStrategies(),
withoutStrategies(), withRemote(), etc.
Once the traversal is generated, it can support all strategies like any
other gremlin traversal. Does this make sense to you?
In a nutshell,
What is happening is that we are converting the SPARQL queryString into a
gremlin traversal and leave it upto the tinkerpop compiler to choose what is
best for it.
We only map a SPARQL query to its corresponding pattern matching gremlin
traversal (i.e. using with .match() clause). Since, the expressibility of
SPARQL is less than that of Gremlin (i.e. SPARQL 1.0 doesn't support/allow
performing looping and traversing operations), we can only map what is in the
scope of SPARQL language to Gremlin. Once the traversal is generated, it is
left to the tinkerpop compiler to select and execute a wide range of strategies
( various levels of optimizations, et).
NOTE - Also, Right now the sparql-gremlin compiler returns the traversal
(string) and not Bytecode. Returning the Bytecode is also completely possible,
if you want so. We can just perform traversal.asAdmin().getBytecode() for this
and it is done.
Since, we extended Daniel's work, we have not changed the names of classes,
methods and variable which were used. This, however, can be changed, if you
suggest so.
8. Yes, working in the academia doesn't groom you much on the importance of
commenting in the code by default, or for that matter any "good-practices". I
will add appropriate comment block in each class for the javadocs.
I hope the above reply address your questions to quite some extent. Most of
the issues are already handled internally (as I stated above). We can also
leave some advanced features such as remote(), for the 0.2.1 release (though
this is just an option) :D
Having said that, Of course, we are in no hurry for the pull request. I also
believe it makes complete sense to get things right.
Cheers!
On 2017-12-18 14:11, Stephen Mallette <[email protected]> wrote:
> Harsh, I looked at the code in a bit more detail than I have. Here's some
> thoughts/questions I had as I was going through things:
>
> 1. Can this file be removed - it doesn't appear to have any usage that I
> can see:
>
> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/Runable.java
>
> 2. I note that this entire block of tests is commented out - should that be
> removed?:
>
> https://github.com/harsh9t/tinkerpop/blob/master/sparql-gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/SparqlToGremlinCompilerTest.java
>
> 3. I could be wrong, but even if you didn't remove the tests above, it
> seems like unit testing is rather thin at this point. Am I missing
> something? Is there more work to do there?
>
> 4. I don't understand the nature of these resources:
>
> https://github.com/harsh9t/tinkerpop/tree/master/sparql-gremlin/src/main/resources
>
> Is there any need to package those with the jar? Should those be "test"
> resources instead? Do we need the really large data/bsbm1m.graphml file for
> any specific reason?
>
> 5. What are these changes to these poms?
>
> https://github.com/harsh9t/tinkerpop/commit/cb3b6512ea3536f556108e5a257c4586aa4d157a
>
> I assume that your IDE did that accidentally and it was not intended.
> Please revert that change.
>
> 6. This looks odd too - gremlin-shaded repeated again and again and again:
>
> https://github.com/harsh9t/tinkerpop/commit/143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-9e90009f097eabeb25c28159571fc6a2R118
>
> 7. Did you have any thoughts in reference to Marko's earlier reply that
> described how sparql-gremlin should be used? Right now, it seems like the
> code you have there is just the "engine" but lacks the piece that connects
> it into the rest of the stack. From my perspective, I think we need to be
> sure that users have an easy, clear and consistent way to use
> sparql-gremlin before we can merge this work. Obviously, having that aspect
> of the code thought through will impact the documentation that you write as
> well, so I think you need to go down this path a bit further before we get
> to the pull request stage.
>
> 8. We aren't big javadoc sticklers here, but we try to at least get class
> level javadoc in place for most classes. I don't see much javadoc or
> comments in the code right now. I think I'd like to see a modicum of
> javadoc/comments present as part of this work.
>
> So, that's my broad level feedback at this point. It seems as though there
> are some reasonably large issues there to contend with before a pull
> request is worth issuing. That's not a problem, of course....we will just
> keep iterating toward the goal. I'm not aware of anything that is pushing
> us to rush to a pull request - I'm of the opinion that we can take the time
> to get this right.
>
> Thanks,
>
> Stephen
>
>
> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <[email protected]> wrote:
>
> > Hi Marko,
> >
> > I think we're more or less on the same page here; it's clear that TP3 has a
> > different API than TP2. If you look at the guts of TP3 GraphSail [1], it
> > uses the modern APIs, and yet does adapt them to the Sail interface.
> >
> > Something like PropertyGraphSail (or an equivalent Jena thing) still makes
> > sense in TP3, as well. One interesting detail here is that in TP3, vertices
> > can have labels, which can be turned into rdf:type statements (that, in
> > turn, can be used to enable subclass/superclass inheritance if the graph is
> > combined with a RDF schema.
> >
> > A TP3 equivalent of SailGraph would indeed be quite different in
> > implementation -- strategies, not wrapper graph -- than what we had for
> > Blueprints, and yet would serve the same purpose.
> >
> > Josh
> >
> >
> > [1]
> > https://github.com/joshsh/graphsail/tree/master/src/
> > main/java/net/fortytwo/tpop/sail
> >
> >
> >
> > On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > The model proposed below is in-line with TinkerPop2âs way of thinking.
> > > Unfortunately, TinkerPop3 and more so for TinkerPop4, the Graph
> > âstructure"
> > > API will become deprecated. This means that the notion of âwrapping the
> > > Graph APIâ has gone away for TP3 and will be completely gone in TP4. In
> > > TP4, there will not even be a Graph API â no more Vertex, Edge,
> > > Property,
> > > etc. Only the concept of a Graph with only methods like
> > Graph.traversal(),
> > > Graph.partitions(), etc.
> > >
> > > Why was this route taken? In TinkerPop3, there was a need to support any
> > > language besides Java. This was why Gremlin bytecode and the concept of
> > the
> > > Gremlin traversal machine was introduced. A provider simply gets Gremlin
> > > bytecode and has to do something with it. For the Java-based Gremlin
> > > traversal machine, this is why providers implement their own GraphStep,
> > > VertexStep, etc. For a Python-based Gremlin traversal machine, likewiseâ¦
> > >
> > > This means that SailGraph, GraphSail, PropertyGraphSail as stated below
> > > donât make sense in the current and future architectures.
> > >
> > > The next question becomes, "well how would you turn an RDF store into a
> > > PropertyGraph?â Easy â implement your own custom GraphStep,
> > > VertexStep,
> > > etc. and respective ProviderStrategies that will handle the bytecode
> > > compilation accordingly.
> > >
> > > The next question becomes, âwell how would a PropertyGraph support
> > > reasoning?â Easy â implement your own custom DecorationStrategy that
> > > will
> > > insert reasoning into the traversal giving the RDFS schema. For instance:
> > > g.V().out(âlikesâ)
> > > ==>
> > > g.V().out(âknowsâ,âlikesâ)
> > > iff âlikesâ is a sub-property of âknowsâ
> > >
> > > In essence, it is possible to do this integration of RDF and TinkerPop,
> > it
> > > just needs to be done at the correct level of abstraction so that it
> > stays
> > > in line with how TinkerPop is evolving, not how it was back in 2012.
> > >
> > > Take care,
> > > Marko.
> > >
> > > http://markorodriguez <http://markorodriguez/>.com
> > >
> > >
> > > On 2017-12-13 07:46, Joshua Shinavier <[email protected]> wrote:
> > > > Hi Harsh,>
> > > >
> > > > Glad you are taking Daniel's work forward. In porting the code to the>
> > > > TinkerPop code base, might I suggest we allow for not only
> > > SPARQL-Gremlin,>
> > > > but a whole suite of RDF tools as in TP2. Perhaps call the module>
> > > > rdf-gremlin. Then we could have all of:>
> > > >
> > > > * SPARQL-Gremlin: executes standard SPARQL queries over a Property
> > Graph>
> > > > database>
> > > > * GraphSail [1,2]: stores RDF quads in the database, explicitly, and>
> > > > enables SPARQL and triple pattern queries over the quads>
> > > > * PropertyGraphSail [3]: exposes a Property Graph with of two mappings
> > > to>
> > > > the RDF data model>
> > > > * SailGraph [4]: takes an RDF triple store (not natively supporting>
> > > > Gremlin) and enables Gremlin queries>
> > > > * others? I have often thought that a continuous SPARQL implementation>
> > > > built on Gremlin would be powerful>
> > > >
> > > > The biggest mismatch between the TP2 suite and what might be built for>
> > > > Apache TinkerPop is that the previous suite was implemented using
> > > (Eclipse)>
> > > > RDF4j, whereas things seem to be leaning towards (Apache) Jena now.>
> > > > However, the same principles could be applied.>
> > > >
> > > > Josh>
> > > >
> > > >
> > > > [1] https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation>
> > > > [2] https://github.com/joshsh/graphsail>
> > > > [3]>
> > > > https://github.com/tinkerpop/blueprints/wiki/PropertyGraphSail-
> > > Ouplementation>
> > > > [4] https://github.com/tinkerpop/blueprints/wiki/Sail-Implementation
> > >
> > > http://markorodriguez.com
> > >
> > >
> > >
> > >
> >
>