I was just reading this older thread about OpenCypher / SPARQL in the archives. Very interesting.
My 2 cents - TinkerPop is a great API that makes graph application development much easier, but the lack of a declarative query language is a barrier to making those applications scale. I strongly prefer to develop application code using TinkerPop over raw RDF or Sesame, but once the data is there I prefer to access and update it via SPARQL.

I am of course biased, but I think trying to bring SPARQL more directly into the fold would be a good thing. I did it with our TP2 integration and I plan to do it again with TP3. New users drawn into the RDF world through the TP API end up replacing a lot of custom code and stored procedures with SPARQL queries, with which you can do a lot of very powerful things.

I'd love to find some time to write a compiler that translates Gremlin traversals into SPARQL operators directly, instead of re-formulating traversals by hand into SPARQL. Once a query is in SPARQL form, a query optimizer / vectored query engine can decide on an optimal execution order based on the cardinalities it finds in the graph, instead of following a fixed execution order specified by the user. This type of re-write is of course fundamental to scaling.

I've spent a fair amount of time working with Cypher at this point and my (again, biased) conclusion is that Neo4j is re-inventing the wheel, and that Cypher is still many years behind SPARQL 1.1 in terms of its capabilities and its scalability implications for implementers. There is already a W3C standard query language with a wide user base. Why not use it?

It is also possible to develop a property-graph DSL on top of SPARQL that interoperates with the TinkerPop API and data model. I have done this in a proprietary setting already and my goal is to eventually bring it into open source.

Thanks,
Mike
Blazegraph Core Development Team
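
P.S. To make the point about execution order concrete, here is a toy Python sketch. It is only an illustration under my own assumptions, not how Blazegraph or any real SPARQL engine is implemented: the data, the `?var` notation for variables, and the helper names (`estimate_cardinality`, `order_patterns`) are all made up for this example. It takes three triple patterns in the order a user wrote them and greedily re-orders them so that the most selective pattern runs first and each later pattern shares a variable with what is already bound.

```python
# Toy graph as (subject, predicate, object) triples. All data is invented.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "berlin"),
    ("carol", "livesIn", "paris"),
    ("dave", "livesIn", "paris"),
    ("paris", "capitalOf", "france"),
]


def is_var(term):
    """Terms starting with '?' are variables (a convention assumed here)."""
    return term.startswith("?")


def vars_of(pattern):
    """Set of variables appearing in a triple pattern."""
    return {term for term in pattern if is_var(term)}


def estimate_cardinality(pattern, triples):
    """Count the triples that match the constant positions of the pattern."""
    return sum(
        all(is_var(p) or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )


def order_patterns(patterns, triples):
    """Greedy join ordering: pick the lowest-cardinality pattern, preferring
    patterns that share a variable with those already chosen."""
    remaining = list(patterns)
    ordered, bound = [], set()
    while remaining:
        connected = [p for p in remaining if bound & vars_of(p)]
        candidates = connected or remaining
        best = min(candidates, key=lambda p: estimate_cardinality(p, triples))
        ordered.append(best)
        bound |= vars_of(best)
        remaining.remove(best)
    return ordered


# The order a user might write by hand, traversal style.
user_order = [
    ("?a", "knows", "?b"),
    ("?b", "livesIn", "?city"),
    ("?city", "capitalOf", "france"),
]

for pattern in order_patterns(user_order, TRIPLES):
    print(pattern, estimate_cardinality(pattern, TRIPLES))
```

With this data the `capitalOf` pattern matches a single triple, so the sketch evaluates it first and works back toward `knows`, the reverse of the hand-written order. A hand-written traversal would keep whatever order the user chose, regardless of what the data looks like.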
