Hello Ran,
Thank you for detailing your work.
I was just looking over Apache Drill. That looks like a really cool
(complicated) project.
It sounds like Unipop has its work cut out for it. However, as you say, if you
can abstract away the database layer like Titan does, then you will be in luck.
I understand why you didn't choose Sqlg. Sqlg starts with a "blank"
database and imposes a graph schema on it. Unipop starts with an
existing database and lets the user query it like a graph.
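
To make the "existing database, queried like a graph" idea concrete, here is a minimal sketch. It is not Unipop's actual code. It assumes a hypothetical `posts` table whose `author_id` column is a foreign key, and it exposes each row as a vertex plus an edge derived from that column. All names (`Vertex`, `Edge`, `table_to_graph`, `writtenBy`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    id: str
    label: str
    properties: tuple = ()


@dataclass(frozen=True)
class Edge:
    label: str
    out_v: str
    in_v: str


def table_to_graph(rows, label, id_col, fk_col, edge_label, target_label):
    """Expose existing rows as vertices, and a foreign-key column as edges.

    Hypothetical mapping: nothing is written back to the store; the graph
    is only a view over rows that already exist.
    """
    vertices, edges = [], []
    for row in rows:
        vid = f"{label}:{row[id_col]}"
        # Every column except the id and the FK becomes a vertex property.
        props = tuple(sorted(
            ((k, v) for k, v in row.items() if k not in (id_col, fk_col)),
            key=lambda kv: kv[0],
        ))
        vertices.append(Vertex(vid, label, props))
        # A non-null FK value becomes an outgoing edge to the target vertex.
        if row.get(fk_col) is not None:
            edges.append(Edge(edge_label, vid, f"{target_label}:{row[fk_col]}"))
    return vertices, edges


# Assumed sample rows standing in for an existing relational table.
posts = [
    {"id": 1, "title": "Hello", "author_id": 7},
    {"id": 2, "title": "Draft", "author_id": None},
]
vertices, edges = table_to_graph(
    posts, "post", "id", "author_id", "writtenBy", "user"
)
```

The schema stays where it is, and only the mapping decides what counts as a vertex or an edge. The same table could be mapped differently without touching the data.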
This could really help a lot of people in the area of master data management. I
have worked on many projects that have MongoDB, SQLServer, Cassandra,
Voldemort, etc. all under the same roof. If you could "just use Gremlin" to
manipulate that data, life would be much easier.
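
As a rough illustration of "one interface over many stores" (and of Ran's point 2 in the quoted message, where a single implementation coordinates per-store controllers), here is a small sketch. It is an assumption-laden toy, not Unipop's design. `InMemoryController` and `Coordinator` are hypothetical names, and the "stores" are just Python lists.

```python
class InMemoryController:
    """Hypothetical stand-in for one data store (e.g. JDBC or Elasticsearch)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, label, **predicates):
        # Yield records with the requested label that match every predicate.
        for rec in self.records:
            if rec.get("label") != label:
                continue
            if all(rec.get(k) == v for k, v in predicates.items()):
                yield {**rec, "store": self.name}


class Coordinator:
    """Fans a single lookup out to every controller and merges the results."""

    def __init__(self, controllers):
        self.controllers = list(controllers)

    def vertices(self, label, **predicates):
        results = []
        for controller in self.controllers:
            results.extend(controller.search(label, **predicates))
        return results


# Assumed sample data spread across two pretend stores.
jdbc = InMemoryController("jdbc", [{"label": "user", "name": "ran"}])
elastic = InMemoryController("elastic", [
    {"label": "user", "name": "marko"},
    {"label": "post", "title": "hi"},
])
coordinator = Coordinator([jdbc, elastic])

all_users = coordinator.vertices("user")
markos = coordinator.vertices("user", name="marko")
```

The caller asks for "user" vertices once and never needs to know which store answered. A real implementation would also have to push predicates down to each store and handle edges that cross stores.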
I have shared this before on this list, but perhaps you missed it. Stephen
Mallette wrote this article a while back that may provide some inspiration.
http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
Anywho, keep up the good work and when you have things working, we can help you
promote your project.
Take care,
Marko.
http://markorodriguez.com
On Oct 28, 2015, at 3:43 PM, Ran Magen <[email protected]> wrote:
> Awesome, thanks Marko!
>
> Good point, I'll try to explain my reasoning behind not using Sqlg (even
> though it seems like a great project on its own). I'd be happy to receive
> any feedback on it.
>
> First, a bit about the motivation behind Unipop...
> Unipop is meant to be a DAL on top of any databases of your choice. The
> philosophy is that these days many organizations (like the one I work for)
> have a lot of different "kinds" of data, spread across many specialized
> data stores (RDBMS, document stores, etc.).
>
> What we wanted to make is a DAL that'll enable us to query all our
> different data stores and hundreds of different schemas, including the
> relationships between the data, in one simple interface.
>
> There are some projects that try to do the same thing (Drill
> <http://drill.apache.org/>, Calcite <https://calcite.incubator.apache.org/>,
> Dremel <http://research.google.com/pubs/pub36632.html>), but they use SQL
> as the "unified" query language. We figured that in a schema with many
> connections, a property-graph representation would be better than a
> relational model (trying to avoid "JOIN hell"). So we decided to implement
> a Calcite-like application using Gremlin: Unipop.
>
> On the issue of using Sqlg, there were a few design decisions we made in
> Unipop that seemed to go against it:
>
> 1. The graph ontology should not depend on the underlying schemas.
> One could choose to represent a table in a database as a vertex, or as a
> vertex + edge (represented by some FK column). You might even choose to
> make a "virtual" vertex (let's say an 'email-address' vertex) that isn't
> represented anywhere physically, but is used as a connection-point between
> other vertices in our ontology (e.g. the user's posts, each stored as a
> document in Elasticsearch). Basically, we shouldn't tie the design of our
> "user-facing" ontology to the design of our optimized data store schemas.
> - OTOH, in Sqlg the schema is (understandably) mapped directly to the
> graph ontology <http://umlg.org/sqlg.html> (take a look at the
> Architecture section.)
> 2. We must be able to query multiple different data stores in the
> same traversal, and even in the same step. Practically that meant that
> instead of implementing the process package (Steps, Strategies, etc.) for
> each data-store, we made one implementation that coordinates the different
> Controllers (elastic, jdbc, etc).
> - Before starting the work on the jdbc package I scanned through the
> Sqlg code, and (again, understandably) the code seemed heavily
> dependent on the process package.
> 3. Translating Gremlin's in/out steps to JOIN statements is a big pain.
> It's probably the hardest part of creating an SQL implementation. We
> figured that for Unipop we'd just bypass that problem: create the JOINs we
> need as views in the DB, and simply map those views to the vertices and
> edges to which they correspond in the graph ontology. (This explanation
> might not be too clear; I can expand on it if anyone's interested.)
>
>
> I'm going into these details because I'd be happy to get a
> second opinion from you guys, about using Sqlg in particular, and about the
> design decisions in general.
>
> BTW, the same points are probably relevant with regard to using Titan's
> Cassandra/HBase/etc. connectors.
>
> Thanks,
> Ran
>
> On Tue, 27 Oct 2015 at 16:58 Marko Rodriguez <[email protected]> wrote:
>
>> Hi Ran,
>>
>> I just submitted a PR to your Unipop project.
>>
>> https://github.com/rmagen/unipop/pull/3
>>
>> However, while cruising around, I noticed your unipop-jdbc/ package. Why
>> not just use Pieter Martin's Sqlg project for JDBC/TinkerPop?
>>
>> https://github.com/pietermartin/sqlg
>>
>> Perhaps I don't understand the purpose of your package... just a random
>> thought.
>>
>> Thanks,
>> Marko.
>>
>> http://markorodriguez.com
>>
>>