Re: Ran Magen's Unipop and Pieter Martin's Sqlg

Marko Rodriguez Wed, 28 Oct 2015 15:55:17 -0700

Also, if you don't like my logo pull request, you don't have to keep it :). It 
won't hurt my feelings.


Marko.

http://markorodriguez.com

On Oct 28, 2015, at 4:50 PM, Marko Rodriguez <[email protected]> wrote:

> Hello Ran,
> 
> Thank you for detailing your work. 
> 
> I was just looking over Apache Drill. That looks like a really cool 
> (complicated) project.
> 
> It sounds like Unipop has its work cut out for it. However, as you say, if 
> you can abstract away the database layer like Titan does, then you will be in 
> luck.
> 
> I understand why you didn't choose Sqlg. Sqlg starts with a "blank" 
> database and imposes a graph schema on it. Unipop starts with an existing 
> database and lets the user query it like a graph.
> 
> This could really help a lot of people in the area of master data management. 
> I have worked on many projects that have MongoDB, SQLServer, Cassandra, 
> Voldemort, etc. all under the same roof. If you could "just use Gremlin" to 
> manipulate that data, life would be much easier.
> 
> I have shared this before on this list, but perhaps you missed it. Stephen 
> Mallette wrote this article a while back that may provide some inspiration.
>       
> http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
> 
> Anywho, keep up the good work and when you have things working, we can help 
> you promote your project.
> 
> Take care,
> Marko.
> 
> http://markorodriguez.com
> 
> On Oct 28, 2015, at 3:43 PM, Ran Magen <[email protected]> wrote:
> 
>> Awesome, thanks Marko!
>> 
>> Good point, I'll try to explain my reasoning behind not using Sqlg (even
>> though it seems like a great project on its own). I'd be happy to receive
>> any feedback on it.
>> 
>> First, a bit about the motivation behind Unipop...
>> Unipop is meant to be a DAL (data access layer) on top of whatever
>> databases you choose. The philosophy is that these days many
>> organizations (like the one I work for) have a lot of different "kinds"
>> of data, spread across many specialized data stores (RDBMS,
>> document store, etc.).
>> 
>> What we wanted to make is a DAL that'll enable us to query all our
>> different data stores and hundreds of different schemas, including the
>> relationships between the data, in one simple interface.
>> 
>> There are some projects that try to do the same thing (Drill
>> <http://drill.apache.org/>, Calcite <https://calcite.incubator.apache.org/>,
>> Dremel <http://research.google.com/pubs/pub36632.html>), but they use SQL
>> as the "unified" query language. We figured that in a schema with many
>> connections, a property-graph representation would be better than a
>> relational model (trying to avoid "JOIN hell"). So we decided to implement
>> a Calcite-like application using Gremlin: Unipop.
>> 
>> On the issue of using Sqlg, there were a few design decisions we made in
>> Unipop that seemed to go against it:
>> 
>>   1. The graph ontology should not depend on the underlying schemas.
>>      One could choose to represent a table in a database as a vertex, or
>>      as a vertex + edge (represented by some FK column). You might even
>>      choose to make a "virtual" vertex (let's say an 'email-address'
>>      vertex) that isn't represented anywhere physically, but is used as a
>>      connection point between other vertices in our ontology (e.g. the
>>      user's posts, each stored as a document in Elasticsearch). Basically,
>>      we shouldn't bind the design of our "user-facing" ontology to the
>>      design of our optimized data store schemas.
>>      - OTOH, in Sqlg the schema is (understandably) mapped directly to
>>        the graph ontology <http://umlg.org/sqlg.html> (take a look at the
>>        Architecture section).
>>   2. We must be able to query multiple different data stores in the same
>>      traversal, and even in the same step. Practically, that meant that
>>      instead of implementing the process package (Steps, Strategies, etc.)
>>      for each data store, we made one implementation that coordinates the
>>      different Controllers (elastic, jdbc, etc.).
>>      - Before starting work on the jdbc package I scanned through the
>>        Sqlg code, and (again, understandably) the code seemed heavily
>>        dependent on the process package.
>>   3. Translating Gremlin's in/out steps to JOIN statements is a big pain.
>>      It's probably the hardest part of creating an SQL implementation. We
>>      figured that for Unipop we'd just bypass that problem, create the
>>      JOINs we needed as views in the DB, and simply map those views to the
>>      vertices and edges to which they correspond in the graph ontology.
>>      (This explanation might not be too clear; I can expand on it if
>>      anyone's interested.)
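
A minimal sketch of the view-based approach in point 3, not taken from
Unipop's code. It assumes a hypothetical schema (users, posts), a
hypothetical view name (user_posted), and a hypothetical mapping from edge
labels to views. The JOIN lives in the database as a view with one row per
edge, so the out/in lookups only ever issue a flat SELECT.

```python
import sqlite3

# Hypothetical schema: table, view, and label names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES users(id),
        title TEXT
    );
    INSERT INTO users VALUES (1, 'ran'), (2, 'marko');
    INSERT INTO posts VALUES (10, 1, 'unipop'), (11, 1, 'views'), (12, 2, 'gremlin');

    -- The JOIN is written once, in the database: one row per edge.
    CREATE VIEW user_posted AS
        SELECT u.id AS out_id, p.id AS in_id
        FROM users u JOIN posts p ON p.author_id = u.id;
""")

# Ontology mapping (assumed shape): edge label -> view with out_id/in_id columns.
EDGE_VIEWS = {"posted": "user_posted"}


def out(vertex_id, label):
    """Ids of vertices reached from vertex_id over outgoing `label` edges."""
    view = EDGE_VIEWS[label]  # view names come from trusted config, not user input
    rows = conn.execute(
        f"SELECT in_id FROM {view} WHERE out_id = ? ORDER BY in_id", (vertex_id,)
    )
    return [row[0] for row in rows]


def in_(vertex_id, label):
    """Ids of vertices that reach vertex_id over incoming `label` edges."""
    view = EDGE_VIEWS[label]
    rows = conn.execute(
        f"SELECT out_id FROM {view} WHERE in_id = ? ORDER BY out_id", (vertex_id,)
    )
    return [row[0] for row in rows]
```

With this in place, out(1, "posted") looks up the posts written by user 1
and in_(12, "posted") looks up the author of post 12, without the query
layer ever generating a JOIN itself.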
>> 
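
A rough sketch of the idea in point 2, one query layer coordinating
several store-specific controllers. The class names (DictController,
Coordinator) and the V(label) method are hypothetical and are not Unipop's
API; in-memory dicts stand in for real stores such as Elasticsearch or a
JDBC database.

```python
class DictController:
    """Stand-in for one data store; a real one would wrap a client or driver."""

    def __init__(self, name, vertices):
        self.name = name
        self._vertices = vertices  # {vertex_id: {"label": ..., other properties}}

    def vertices(self, label):
        # Yield every vertex in this store that carries the requested label.
        for vertex_id, props in self._vertices.items():
            if props["label"] == label:
                yield {"id": vertex_id, "store": self.name, **props}


class Coordinator:
    """Single entry point that fans one step out to every controller."""

    def __init__(self, controllers):
        self.controllers = controllers

    def V(self, label):
        results = []
        for controller in self.controllers:
            results.extend(controller.vertices(label))
        # Sort by id so the merged result does not depend on store order.
        return sorted(results, key=lambda v: v["id"])


docs = DictController("elastic", {"p1": {"label": "post", "title": "hello"}})
rows = DictController(
    "jdbc",
    {
        "p2": {"label": "post", "title": "world"},
        "u1": {"label": "user", "name": "ran"},
    },
)
graph = Coordinator([docs, rows])
```

Here graph.V("post") returns vertices from both stores in a single step,
which is the property the one shared process implementation is meant to
provide.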
>> 
>> I'm going into these details because I'd be happy to get a second
>> opinion from you guys, about using Sqlg in particular and about the
>> design decisions in general.
>> 
>> BTW, the same points are probably relevant with regard to using Titan's
>> Cassandra/HBase/etc. connectors.
>> 
>> Thanks,
>> Ran
>> 
>> On Tue, 27 Oct 2015 at 16:58 Marko Rodriguez <[email protected]> wrote:
>> 
>>> Hi Ran,
>>> 
>>> I just submitted a PR to your Unipop project.
>>> 
>>>        https://github.com/rmagen/unipop/pull/3
>>> 
>>> However, while cruising around, I noticed your unipop-jdbc/ package. Why
>>> not just use Pieter Martin's Sqlg project for JDBC/TinkerPop?
>>> 
>>>        https://github.com/pietermartin/sqlg
>>> 
>>> Perhaps I don't understand the purpose of your package… just a random
>>> thought.
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> 
>
