Also, if you don't like my logo pull request, you don't have to keep it :). It won't hurt my feelings.
Marko.

http://markorodriguez.com

On Oct 28, 2015, at 4:50 PM, Marko Rodriguez <[email protected]> wrote:

> Hello Ran,
>
> Thank you for detailing your work.
>
> I was just looking over Apache Drill. That looks like a really cool
> (complicated) project.
>
> It sounds like Unipop has its work cut out for it. However, as you say, if
> you can abstract away the database layer like Titan does, then you will be in
> luck.
>
> I understand why you didn't choose Sqlg. Sqlg is starting with a "blank"
> database and enforcing a graph schema into it. Unipop is starting with an
> existing database and allowing the user to query it like a graph.
>
> This could really help a lot of people in the area of master data management.
> I have worked on many projects that have MongoDB, SQLServer, Cassandra,
> Voldemort, etc. all under the same roof. If you could "just use Gremlin" to
> manipulate that data, life would be much easier.
>
> I have shared this before on this list, but perhaps you missed it. Stephen
> Mallette wrote this article a while back that may provide some inspiration.
>
> http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
>
> Anywho, keep up the good work and when you have things working, we can help
> you promote your project.
>
> Take care,
> Marko.
>
> http://markorodriguez.com
>
> On Oct 28, 2015, at 3:43 PM, Ran Magen <[email protected]> wrote:
>
>> Awesome, thanks Marko!
>>
>> Good point, I'll try to explain my reasoning behind not using Sqlg (even
>> though it seems like a great project on its own). I'd be happy to receive
>> any feedback on it.
>>
>> First, a bit about the motivation behind Unipop...
>> Unipop is meant to be a DAL on top of any databases of your choice. The
>> philosophy being that these days many organizations (such as the one I work
>> for) have a lot of different "kinds" of data, spread throughout many
>> specialized data stores (RDBMS / DocumentStore, etc.)
>>
>> What we wanted to make is a DAL that'll enable us to query all our
>> different data stores and hundreds of different schemas, including the
>> relationships between the data, in one simple interface.
>>
>> There are some projects that try to do the same thing (Drill
>> <http://drill.apache.org/>, Calcite <https://calcite.incubator.apache.org/>,
>> Dremel <http://research.google.com/pubs/pub36632.html>), but they use sql
>> as the "unified" query language. We figured that in a schema with many
>> connections, a property-graph representation would be better than a
>> relational model (trying to avoid "JOIN hell"). So we decided to implement
>> a Calcite-like application using gremlin - Unipop.
>>
>> On the issue of using Sqlg, there were a few design decisions we made in
>> Unipop that seemed to go against it:
>>
>> 1. The graph Ontology should not be dependent on the underlying schemas.
>>    One could choose to represent a table in a database as a vertex, or as a
>>    vertex + edge (represented by some FK column). You might even choose to
>>    make a "virtual" vertex (let's say an 'email-address' vertex) that isn't
>>    represented anywhere physically, but is used as a connection-point between
>>    other vertices in our ontology (e.g. the user's posts, stored each as a
>>    document in elasticsearch). Basically, we shouldn't bind the design of our
>>    "user-facing" ontology with the design of our optimized data store schemas.
>>    - OTOH, in Sqlg the schema is (understandably) mapped directly to the
>>      graph ontology <http://umlg.org/sqlg.html> (take a look at the
>>      Architecture section.)
>> 2. We must be able to query multiple different data stores in the
>>    same traversal, and even in the same step. Practically that meant that
>>    instead of implementing the process package (Steps, Strategies, etc.) for
>>    each data-store, we made one implementation that coordinates the different
>>    Controllers (elastic, jdbc, etc).
>>    - Before starting the work on the jdbc package I scanned through the
>>      sqlg code, and (again, understandably) the code seemed heavily
>>      dependent on the process package.
>> 3. Translating gremlin's in/out steps to JOIN statements is a big pain.
>>    It's probably the hardest part about creating an sql implementation. We
>>    figured that for Unipop we'd just bypass that problem, create the JOINs we
>>    needed as views in the DB, and simply map those views to the vertices and
>>    edges to which they correspond in the graph ontology. (This explanation
>>    might not be too clear, I can expand on it if anyone's interested).
>>
>>
>> The reason for going into these details is because I'd be happy to get a
>> second opinion from you guys, about using Sqlg in particular, and about the
>> design decisions in general.
>>
>> BTW, the same points are probably relevant in regards to using Titan's
>> Cassandra/HBase/etc. connectors.
>>
>> Thanks,
>> Ran
>>
>> On Tue, 27 Oct 2015 at 16:58 Marko Rodriguez <[email protected]> wrote:
>>
>>> Hi Ran,
>>>
>>> I just submitted a PR to your Unipop project.
>>>
>>> https://github.com/rmagen/unipop/pull/3
>>>
>>> However, while cruising around, I noticed your unipop-jdbc/ package. Why
>>> not just use Pieter Martin's Sqlg project for JDBC/TinkerPop?
>>>
>>> https://github.com/pietermartin/sqlg
>>>
>>> Perhaps I don't understand the purpose of your package... just a random
>>> thought.
>>>
>>> Thanks,
>>> Marko.
>>>
>>> http://markorodriguez.com
