Re: elastic-gremlin

Ran Magen Mon, 18 May 2015 14:14:17 -0700

Hey Stephen,

ElasticGraph can be seen as an alternative to Titan - a big scaled-out
graph with indices (currently we only have OLTP, but will add OLAP soon).
We're a company that started out a project using Titan, but it lacked some
capabilities we needed:

   - Speed, especially with regard to using text/number/geo indices. Our
   benchmarks showed that ES could perform much faster than what we were
   getting from Titan.
   - Partitioning the data - useful for optimizing indexed queries on ES
   (Titan also uses ES, but doesn't include these optimizations). Plus, it
   allows you to manage the data for your specific needs. For example if you
   have a graph with real-time events coming in, and you want to periodically
   delete all the old events, you can partition the data by time.
   - The spatial capabilities didn't support all the features we needed.
   - Titan's future was in question
   <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
   - And a bunch of other small issues.
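
To make the time-partitioning point a bit more concrete, here is a rough
sketch of the idea, not ElasticGraph's actual code. It assumes one index per
day; the "events" prefix, the date format and both function names are made up
for illustration. Dropping old events then means dropping whole indices
rather than deleting individual documents.

```python
from datetime import datetime, timedelta


def index_for(ts, prefix="events"):
    """Name of the (hypothetical) daily index an event with timestamp ts goes to."""
    return f"{prefix}-{ts:%Y.%m.%d}"


def expired_indices(index_names, now, keep_days, prefix="events"):
    """Return the daily indices that are older than keep_days and can be dropped."""
    cutoff = (now - timedelta(days=keep_days)).date()
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our time-partitioned indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

A periodic job would call expired_indices with the current list of index
names and delete whatever it returns.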

We thought about contributing to Titan to add these capabilities, but
Titan's architecture (which separates the indexing backend from the "main"
store) made it difficult. Plus Titan has a big codebase supporting many
different backends. In the end we figured it would just be simpler to
implement TP directly on ES. It also spares us from maintaining an extra
HBase/Cassandra cluster.
We figured more people might have stumbled across these issues, so we're
sharing the code.

Numbers - we've gotten up to a few billion at this point in our tests, but
I'm pretty confident in its ability to scale further.

As for developing for TP, it's been mostly great :) The architecture is
very powerful, and Gremlin 3 is turning out to be a great query
language. And most importantly, it's fast to implement.
The biggest issue I had was implementing custom steps. Apart from GraphStep
(which has a simple example in TinkerGraph), the other steps are pretty
hard to figure out. For example we implemented a VertexStep that batches up
traversers and their has steps to query them together, and had many issues
(I can elaborate if you want). We actually still have a pretty big issue
I'll raise in another thread.
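
For anyone wondering what I mean by batching, here is a very simplified
sketch of the idea. It is not our VertexStep and does not use the TinkerPop
API at all: fetch_edges is a hypothetical callback standing in for a single
backend query that returns (source, target) pairs for a set of vertex ids,
and the batch size is arbitrary. The point is one query per batch of
traversers instead of one query per traverser.

```python
from collections import defaultdict
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to size items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def expand_in_batches(vertex_ids, fetch_edges, batch_size=100):
    """Yield (source_id, neighbor_id) pairs, querying once per batch.

    fetch_edges(ids) is a hypothetical callback: it takes a set of vertex
    ids and returns an iterable of (source_id, target_id) pairs.
    """
    for chunk in batched(vertex_ids, batch_size):
        by_source = defaultdict(list)
        for src, dst in fetch_edges(set(chunk)):
            by_source[src].append(dst)
        # Hand results back in the order the traversers arrived.
        for src in chunk:
            for dst in by_source[src]:
                yield (src, dst)
```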

The Test Suite is awesome! It would be practically impossible to implement
TP so fast and easily without it. Just a thought, it would be great if
failing tests would print some kind of "DEBUG" logs from the steps (or
something like the profile step's output), so it's easier to figure out
what step isn't working properly and why.

On Mon, 18 May 2015 at 21:23 Stephen Mallette <[email protected]> wrote:

> Thanks for sharing your project. Looks like you've implemented both the
> structure and process suites in ElasticGraph up to the latest M9 release
> candidate - very nice.
>
> Where would you say that this implementation fits?  Are there specific use
> cases where you would want to use ElasticGraph over other implementations?
> When you say that "we're already using it with very big graphs" can you
> qualify that a bit (millions of edges, billions of edges, etc.)?
>
> Finally, more specifically related to TinkerPop, did you encounter any
> challenges in implementing the APIs or the Test Suite itself?
>
>
>
> On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]> wrote:
>
> > Hey guys,
> > Just wanted to let you know about a TP3 implementation we're working on.
> > It's based on Elasticsearch, enabling very good scalability and indexing
> > capabilities.
> > You can find the code here <https://github.com/rmagen/elastic-gremlin>.
> >
> > This is still very much a work in progress (still more features and
> > optimizations planned, and some bugs to fix), but we're already using it
> > with very big graphs.
> >
> > I would appreciate any feedback!
> > Cheers,
> >
>
