Re: elastic-gremlin

Stephen Mallette Mon, 18 May 2015 15:38:16 -0700

Thanks for sharing all that additional information.

> The biggest issue I had was implementing custom steps.


I think we have a bit of a hole in the docs around that kind of stuff at
the moment.  You have to be careful with custom steps because the
TraversalStrategy implementations might not behave nicely if they come
across steps they don't know about.  We've been trying to understand the
right set of recommendations to give around that issue which is most of the
reason we probably don't have docs developed yet.  If you'd like to
elaborate as you offered, we'd be interested in hearing about your issues.

> The Test Suite is awesome!

That is excellent to hear.  Not many people have to interact with the test
suite directly but it is a super critical part of the TinkerPop ecosystem -
if those who have to use it aren't satisfied with it, I'd consider that a
big problem.

> Just a thought, it would be great if failing tests would print some kind
of "DEBUG" logs from the steps (or something like the profile step's
output), so it's easier to figure out what step isn't working properly and
why.

Still trying to figure that out (i.e. what's the most useful way to "DEBUG"
things).  We don't do logging in gremlin-core so there isn't much to output
there.  I'm hoping that this ticket will be useful in this area:

https://issues.apache.org/jira/browse/TINKERPOP3-679

I did give a look at your implementation code.  I noticed that you only had
to @OptOut of a couple of tests - not bad, though I'm not sure how much of
the test suite fires under your ElasticFeatures implementation.  We tried
to write tests to allow maximum coverage given the most common feature set
- hopefully you receive good coverage under that model.  Can you share what
percentage of the tests fire for you given ElasticFeatures?

Speaking of ElasticFeatures, you might want to make this a static reference:

https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68

and try to generally reduce anonymous object creation within
ElasticFeatures itself.  You don't want to create a new instance of that
stuff for every feature check - we do internal feature checking in
different parts of the stack and it could create a lot
of unnecessary objects for you.
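
A minimal sketch of that suggestion, assuming simplified stand-in
interfaces rather than the actual gremlin-core Features types. Every name
below (Features, VertexFeatures, AllocatingFeatures, CachedFeatures,
SketchGraph) is hypothetical and only illustrates the allocation pattern:

```java
// Hypothetical stand-ins for the feature interfaces; NOT TinkerPop's API.
interface VertexFeatures {
    boolean supportsUserSuppliedIds();
}

interface Features {
    VertexFeatures vertex();
}

// The pattern to avoid: every call to vertex() allocates a new anonymous object.
final class AllocatingFeatures implements Features {
    @Override
    public VertexFeatures vertex() {
        return new VertexFeatures() {
            @Override
            public boolean supportsUserSuppliedIds() {
                return true;
            }
        };
    }
}

// The suggested pattern: build the feature objects once and reuse them.
final class CachedFeatures implements Features {
    // Single shared instance, created when the class is initialized.
    static final CachedFeatures INSTANCE = new CachedFeatures();

    // Created once per CachedFeatures instance, i.e. once overall.
    private final VertexFeatures vertexFeatures = new VertexFeatures() {
        @Override
        public boolean supportsUserSuppliedIds() {
            return true;
        }
    };

    private CachedFeatures() {
    }

    @Override
    public VertexFeatures vertex() {
        return vertexFeatures;
    }
}

// Hypothetical graph class that hands out the shared features instance.
final class SketchGraph {
    Features features() {
        return CachedFeatures.INSTANCE;
    }
}
```

With this shape, repeated feature checks such as
features().vertex().supportsUserSuppliedIds() reuse the same objects
instead of allocating new ones on each call.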




On Mon, May 18, 2015 at 5:13 PM, Ran Magen <[email protected]> wrote:

> Hey Stephen,
>
> ElasticGraph can be seen as an alternative to Titan - a big scaled-out
> graph with indices (currently we only have OLTP, but will add OLAP soon).
> We're a company that started out a project using Titan, but it lacked some
> capabilities we needed:
>
>    - Speed, especially with regards to using text/number/geo indices. Our
>    benchmarks showed that ES could function much faster than the
>    performance we were getting from Titan.
>    - Partitioning the data - useful for optimizing indexed queries on ES
>    (Titan also uses ES, but doesn't include these optimizations). Plus, it
>    allows you to manage the data for your specific needs. For example, if
>    you have a graph with real-time events coming in, and you want to
>    periodically delete all the old events, you can partition the data by
>    time.
>    - The spatial capabilities didn't support all the features we needed.
>    - Titan's future was in question
>    <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
>    - And a bunch of other small issues.
>
> We thought about contributing to Titan to add these capabilities, but
> Titan's architecture (which separates the indexing backend from the "main"
> store) made it difficult. Plus Titan has a big codebase supporting many
> different BEs. In the end we figured it would just be simpler to implement
> TP directly on ES. It also spares us from maintaining an extra
> hbase/cassandra cluster.
> We figured more people might have stumbled across these issues, so we're
> sharing the code.
>
> Numbers - we've gotten up to a few billion at this point in our tests, but
> I'm pretty confident in its ability to scale further.
>
> As for developing for TP, it's been mostly great :) The architecture is
> very powerful, and gremlin 3 is turning out to be a great querying
> language. And most importantly, it's fast to implement it.
> The biggest issue I had was implementing custom steps. Apart from GraphStep
> (which has a simple example in TinkerGraph), the other steps are pretty
> hard to figure out. For example we implemented a VertexStep that batches up
> traversers and their has steps to query them together, and had many issues
> (I can elaborate if you want). We actually still have a pretty big issue
> I'll raise in another thread.
>
> The Test Suite is awesome! It would be practically impossible to implement
> TP so fast and easily without it. Just a thought, it would be great if
> failing tests would print some kind of "DEBUG" logs from the steps (or
> something like the profile step's output), so it's easier to figure out
> what step isn't working properly and why.
>
>
>
> On Mon, 18 May 2015 at 21:23 Stephen Mallette <[email protected]>
> wrote:
>
> > Thanks for sharing your project. Looks like you've implemented both the
> > structure and process suites in ElasticGraph up to the latest M9 release
> > candidate - very nice.
> >
> > Where would you say that this implementation fits?  Are there specific
> > use cases where you would want to use ElasticGraph over other
> > implementations?
> > When you say that "we're already using it with very big graphs" can you
> > qualify that a bit (millions of edges, billions of edges, etc.)?
> >
> > Finally, more specifically related to TinkerPop, did you encounter any
> > challenges in implementing the APIs or the Test Suite itself?
> >
> >
> >
> > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]> wrote:
> >
> > > Hey guys,
> > > Just wanted to let you know about a TP3 implementation we're working on.
> > > It's based on elastic-search, enabling very good scalability and
> > > indexing capabilities.
> > > You can find the code here <https://github.com/rmagen/elastic-gremlin>.
> > >
> > > This is still very much a work in progress (still more features and
> > > optimizations planned, and some bugs to fix), but we're already using
> > > it with very big graphs.
> > >
> > > I would appreciate any feedback!
> > > Cheers,
> > >
> >
>
