Re: elastic-gremlin

Ran Magen Wed, 20 May 2015 08:09:53 -0700

> percentage of the tests fire for you given ElasticFeatures?

ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320 passed
ElasticGraphStructureStandardTest: 752 total, 22 errors, 15 failed,
321 ignored, 394 passed

The Process coverage seems good. I believe most of the failures are due to
the fact that I only support string IDs (I think not all tests call the
convertId method), plus some new stuff in M9 that I haven't gotten around to
fixing yet. I'll make sure to open tickets for anything I find.

It would also be great if we could easily run specific tests or classes
using JUnit. At the moment it's cumbersome to run a class of tests
(updating the environment variable each time), and impossible to debug a
specific test easily (or at least I haven't found a way).


> we'd be interested in hearing about your issues.

   1. We made a custom VertexStep that aggregates traversers and their has
   steps, to minimize the number of queries issued. It messed up a few things,
   but we got the basic usage working in M9 (I guess you fixed some stuff for
   Titan, which does the same thing). The problem now is that it doesn't work
   on inner traversals. For example, Repeat gives out only 1 traverser every
   time. Do you have any suggestions? Am I doing something wrong?
   2. We want to implement a validation strategy. Sort of like
   EventStrategy, but it will notify before a mutation, and will enable the
   user's validation code to cancel a mutation if it doesn't pass its checks.
   The problem is that there are no "before" callbacks for the Mutating
   interface. We also thought the strategy could just add a validation step
   before each mutating step, but that had its own issues. Also, the
   validation strategy won't work on stuff like graph.addVertex(), but I guess
   we can make sure people only use the traversal.
   3. Adding in bulk - we added our own functions for bulk inserts, since
   we didn't find anything to support it in the API. The thing is, we need this
   ability as part of the traversal, so we can utilize the validation strategy
   (if we can get that working). We thought about inheriting from the Add
   steps, but they're final. It'd be great to have something like
   __.inject(vertices).as('x').addV('x'), and have the ability to make it bulk
   load the vertices.
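
To make points 2 and 3 a bit more concrete, here is a rough sketch in plain
Java of the behaviour we're after: run user validation before anything is
written, let it veto individual vertices, and send whatever survives as a
single bulk request. Everything in it (BeforeAddVertex, BulkStore,
addVertices, vertex) is a hypothetical stand-in made up for illustration.
None of it is TinkerPop or Elasticsearch API, and it doesn't show how this
would plug into a TraversalStrategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BeforeMutationSketch {

    /** Hypothetical "before" callback: returning false vetoes the mutation. */
    public interface BeforeAddVertex {
        boolean allow(Map<String, Object> properties);
    }

    /** Stand-in for a backend that takes one bulk request (not a real client). */
    public static class BulkStore {
        private final List<Map<String, Object>> stored = new ArrayList<>();
        private int bulkRequests = 0;

        void bulkInsert(List<Map<String, Object>> batch) {
            bulkRequests++;
            stored.addAll(batch);
        }

        public int size() { return stored.size(); }

        public int bulkRequests() { return bulkRequests; }
    }

    /** Convenience helper: a candidate vertex with a single "name" property. */
    public static Map<String, Object> vertex(String name) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("name", name);
        return properties;
    }

    /**
     * Checks each candidate against the validators (stopping at the first
     * veto) before anything is written, sends the accepted candidates in one
     * bulk call, and returns the rejected ones.
     */
    public static List<Map<String, Object>> addVertices(
            BulkStore store,
            List<Map<String, Object>> candidates,
            List<BeforeAddVertex> validators) {
        List<Map<String, Object>> accepted = new ArrayList<>();
        List<Map<String, Object>> rejected = new ArrayList<>();
        for (Map<String, Object> candidate : candidates) {
            boolean allowed = true;
            for (BeforeAddVertex validator : validators) {
                if (!validator.allow(candidate)) {
                    allowed = false;
                    break;
                }
            }
            if (allowed) {
                accepted.add(candidate);
            } else {
                rejected.add(candidate);
            }
        }
        if (!accepted.isEmpty()) {
            store.bulkInsert(accepted);
        }
        return rejected;
    }

    public static void main(String[] args) {
        BulkStore store = new BulkStore();
        // Example rule: a vertex must have a non-empty "name".
        BeforeAddVertex nonEmptyName =
                p -> !String.valueOf(p.get("name")).isEmpty();

        List<Map<String, Object>> rejected = addVertices(
                store,
                Arrays.asList(vertex("a"), vertex(""), vertex("b")),
                Collections.singletonList(nonEmptyName));

        System.out.println("stored=" + store.size()
                + " rejected=" + rejected.size()
                + " bulkRequests=" + store.bulkRequests());
    }
}
```

The point of the sketch is only the ordering: the checks run before the
write, and the write happens once for the whole batch rather than once per
vertex.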

Thank you for your help!


On Tue, 19 May 2015 at 01:37 Stephen Mallette <[email protected]> wrote:

> Thanks for sharing all that additional information.
>
> > The biggest issue I had was implementing custom steps.
>
> I think we have a bit of a hole in the docs around that kind of stuff at
> the moment.  You have to be careful with custom steps because the
> TraversalStrategy implementations might not behave nicely if they come
> across steps they don't know about.  We've been trying to understand the
> right set of recommendations to give around that issue which is most of the
> reason we probably don't have docs developed yet.  If you'd like to
> elaborate as you offered, we'd be interested in hearing about your issues.
>
> > The Test Suite is awesome!
>
> That is excellent to hear.  Not many people have to interact with the test
> suite directly but it is a super critical part of the TinkerPop ecosystem -
> if those who have to use it aren't satisfied with it, I'd consider that a
> big problem.
>
> > Just a thought, it would be great if failing tests would print some kind
> > of "DEBUG" logs from the steps (or something like the profile step's
> > output), so it's easier to figure out what step isn't working properly
> > and why.
>
> Still trying to figure that out (i.e. what's the most useful way to "DEBUG"
> things).  We don't do logging in gremlin-core so there isn't much to output
> there.  I'm hoping that this ticket will be useful in this area:
>
> https://issues.apache.org/jira/browse/TINKERPOP3-679
>
> I did give a look at your implementation code.  I noticed that you only had
> to @OptOut of a couple of tests - not bad, though I'm not sure how much of
> the test suite fires under your ElasticFeatures implementation.  We tried
> to write tests to allow maximum coverage given the most common feature set
> - hopefully you receive good coverage under that model.  Can you share what
> percentage of the tests fire for you given ElasticFeatures?
>
> Speaking of ElasticFeatures, you might want to make this a static
> reference:
>
>
> https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
>
> and try to generally reduce anonymous object creation within
> ElasticFeatures itself.  You don't want to create a new instance of that
> stuff for every feature check - we do internal feature checking in
> different parts of the stack and it could create a lot of unnecessary
> objects for you.
>
>
>
>
> On Mon, May 18, 2015 at 5:13 PM, Ran Magen <[email protected]> wrote:
>
> > Hey Stephen,
> >
> > ElasticGraph can be seen as an alternative to Titan - a big scaled-out
> > graph with indices (currently we only have OLTP, but will add OLAP
> > soon).
> > We're a company that started out a project using Titan, but it lacked
> > some capabilities we needed:
> >
> >    - Speed, especially with regards to using text/number/geo indices. Our
> >    benchmarks showed that ES could function much faster than the
> >    performance we were getting from Titan.
> >    - Partitioning the data - useful for optimizing indexed queries on ES
> >    (Titan also uses ES, but doesn't include these optimizations). Plus,
> >    it allows you to manage the data for your specific needs. For example,
> >    if you have a graph with real-time events coming in, and you want to
> >    periodically delete all the old events, you can partition the data by
> >    time.
> >    - The spatial capabilities didn't support all the features we needed.
> >    - Titan's future was in question
> >    <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
> >    - And a bunch of other small issues.
> >
> > We thought about contributing to Titan to add these capabilities, but
> > Titan's architecture (which separates the indexing backend from the
> > "main" store) made it difficult. Plus Titan has a big codebase supporting
> > many different backends. In the end we figured it would just be simpler
> > to implement TP directly on ES. It also spares us from maintaining an
> > extra hbase/cassandra cluster.
> > We figured more people might have stumbled across these issues, so we're
> > sharing the code.
> >
> > Numbers - we've gotten up to a few billion at this point in our tests,
> > but I'm pretty confident in its ability to scale further.
> >
> > As for developing for TP, it's been mostly great :) The architecture is
> > very powerful, and gremlin 3 is turning out to be a great querying
> > language. And most importantly, it's fast to implement.
> > The biggest issue I had was implementing custom steps. Apart from
> > GraphStep (which has a simple example in TinkerGraph), the other steps
> > are pretty hard to figure out. For example, we implemented a VertexStep
> > that batches up traversers and their has steps to query them together,
> > and had many issues (I can elaborate if you want). We actually still have
> > a pretty big issue I'll raise in another thread.
> >
> > The Test Suite is awesome! It would be practically impossible to
> > implement TP so fast and easily without it. Just a thought, it would be
> > great if failing tests would print some kind of "DEBUG" logs from the
> > steps (or something like the profile step's output), so it's easier to
> > figure out what step isn't working properly and why.
> >
> >
> >
> > On Mon, 18 May 2015 at 21:23 Stephen Mallette <[email protected]>
> > wrote:
> >
> > > Thanks for sharing your project. Looks like you've implemented both the
> > > structure and process suites in ElasticGraph up to the latest M9
> > > release candidate - very nice.
> > >
> > > Where would you say that this implementation fits?  Are there specific
> > > use cases where you would want to use ElasticGraph over other
> > > implementations?
> > > When you say that "we're already using it with very big graphs" can you
> > > qualify that a bit (millions of edges, billions of edges, etc.)?
> > >
> > > Finally, more specifically related to TinkerPop, did you encounter any
> > > challenges in implementing the APIs or the Test Suite itself?
> > >
> > >
> > >
> > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]> wrote:
> > >
> > > > Hey guys,
> > > > Just wanted to let you know about a TP3 implementation we're working
> > > > on.
> > > > It's based on Elasticsearch, enabling very good scalability and
> > > > indexing capabilities.
> > > > You can find the code here
> > > > <https://github.com/rmagen/elastic-gremlin>.
> > > >
> > > > This is still very much a work in progress (still more features and
> > > > optimizations planned, and some bugs to fix), but we're already using
> > > > it with very big graphs.
> > > >
> > > > I would appreciate any feedback!
> > > > Cheers,
> > > >
> > >
> >
>
