Re: elastic-gremlin

Stephen Mallette Wed, 20 May 2015 09:53:28 -0700

>
> The Process coverage seems good. I believe most of the failures are due to
> the fact that I only support string IDs (I think not all tests call the
> convertId method).



hmmm - thought we had rooted all of those out via work with pieter martin
on sqlg.  please let me know which ones still aren't making those calls.


> It would also be great if we could easily run specific tests or classes
> using junit. at the moment it's cumbersome to run a class of tests
> (updating the environment variable each time), and impossible to debug a
> specific test easily (or at least I haven't found a way).
>

I don't have a better idea than the environment variable.  you should be
able to use the debugger though.  works for me in intellij when i've looked
at a problem in titan.  i'm not sure if it only works because i have the
tinkerpop source on my system, but i can step through tinkerpop source and
titan source interchangeably.  i don't think i did anything specific to
enable that.


>    1. We made a custom VertexStep that aggregates traversers, and has
>    steps, to minimize the number of queries issued. It messed up a few
>    things, but we got the basic usage working in M9 (guess you fixed some
>    stuff for Titan, which does the same thing). The problem now is that it
>    doesn't work on inner traversals. For example, Repeat gives out only 1
>    traverser every time. Do you have any suggestions? Am I doing something
>    wrong?
>

perhaps you could provide links to relevant code.  i'm sorry to say that
most times the answer to this kind of stuff isn't obvious.


>    2. We want to implement a validation strategy. Sort of like
>    EventStrategy, but it will notify before a mutation, and will enable the
>    user's validation code to cancel a mutation if it doesn't pass its
>    checks. The problem is that there are no "before" callbacks for the
>    Mutating interface.
>

i may have messed up the Mutating interface design a bit.  looking at it
now, i feel like it could be less coupled to the EventStrategy related
features.  I'll take a look at it to see if I can make it "better" before
GA.  I don't think my changes should affect vendors or the test suites, so
if it turns out to be that way i'll give it a shot.
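
To make the "before" callback idea concrete, here is a minimal sketch of a mutation that consults validators before it writes anything. Every type in it (PendingVertex, MutationRejectedException, ValidatingAdder) is hypothetical and made up for illustration; none of it is TinkerPop API, and it says nothing about how the Mutating interface will end up looking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types exist in TinkerPop.
final class PendingVertex {
    final String label;
    final Map<String, Object> properties;

    PendingVertex(String label, Map<String, Object> properties) {
        this.label = label;
        this.properties = properties;
    }
}

// Thrown by the sketch when a "before" check rejects a mutation.
final class MutationRejectedException extends RuntimeException {
    MutationRejectedException(String message) {
        super(message);
    }
}

// Stand-in for a mutating step that consults validators before it writes.
final class ValidatingAdder {
    private final List<Predicate<PendingVertex>> validators = new ArrayList<>();
    private final List<PendingVertex> store = new ArrayList<>();

    void addValidator(Predicate<PendingVertex> validator) {
        validators.add(validator);
    }

    // Runs every validator first; the store is touched only if all pass.
    void add(PendingVertex vertex) {
        for (Predicate<PendingVertex> validator : validators) {
            if (!validator.test(vertex)) {
                throw new MutationRejectedException("rejected: " + vertex.label);
            }
        }
        store.add(vertex);
    }

    // Number of vertices that passed validation and were stored.
    int size() {
        return store.size();
    }
}
```

The point of the shape is that the user's check runs before the write and can cancel it by failing, which is the part an "after" style event callback cannot do.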


>    3. Adding in bulk - we added our own functions for bulk inserts, since
>    we didn't find anything to support it in the API. The thing is we need
>    this ability as part of the traversal, so we can utilize the validation
>    strategy (if we can get that working). We thought about inheriting from
>    the Add steps, but they're final. It'd be great to have something like
>    __.inject(vertices).as('x').addV('x'), and have the ability to make it
>    bulk load the vertices.


we're trying to avoid problems with improper inheritance which messes with
traversal strategies - hence steps are typically "final".   we don't have
much on bulk insertion in the API.  perhaps you should create an issue for
discussion.
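
As a starting point for that discussion, one way to get batching without inheriting from a final step is composition: a small buffer that collects elements and hands them to a writer in batches. The sketch below assumes a hypothetical BulkBuffer class and a caller-supplied writer (for example, something that issues one bulk request per batch); neither is part of the TinkerPop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: a batching buffer, not part of the TinkerPop API.
final class BulkBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> writer; // e.g. one bulk request per call
    private final List<T> pending = new ArrayList<>();

    BulkBuffer(int batchSize, Consumer<List<T>> writer) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.writer = writer;
    }

    // Queues one element and writes a batch once the buffer is full.
    void add(T element) {
        pending.add(element);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Writes whatever is queued; call once more after the last element.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        writer.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

A custom step could own a buffer like this and call flush() when its input is exhausted, so the batching logic stays outside the step hierarchy.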

On Wed, May 20, 2015 at 11:08 AM, Ran Magen <[email protected]> wrote:

> > percentage of the tests fire for you given ElasticFeatures?
>
> ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320
> passed
> ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed, 321
> ignored, 394 passed
> The Process coverage seems good. I believe most of the failures are due to
> the fact that I only support string IDs (I think not all tests call the
> convertId method). And some new stuff in M9 that I haven't gotten around to
> fixing yet. But I'll make sure and open tickets for anything I find.
> It would also be great if we could easily run specific tests or classes
> using junit. at the moment it's cumbersome to run a class of tests
> (updating the environment variable each time), and impossible to debug a
> specific test easily (or at least I haven't found a way).
>
> > we'd be interested in hearing about your issues.
>
>    1. We made a custom VertexStep that aggregates traversers, and has
>    steps, to minimize the amount of queries issued. It messed up a few
> things,
>    but we got the basic usage working in M9 (guess you fixed some stuff for
>    Titan, which does the same thing). The problem now is that it doesn't
> work on
>    inner traversals. For example, Repeat gives out only 1 traverser every
>    time. Do you have any suggestions? Am I doing something wrong?
>    2. We want to implement a validation strategy. Sort of like
>    EventStrategy, but it will notify before a mutation, and will enable the
>    user's validation code to cancel a mutation if it doesn't pass its
> checks.
>    The problem is that there are no "before" callbacks for the Mutating
>    interface. We also thought the strategy could just add a validation step
>    before each mutating step, but that had its own issues. Also, the
>    validation strategy won't work on stuff like graph.addVertex(), but I
> guess
>    we can make sure people only use the traversal.
>    3. Adding in bulk - we added our own functions for bulk inserts, since
>    we didn't find anything to support it in the API. The thing is we need
> this
>    ability as part of the traversal, so we can utilize the validation
> strategy
>    (if we can get that working). We thought about inheriting from the Add
>    steps, but they're final. It'd be great to have something like
>    __.inject(vertices).as('x').addV('x'), and have the ability to make it
> bulk
>    load the vertices.
>
> Thank you for your help!
>
>
> On Tue, 19 May 2015 at 01:37 Stephen Mallette <[email protected]>
> wrote:
>
> > Thanks for sharing all that additional information.
> >
> > > The biggest issue I had was implementing custom steps.
> >
> > I think we have a bit of a hole in the docs around that kind of stuff at
> > the moment.  You have to be careful with custom steps because the
> > TraversalStrategy implementations might not behave nicely if they come
> > across steps they don't know about.  We've been trying to understand the
> > right set of recommendations to give around that issue which is most of
> the
> > reason we probably don't have docs developed yet.  If you'd like to
> > elaborate as you offered, we'd be interested in hearing about your
> issues.
> >
> > > The Test Suite is awesome!
> >
> > That is excellent to hear.  Not many people have to interact with the
> test
> > suite directly but it is a super critical part of the TinkerPop Ecosystem -
> > if those who have to use it aren't satisfied with it, I'd consider that a
> > big problem.
> >
> > > Just a thought, it would be great if failing tests would print some
> kind
> > of "DEBUG" logs from the steps (or something like the profile step's
> > output), so it's easier to figure out what step isn't working properly
> and
> > why.
> >
> > Still trying to figure that out (i.e. what's the most useful way to
> "DEBUG"
> > things).  We don't do logging in gremlin-core so there isn't much to
> output
> > there.  I'm hoping that this ticket will be useful in this area:
> >
> > https://issues.apache.org/jira/browse/TINKERPOP3-679
> >
> > I did give a look at your implementation code.  I noticed that you only
> had
> > to @OptOut of a couple of tests - not bad, though I'm not sure how much
> of
> > the test suite fires under your ElasticFeatures implementation.  We tried
> > to write tests to allow maximum coverage given the most common feature
> set
> > - hopefully you receive good coverage under that model.  Can you share
> what
> > percentage of the tests fire for you given ElasticFeatures?
> >
> > Speaking of ElasticFeatures, you might want to make this a static
> > reference:
> >
> >
> >
> https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> >
> > and try to generally reduce anonymous object creation within
> > ElasticFeatures itself.  You don't want to create a new instance of that
> > stuff for every feature check - we do internal feature checking in
> > different parts of the stack and it could create a lot
> > of unnecessary objects for you.
> >
> >
> >
> >
> > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <[email protected]> wrote:
> >
> > > Hey Stephen,
> > >
> > > ElasticGraph can be seen as an alternative to Titan - a big scaled-out
> > > graph with indices (currently we only have OLTP, but will add OLAP
> > soon).
> > > We're a company that started out a project using Titan, but it lacked
> > some
> > > capabilities we needed:
> > >
> > >    - Speed, especially with regards to using text/number/geo indices.
> Our
> > >    benchmarks showed that ES could function much faster than the
> > > performance
> > >    we were getting from Titan.
> > >    - Partitioning the data - useful for optimizing indexed queries on
> ES
> > >    (Titan also uses ES, but doesn't include these optimizations). Plus,
> > it
> > >    allows you to manage the data for your specific needs. For example
> if
> > > you
> > >    have a graph with real-time events coming in, and you want to
> > > periodically
> > >    delete all the old events, you can partition the data by time.
> > >    - The spatial capabilities didn't support all the features we
> needed.
> > >    - Titan's future was in question
> > >    <
> > >
> >
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> > > >
> > >    .
> > >    - And a bunch of other small issues.
> > >
> > > We thought about contributing to Titan to add these capabilities, but
> > > Titan's architecture (which separates the indexing backend from the
> > "main"
> > > store) made it difficult. Plus Titan has a big codebase supporting many
> > > different BEs. In the end we figured it would just be simpler to
> > > implement TP directly on ES. It also spares us from maintaining an
> > > extra
> > > hbase/cassandra cluster.
> > > We figured more people might have stumbled across these issues, so
> we're
> > > sharing the code.
> > >
> > > Numbers - we've gotten up to a few billions at this point in our tests,
> > but
> > > I'm pretty confident in its ability to scale further.
> > >
> > > As for developing for TP, it's been mostly great :) The architecture is
> > > very powerful, and gremlin 3 is turning out to be a great querying
> > > language. And most importantly, it's fast to implement it.
> > > The biggest issue I had was implementing custom steps. Apart from
> > GraphStep
> > > (which has a simple example in TinkerGraph), the other steps are pretty
> > > hard to figure out. For example we implemented a VertexStep that
> batches
> > up
> > > traversers and their has steps to query them together, and had many
> > issues
> > > (I can elaborate if you want). We actually still have a pretty big
> issue
> > > I'll raise in another thread.
> > >
> > > The Test Suite is awesome! It would be practically impossible to
> > implement
> > > TP so fast and easily without it. Just a thought, it would be great if
> > > failing tests would print some kind of "DEBUG" logs from the steps (or
> > > something like the profile step's output), so it's easier to figure out
> > > what step isn't working properly and why.
> > >
> > >
> > >
> > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <[email protected]>
> > > wrote:
> > >
> > > > Thanks for sharing your project. Looks like you've implemented both
> the
> > > > structure and process suites in ElasticGraph up to the latest M9
> > release
> > > > candidate - very nice.
> > > >
> > > > Where would you say that this implementation fits?  Are there
> specific
> > > use
> > > > cases where you would want to use ElasticGraph over other
> > > implementations?
> > > > When you say that "we're already using it with very big graphs" can
> you
> > > > qualify that a bit (millions of edges, billions of edges, etc.)?
> > > >
> > > > Finally, more specifically related to TinkerPop, did you encounter
> any
> > > > challenges in implementing the APIs or the Test Suite itself?
> > > >
> > > >
> > > >
> > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]> wrote:
> > > >
> > > > > Hey guys,
> > > > > Just wanted to let you know about a TP3 implementation we're
> working
> > > on.
> > > > > It's based on elastic-search, enabling very good scalability and
> > > indexing
> > > > > capabilities.
> > > > > You can find the code here <
> > https://github.com/rmagen/elastic-gremlin
> > > >.
> > > > >
> > > > > This is still very much a work in progress (still more features and
> > > > > optimizations planned, and some bugs to fix), but we're already
> using
> > > it
> > > > > with very big graphs.
> > > > >
> > > > > I would appreciate any feedback!
> > > > > Cheers,
> > > > >
> > > >
> > >
> >
>
