Re: elastic-gremlin

Stephen Mallette Tue, 26 May 2015 04:40:48 -0700

I've not had a chance to think about it, but I now see the issue you
opened.  It was probably good that you added that for tracking:


https://issues.apache.org/jira/browse/TINKERPOP3-701
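
For anyone following the ticket, here is a rough sketch of the "before" callback idea from the validation-strategy point quoted below: user code gets notified before a mutation and can cancel it. It is plain Java and does not use any TinkerPop API. Every name in it (VetoException, ValidatingVertexStore, addBeforeCallback) is hypothetical and made up for illustration, and the store is only a stand-in for a mutating step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical names throughout; none of this is TinkerPop API.
final class VetoException extends RuntimeException {
    VetoException(String message) {
        super(message);
    }
}

// Stand-in for a mutating step: validators run before the write is applied.
final class ValidatingVertexStore {
    private final List<Predicate<Map<String, Object>>> validators = new ArrayList<>();
    private final List<Map<String, Object>> vertices = new ArrayList<>();

    // Register user validation code to be consulted before each mutation.
    void addBeforeCallback(Predicate<Map<String, Object>> validator) {
        validators.add(validator);
    }

    // Ask every validator first; the first rejection cancels the mutation.
    void addVertex(Map<String, Object> properties) {
        for (Predicate<Map<String, Object>> validator : validators) {
            if (!validator.test(properties)) {
                throw new VetoException("mutation rejected: " + properties);
            }
        }
        vertices.add(properties);
    }

    int size() {
        return vertices.size();
    }
}

final class BeforeMutationDemo {
    public static void main(String[] args) {
        ValidatingVertexStore store = new ValidatingVertexStore();
        store.addBeforeCallback(props -> props.containsKey("name"));

        store.addVertex(Collections.singletonMap("name", "marko"));
        try {
            store.addVertex(Collections.singletonMap("age", 29));
        } catch (VetoException e) {
            System.out.println("vetoed: " + e.getMessage());
        }
        System.out.println("stored vertices: " + store.size());
    }
}
```

The point of the sketch is only the ordering: the check happens before the write, so a failed validation leaves the store untouched. How that would map onto the Mutating interface is exactly what the ticket is there to work out.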

On Sat, May 23, 2015 at 4:25 PM, Ran Magen <[email protected]> wrote:

> >i may have messed up the Mutating interface design a bit.  looking at it
> now, i feel like it could be less coupled to the EventStrategy related
> features.  I'll take a look at it to see if I can make it "better" before
> GA.  I don't think my changes should affect vendors or the test suites, so
> if it turns out to be that way i'll give it a shot.
>
> Any progress? Should I open a ticket for this?
>
> On Wed, 20 May 2015 at 22:17 Stephen Mallette <[email protected]>
> wrote:
>
> > >  I guess today these features don't work because the Suite classes
> > initialize the tests
> >
> > right - because we have the custom test suites the tests are determined
> more
> > dynamically so your ability to right-click/run is kinda lost. :/
> >
> > On Wed, May 20, 2015 at 2:47 PM, Ran Magen <[email protected]> wrote:
> >
> > > >I don't have a better idea than the environment variable.  you should
> be
> > > able to use the debugger though.  works for me in intellij when i've
> > looked
> > > at a problem in titan.  i'm not sure if it only works because i have
> the
> > > tinkerpop source on my system, but i can step through tinkerpop source
> > > and titan source interchangeably.  i don't think i did anything
> specific
> > > to enable that.
> > >
> > > I wasn't clear. I use intellij, and it has simple shortcuts to run
> tests:
> > > right clicking on a test method/class and clicking run, rerunning only
> > > failed tests, etc. This could really help cases where I need to debug a
> > > test, and put a breakpoint somewhere in the code. If other tests run
> > > before, the breakpoints will usually get hit lots of times. I guess
> today
> > > these features don't work because the Suite classes initialize the
> > tests. I
> > > don't know enough about jUnit to offer solutions, thought you might
> have.
> > >
> > > >perhaps you could provide links to relevant code.  i'm sorry to say
> that
> > > most times the answer to this kind of stuff isn't obvious.
> > >
> > > Okay, I'll get some example code.
> > >
> > > >i may have messed up the Mutating interface design a bit. looking at
> > > it now, i feel like it could be less coupled to the EventStrategy
> related
> > > features.  I'll take a look at it to see if I can make it "better"
> before
> > > GA.
> > >
> > > Great, that would be a big help!
> > >
> > > >we don't have much on bulk insertion in the API. perhaps you should
> > create
> > > an issue for discussion
> > >
> > > https://issues.apache.org/jira/browse/TINKERPOP3-694
> > >
> > >
> > > Thanks again for all the help
> > >
> > > On Wed, 20 May 2015 at 19:53 Stephen Mallette <[email protected]>
> > > wrote:
> > >
> > > > >
> > > > > The Process coverage seems good. I believe most of the failures are
> > due
> > > > to
> > > > > the fact that I only support string IDs (I think not all tests call
> > the
> > > > > convertId method).
> > > >
> > > >
> > > > hmmm - thought we had rooted all of those out via work with pieter
> > martin
> > > > on sqlg.  please let me know which ones still aren't making those
> > calls.
> > > >
> > > >
> > > > > It would also be great if we could easily run specific tests or
> > classes
> > > > > using junit. at the moment it's cumbersome to run a class of tests
> > > > > (updating the environment variable each time), and impossible to
> > > debug a
> > > > > specific test easily (or at least I haven't found a way).
> > > > >
> > > >
> > > > I don't have a better idea than the environment variable.  you should
> > be
> > > > able to use the debugger though.  works for me in intellij when i've
> > > looked
> > > > at a problem in titan.  i'm not sure if it only works because i have
> > the
> > > > tinkerpop source on my system, but i can step through tinkerpop
> source
> > > and
> > > > titan source interchangeably.  i don't think i did anything specific
> to
> > > > enable that.
> > > >
> > > >
> > > > >    1. We made a custom VertexStep that aggregates traversers, and
> has
> > > > >    steps, to minimize the amount of queries issued. It messed up a
> > few
> > > > > things,
> > > > >    but we got the basic usage working in M9 (guess you fixed some
> > stuff
> > > > for
> > > > >    Titan, which does the same thing). The problem now is that it
> > doesn't
> > > > > work on
> > > > >    inner traversals. For example, Repeat gives out only 1 traverser
> > > every
> > > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > > >
> > > >
> > > > perhaps you could provide links to relevant code.  i'm sorry to say
> > that
> > > > most times the answer to this kind of stuff isn't obvious.
> > > >
> > > >
> > > > >    2. We want to implement a validation strategy. Sort of like
> > > > >    EventStrategy, but it will notify before a mutation, and will
> > enable
> > > > the
> > > > >    user's validation code to cancel a mutation if it doesn't pass
> its
> > > > > checks.
> > > > >    The problem is that there are no "before" callbacks for the
> > Mutating
> > > > >    interface.
> > > > >
> > > >
> > > > i may have messed up the Mutating interface design a bit.  looking at
> > it
> > > > now, i feel like it could be less coupled to the EventStrategy
> related
> > > > features.  I'll take a look at it to see if I can make it "better"
> > before
> > > > GA.  I don't think my changes should affect vendors or the test
> suites,
> > > so
> > > > if it turns out to be that way i'll give it a shot.
> > > >
> > > >
> > > > >    3. Adding in bulk - we added our own functions for bulk inserts,
> > > since
> > > > >    we didn't find anything to support it in the API. The thing is
> we
> > > need
> > > > > this
> > > > >    ability as part of the traversal, so we can utilize the
> validation
> > > > > strategy
> > > > >    (if we can get that working). We thought about inheriting from
> the
> > > Add
> > > > >    steps, but they're final. It'd be great to have something like
> > > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > make
> > > it
> > > > > bulk
> > > > >    load the vertices.
> > > >
> > > >
> > > > we're trying to avoid problems with improper inheritance which messes
> > > with
> > > > traversal strategies - hence steps are typically "final".   we don't
> > have
> > > > much on bulk insertion in the API.  perhaps you should create an
> issue
> > > for
> > > > discussion.
> > > >
> > > > On Wed, May 20, 2015 at 11:08 AM, Ran Magen <[email protected]>
> wrote:
> > > >
> > > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > >
> > > > > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored,
> 320
> > > > > passed
> > > > > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed,
> > 321
> > > > > ignored, 394 passed
> > > > > The Process coverage seems good. I believe most of the failures are
> > due
> > > > to
> > > > > the fact that I only support string IDs (I think not all tests call
> > the
> > > > > convertId method). And some new stuff in M9 that I haven't gotten
> > > around
> > > > to
> > > > > fixing yet. But I'll make sure and open tickets for anything I
> find.
> > > > > It would also be great if we could easily run specific tests or
> > classes
> > > > > using junit. at the moment it's cumbersome to run a class of tests
> > > > > (updating the environment variable each time), and impossible to
> > > debug a
> > > > > specific test easily (or at least I haven't found a way).
> > > > >
> > > > > > we'd be interested in hearing about your issues.
> > > > >
> > > > >    1. We made a custom VertexStep that aggregates traversers, and
> has
> > > > >    steps, to minimize the amount of queries issued. It messed up a
> > few
> > > > > things,
> > > > >    but we got the basic usage working in M9 (guess you fixed some
> > stuff
> > > > for
> > > > >    Titan, which does the same thing). The problem now is that it
> > doesn't
> > > > > work on
> > > > >    inner traversals. For example, Repeat gives out only 1 traverser
> > > every
> > > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > > >    2. We want to implement a validation strategy. Sort of like
> > > > >    EventStrategy, but it will notify before a mutation, and will
> > enable
> > > > the
> > > > >    user's validation code to cancel a mutation if it doesn't pass
> its
> > > > > checks.
> > > > >    The problem is that there are no "before" callbacks for the
> > Mutating
> > > > >    interface. We also thought the strategy could just add a
> > validation
> > > > step
> > > > >    before each mutating step, but that had its own issues. Also,
> the
> > > > >    validation strategy won't work on stuff like graph.addVertex(),
> > but
> > > I
> > > > > guess
> > > > >    we can make sure people only use the traversal.
> > > > >    3. Adding in bulk - we added our own functions for bulk inserts,
> > > since
> > > > >    we didn't find anything to support it in the API. The thing is
> we
> > > need
> > > > > this
> > > > >    ability as part of the traversal, so we can utilize the
> validation
> > > > > strategy
> > > > >    (if we can get that working). We thought about inheriting from
> the
> > > Add
> > > > >    steps, but they're final. It'd be great to have something like
> > > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > make
> > > it
> > > > > bulk
> > > > >    load the vertices.
> > > > >
> > > > > Thank you for your help!
> > > > >
> > > > >
> > > > > On Tue, 19 May 2015 at 01:37 Stephen Mallette <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks for sharing all that additional information.
> > > > > >
> > > > > > > The biggest issue I had was implementing custom steps.
> > > > > >
> > > > > > I think we have a bit of a hole in the docs around that kind of
> > > stuff
> > > > at
> > > > > > the moment.  You have to be careful with custom steps because the
> > > > > > TraversalStrategy implementations might not behave nicely if they
> > > come
> > > > > > across steps they don't know about.  We've been trying to
> > understand
> > > > the
> > > > > > right set of recommendations to give around that issue which is
> > most
> > > of
> > > > > the
> > > > > > reason we probably don't have docs developed yet.  If you'd like
> to
> > > > > > elaborate as you offered, we'd be interested in hearing about
> your
> > > > > issues.
> > > > > >
> > > > > > > The Test Suite is awesome!
> > > > > >
> > > > > > That is excellent to hear.  Not many people have to interact with
> > the
> > > > > test
> > > > > > suite directly but it is a super critical part of the TinkerPop
> > > > Ecosystem -
> > > > > > if those who have to use it aren't satisfied with it, I'd
> consider
> > > > that a
> > > > > > big problem.
> > > > > >
> > > > > > > Just a thought, it would be great if failing tests would print
> > some
> > > > > kind
> > > > > > of "DEBUG" logs from the steps (or something like the profile
> > step's
> > > > > > output), so it's easier to figure out what step isn't working
> > > properly
> > > > > and
> > > > > > why .
> > > > > >
> > > > > > Still trying to figure that out (i.e. what's the most useful way
> to
> > > > > "DEBUG"
> > > > > > things).  We don't do logging in gremlin-core so there isn't much
> > to
> > > > > output
> > > > > > there.  I'm hoping that this ticket will be useful in this area:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > > > > >
> > > > > > I did give a look at your implementation code.  I noticed that
> you
> > > only
> > > > > had
> > > > > > to @OptOut of a couple of tests - not bad, though I'm not sure
> how
> > > much
> > > > > of
> > > > > > the test suite fires under your ElasticFeatures implementation.
> We
> > > > tried
> > > > > > to write tests to allow maximum coverage given the most common
> > > feature
> > > > > set
> > > > > > - hopefully you receive good coverage under that model.  Can you
> > > share
> > > > > what
> > > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > > >
> > > > > > Speaking of ElasticFeatures, you might want to make this a static
> > > > > > reference:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > > > > >
> > > > > > and try to generally reduce anonymous object creation within
> > > > > > ElasticFeatures itself.  You don't want to create a new instance
> of
> > > > that
> > > > > > stuff for every feature check - we do internal feature checking
> > in
> > > > > > different parts of the stack and it could create a lot
> > > > > > of unnecessary objects for you.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <[email protected]>
> > wrote:
> > > > > >
> > > > > > > Hey Stephen,
> > > > > > >
> > > > > > > ElasticGraph can be seen as an alternative to Titan - a big
> > > > scaled-out
> > > > > > > graph with indices (currently we only have OLTP, but will add
> > > OLAP
> > > > > > soon).
> > > > > > > We're a company that started out a project using Titan, but it
> > > lacked
> > > > > > some
> > > > > > > capabilities we needed:
> > > > > > >
> > > > > > >    - Speed, especially with regards to using text/number/geo
> > > indices.
> > > > > Our
> > > > > > >    benchmarks showed that ES could function much faster than
> the
> > > > > > > performance
> > > > > > >    we were getting from Titan.
> > > > > > >    - Partitioning the data - useful for optimizing indexed
> > queries
> > > on
> > > > > ES
> > > > > > >    (Titan also uses ES, but doesn't include these
> optimizations).
> > > > Plus,
> > > > > > it
> > > > > > >    allows you to manage the data for your specific needs. For
> > > example
> > > > > if
> > > > > > > you
> > > > > > >    have a graph with real-time events coming in, and you want
> to
> > > > > > > periodically
> > > > > > >    delete all the old events, you can partition the data by
> time.
> > > > > > >    - The spatial capabilities didn't support all the features
> we
> > > > > needed.
> > > > > > >    - Titan's future was in question
> > > > > > >    <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> > > > > > > >
> > > > > > >    .
> > > > > > >    - And a bunch of other small issues.
> > > > > > >
> > > > > > > We thought about contributing to Titan to add these
> capabilities,
> > > but
> > > > > > > Titan's architecture (which separates the indexing backend from
> > the
> > > > > > "main"
> > > > > > > store) made it difficult. Plus Titan has a big codebase
> > supporting
> > > > many
> > > > > > > different BEs. At the end we figured it would just be simpler
> to
> > > > > implement
> > > > > > > TP directly on ES. It also spares us from maintaining an extra
> > > > > > > hbase/cassandra cluster.
> > > > > > > We figured more people might have stumbled across these issues,
> > so
> > > > > we're
> > > > > > > sharing the code.
> > > > > > >
> > > > > > > Numbers - we've gotten up to a few billions at this point in
> our
> > > > tests,
> > > > > > but
> > > > > > > I'm pretty confident on its ability to scale further.
> > > > > > >
> > > > > > > As for developing for TP, it's been mostly great :) The
> > > architecture
> > > > is
> > > > > > > very powerful, and gremlin 3 is turning out to be a great
> > querying
> > > > > > > language. And most importantly, it's fast to implement it.
> > > > > > > The biggest issue I had was implementing custom steps. Apart
> from
> > > > > > GraphStep
> > > > > > > (which has a simple example in TinkerGraph), the other steps
> are
> > > > pretty
> > > > > > > hard to figure out. For example we implemented a VertexStep
> that
> > > > > batches
> > > > > > up
> > > > > > > traversers and their has steps to query them together, and had
> > many
> > > > > > issues
> > > > > > > (I can elaborate if you want). We actually still have a pretty
> > big
> > > > > issue
> > > > > > > I'll raise in another thread.
> > > > > > >
> > > > > > > The Test Suite is awesome! It would be practically impossible
> to
> > > > > > implement
> > > > > > > TP so fast and easily without it. Just a thought, it would be
> > great
> > > > if
> > > > > > > failing tests would print some kind of "DEBUG" logs from the
> > steps
> > > > (or
> > > > > > > something like the profile step's output), so it's easier to
> > figure
> > > > out
> > > > > > > what step isn't working properly and why .
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for sharing your project. Looks like you've
> implemented
> > > both
> > > > > the
> > > > > > > > structure and process suites in ElasticGraph up to the latest
> > M9
> > > > > > release
> > > > > > > > candidate - very nice.
> > > > > > > >
> > > > > > > > Where would you say that this implementation fits?  Are there
> > > > > specific
> > > > > > > use
> > > > > > > > cases where you would want to use ElasticGraph over other
> > > > > > > implementations?
> > > > > > > > When you say that "we're already using it with very big
> graphs"
> > > can
> > > > > you
> > > > > > > > qualify that a bit (millions of edges, billions of edges,
> etc.)?
> > > > > > > >
> > > > > > > > Finally, more specifically related to TinkerPop, did you
> > > encounter
> > > > > any
> > > > > > > > challenges in implementing the APIs or the Test Suite itself?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]
> >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hey guys,
> > > > > > > > > Just wanted to let you know about a TP3 implementation
> we're
> > > > > working
> > > > > > > on.
> > > > > > > > > It's based on elastic-search, enabling very good
> scalability
> > > and
> > > > > > > indexing
> > > > > > > > > capabilities.
> > > > > > > > > You can find the code here <
> > > > > > https://github.com/rmagen/elastic-gremlin
> > > > > > > >.
> > > > > > > > >
> > > > > > > > > This is still very much a work in progress (still more
> > features
> > > > and
> > > > > > > > > optimizations planned, and some bugs to fix), but we're
> > already
> > > > > using
> > > > > > > it
> > > > > > > > > with very big graphs.
> > > > > > > > >
> > > > > > > > > I would appreciate any feedback!
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
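
On the ElasticFeatures suggestion quoted above (make it a static reference and avoid creating new anonymous objects on every feature check), a minimal sketch of the difference follows. The Features interface here is a tiny hypothetical stand-in, not the real TinkerPop Graph.Features, and the two graph classes are made up for illustration only.

```java
// Hypothetical stand-in; the real TinkerPop features interface is much larger.
interface Features {
    boolean supportsNumericIds();
}

final class CachedFeaturesGraph {
    // Built once when the class loads and shared by every graph instance.
    private static final Features FEATURES = new Features() {
        @Override
        public boolean supportsNumericIds() {
            return false; // the thread mentions only string IDs being supported
        }
    };

    Features features() {
        return FEATURES;
    }
}

final class PerCallFeaturesGraph {
    // Allocates a fresh anonymous object on every feature check.
    Features features() {
        return new Features() {
            @Override
            public boolean supportsNumericIds() {
                return false;
            }
        };
    }
}

final class FeaturesDemo {
    public static void main(String[] args) {
        CachedFeaturesGraph cached = new CachedFeaturesGraph();
        PerCallFeaturesGraph perCall = new PerCallFeaturesGraph();

        // Same object every time versus a new object every time.
        System.out.println(cached.features() == cached.features());
        System.out.println(perCall.features() == perCall.features());
    }
}
```

Both variants answer the feature check identically; the cached one simply avoids an allocation per call, which adds up if features are checked in many places in the stack.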
