I've not had a chance to think about it, but I now see the issue you opened. It was probably good that you added that for tracking:
https://issues.apache.org/jira/browse/TINKERPOP3-701

On Sat, May 23, 2015 at 4:25 PM, Ran Magen <[email protected]> wrote:

> > i may have messed up the Mutating interface design a bit. looking at it now, i feel like it could be less coupled to the EventStrategy related features. I'll take a look at it to see if I can make it "better" before GA. I don't think my changes should affect vendors or the test suites, so if it turns out to be that way i'll give it a shot.
>
> Any progress? Should I open a ticket for this?
>
> On Wed, 20 May 2015 at 22:17 Stephen Mallette <[email protected]> wrote:
>
> > > I guess today these features don't work because the Suite classes initialize the tests
> >
> > right - because we have the custom test suites the tests are determined more dynamically so your ability to right-click/run is kinda lost. :/
> >
> > On Wed, May 20, 2015 at 2:47 PM, Ran Magen <[email protected]> wrote:
> >
> > > > I don't have a better idea than the environment variable. you should be able to use the debugger though. works for me in intellij when i've looked at a problem in titan. i'm not sure if it only works because i have the tinkerpop source on my system, but i can step through tinkerpop source and titan source interchangeably. i don't think i did anything specific to enable that.
> > >
> > > I wasn't clear. I use intellij, and it has simple shortcuts to run tests: right clicking on a test method/class and clicking run, rerunning only failed tests, etc. This could really help cases where I need to debug a test, and put a breakpoint somewhere in the code. If other tests run before, the breakpoints will usually get hit lots of times. I guess today these features don't work because the Suite classes initialize the tests. I don't know enough about jUnit to offer solutions, thought you might have.
> > >
> > > > perhaps you could provide links to relevant code. i'm sorry to say that most times the answer to this kind of stuff isn't obvious.
> > >
> > > Okay, I'll get some example code.
> > >
> > > > i may have messed up the Mutating interface design a bit. looking at it now, i feel like it could be less coupled to the EventStrategy related features. I'll take a look at it to see if I can make it "better" before GA.
> > >
> > > Great, that would be a big help!
> > >
> > > > we don't have much on bulk insertion in the API. perhaps you should create an issue for discussion
> > >
> > > https://issues.apache.org/jira/browse/TINKERPOP3-694
> > >
> > > Thanks again for all the help
> > >
> > > On Wed, 20 May 2015 at 19:53 Stephen Mallette <[email protected]> wrote:
> > >
> > > > > The Process coverage seems good. I believe most of the failures are due to the fact that I only support string IDs (I think not all tests call the convertId method).
> > > >
> > > > hmmm - thought we had rooted all of those out via work with pieter martin on sqlg. please let me know which ones still aren't making those calls.
> > > >
> > > > > It would also be great if we could easily run specific tests or classes using junit. at the moment it's cumbersome to run a class of tests (updating the environment variable each time), and impossible to debug a specific test easily (or at least I haven't found a way).
> > > >
> > > > I don't have a better idea than the environment variable. you should be able to use the debugger though. works for me in intellij when i've looked at a problem in titan. i'm not sure if it only works because i have the tinkerpop source on my system, but i can step through tinkerpop source and titan source interchangeably. i don't think i did anything specific to enable that.
> > > >
> > > > > 1. We made a custom VertexStep that aggregates traversers, and has steps, to minimize the amount of queries issued. It messed up a few things, but we got the basic usage working in M9 (guess you fixed some stuff for Titan, which does the same thing). The problem now is that it doesn't work on inner traversals. For example, Repeat gives out only 1 traverser every time. Do you have any suggestions? Am I doing something wrong?
> > > >
> > > > perhaps you could provide links to relevant code. i'm sorry to say that most times the answer to this kind of stuff isn't obvious.
> > > >
> > > > > 2. We want to implement a validation strategy. Sort of like EventStrategy, but it will notify before a mutation, and will enable the user's validation code to cancel a mutation if it doesn't pass its checks. The problem is that there are no "before" callbacks for the Mutating interface.
> > > >
> > > > i may have messed up the Mutating interface design a bit. looking at it now, i feel like it could be less coupled to the EventStrategy related features. I'll take a look at it to see if I can make it "better" before GA. I don't think my changes should affect vendors or the test suites, so if it turns out to be that way i'll give it a shot.
> > > >
> > > > > 3. Adding in bulk - we added our own functions for bulk inserts, since we didn't find anything to support it in the API. The thing is we need this ability as part of the traversal, so we can utilize the validation strategy (if we can get that working). We thought about inheriting from the Add steps, but they're final. It'd be great to have something like __.inject(vertices).as('x').addV('x'), and have the ability to make it bulk load the vertices.
> > > >
> > > > we're trying to avoid problems with improper inheritance which messes with traversal strategies - hence steps are typically "final". we don't have much on bulk insertion in the API. perhaps you should create an issue for discussion.
> > > >
> > > > On Wed, May 20, 2015 at 11:08 AM, Ran Magen <[email protected]> wrote:
> > > >
> > > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > >
> > > > > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320 passed
> > > > > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed, 321 ignored, 394 passed
> > > > > The Process coverage seems good. I believe most of the failures are due to the fact that I only support string IDs (I think not all tests call the convertId method). And some new stuff in M9 that I haven't gotten around to fixing yet. But I'll make sure and open tickets for anything I find.
> > > > > It would also be great if we could easily run specific tests or classes using junit. at the moment it's cumbersome to run a class of tests (updating the environment variable each time), and impossible to debug a specific test easily (or at least I haven't found a way).
> > > > >
> > > > > > we'd be interested in hearing about your issues.
> > > > >
> > > > > 1. We made a custom VertexStep that aggregates traversers, and has steps, to minimize the amount of queries issued. It messed up a few things, but we got the basic usage working in M9 (guess you fixed some stuff for Titan, which does the same thing). The problem now is that it doesn't work on inner traversals. For example, Repeat gives out only 1 traverser every time. Do you have any suggestions? Am I doing something wrong?
> > > > > 2. We want to implement a validation strategy. Sort of like EventStrategy, but it will notify before a mutation, and will enable the user's validation code to cancel a mutation if it doesn't pass its checks. The problem is that there are no "before" callbacks for the Mutating interface. We also thought the strategy could just add a validation step before each mutating step, but that had its own issues. Also, the validation strategy won't work on stuff like graph.addVertex(), but I guess we can make sure people only use the traversal.
> > > > > 3. Adding in bulk - we added our own functions for bulk inserts, since we didn't find anything to support it in the API. The thing is we need this ability as part of the traversal, so we can utilize the validation strategy (if we can get that working). We thought about inheriting from the Add steps, but they're final. It'd be great to have something like __.inject(vertices).as('x').addV('x'), and have the ability to make it bulk load the vertices.
> > > > >
> > > > > Thank you for your help!
> > > > >
> > > > > On Tue, 19 May 2015 at 01:37 Stephen Mallette <[email protected]> wrote:
> > > > >
> > > > > > Thanks for sharing all that additional information.
> > > > > >
> > > > > > > The biggest issue I had was implementing custom steps.
> > > > > >
> > > > > > I think we have a bit of a hole in the docs around that kind of stuff at the moment. You have to be careful with custom steps because the TraversalStrategy implementations might not behave nicely if they come across steps they don't know about. We've been trying to understand the right set of recommendations to give around that issue which is most of the reason we probably don't have docs developed yet. If you'd like to elaborate as you offered, we'd be interested in hearing about your issues.
> > > > > >
> > > > > > > The Test Suite is awesome!
> > > > > >
> > > > > > That is excellent to hear. Not many people have to interact with the test suite directly but it is a super critical part of the TinkerPop Ecosystem - if those who have to use it aren't satisfied with it, I'd consider that a big problem.
> > > > > >
> > > > > > > Just a thought, it would be great if failing tests would print some kind of "DEBUG" logs from the steps (or something like the profile step's output), so it's easier to figure out what step isn't working properly and why.
> > > > > >
> > > > > > Still trying to figure that out (i.e. what's the most useful way to "DEBUG" things). We don't do logging in gremlin-core so there isn't much to output there. I'm hoping that this ticket will be useful in this area:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > > > > >
> > > > > > I did give a look at your implementation code. I noticed that you only had to @OptOut of a couple of tests - not bad, though I'm not sure how much of the test suite fires under your ElasticFeatures implementation. We tried to write tests to allow maximum coverage given the most common feature set - hopefully you receive good coverage under that model. Can you share what percentage of the tests fire for you given ElasticFeatures?
> > > > > >
> > > > > > Speaking of ElasticFeatures, you might want to make this a static reference:
> > > > > >
> > > > > > https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > > > > >
> > > > > > and try to generally reduce anonymous object creation within ElasticFeatures itself. You don't want to create a new instance of that stuff for every feature check - we do internal feature checking in different parts of the stack and it could create a lot of unnecessary objects for you.
> > > > > >
> > > > > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <[email protected]> wrote:
> > > > > >
> > > > > > > Hey Stephen,
> > > > > > >
> > > > > > > ElasticGraph can be seen as an alternative to Titan - a big scaled-out graph with indices (currently we only have OLTP, but will add OLAP soon). We're a company that started out a project using Titan, but it lacked some capabilities we needed:
> > > > > > >
> > > > > > > - Speed, especially with regards to using text/number/geo indices. Our benchmarks showed that ES could function much faster than the performance we were getting from Titan.
> > > > > > > - Partitioning the data - useful for optimizing indexed queries on ES (Titan also uses ES, but doesn't include these optimizations). Plus, it allows you to manage the data for your specific needs. For example if you have a graph with real-time events coming in, and you want to periodically delete all the old events, you can partition the data by time.
> > > > > > > - The spatial capabilities didn't support all the features we needed.
> > > > > > > - Titan's future was in question <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
> > > > > > > - And a bunch of other small issues.
> > > > > > >
> > > > > > > We thought about contributing to Titan to add these capabilities, but Titan's architecture (which separates the indexing backend from the "main" store) made it difficult. Plus Titan has a big codebase supporting many different BEs. At the end we figured it would just be simpler to implement TP directly on ES. It also spares us from maintaining an extra hbase/cassandra cluster.
> > > > > > > We figured more people might have stumbled across these issues, so we're sharing the code.
> > > > > > >
> > > > > > > Numbers - we've gotten up to a few billions at this point in our tests, but I'm pretty confident on its ability to scale further.
> > > > > > >
> > > > > > > As for developing for TP, it's been mostly great :) The architecture is very powerful, and gremlin 3 is turning out to be a great querying language. And most importantly, it's fast to implement it.
> > > > > > > The biggest issue I had was implementing custom steps. Apart from GraphStep (which has a simple example in TinkerGraph), the other steps are pretty hard to figure out. For example we implemented a VertexStep that batches up traversers and their has steps to query them together, and had many issues (I can elaborate if you want). We actually still have a pretty big issue I'll raise in another thread.
> > > > > > >
> > > > > > > The Test Suite is awesome! It would be practically impossible to implement TP so fast and easily without it. Just a thought, it would be great if failing tests would print some kind of "DEBUG" logs from the steps (or something like the profile step's output), so it's easier to figure out what step isn't working properly and why.
> > > > > > >
> > > > > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <[email protected]> wrote:
> > > > > > >
> > > > > > > > Thanks for sharing your project. Looks like you've implemented both the structure and process suites in ElasticGraph up to the latest M9 release candidate - very nice.
> > > > > > > >
> > > > > > > > Where would you say that this implementation fits? Are there specific use cases where you would want to use ElasticGraph over other implementations? When you say that "we're already using it with very big graphs" can you qualify that a bit (millions of edges, billions of edges, etc.)?
> > > > > > > >
> > > > > > > > Finally, more specifically related to TinkerPop, did you encounter any challenges in implementing the APIs or the Test Suite itself?
> > > > > > > >
> > > > > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hey guys,
> > > > > > > > > Just wanted to let you know about a TP3 implementation we're working on. It's based on elastic-search, enabling very good scalability and indexing capabilities.
> > > > > > > > > You can find the code here <https://github.com/rmagen/elastic-gremlin>.
> > > > > > > > >
> > > > > > > > > This is still very much a work in progress (still more features and optimizations planned, and some bugs to fix), but we're already using it with very big graphs.
> > > > > > > > >
> > > > > > > > > I would appreciate any feedback!
> > > > > > > > > Cheers,
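
For reference, the suggestion quoted above about holding ElasticFeatures in a static reference (rather than building a new instance for every feature check) might look roughly like the sketch below. It is only an illustration under assumptions: SketchFeatures, ElasticLikeFeatures, PerCallGraph and SharedGraph are hypothetical stand-ins, not TinkerPop or elastic-gremlin classes, and the allocation counter exists only to make the difference visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticFeaturesSketch {

    /** Hypothetical stand-in for a graph's feature description; not the TinkerPop interface. */
    interface SketchFeatures {
        boolean supportsNumericIds();
    }

    /** Counts constructed feature objects so the two approaches can be compared. */
    static final AtomicInteger ALLOCATIONS = new AtomicInteger();

    /** Hypothetical features implementation; string IDs only, as mentioned in the thread. */
    static final class ElasticLikeFeatures implements SketchFeatures {
        ElasticLikeFeatures() {
            ALLOCATIONS.incrementAndGet();
        }

        @Override
        public boolean supportsNumericIds() {
            return false;
        }
    }

    /** Builds a new features object on every call, which is the pattern to avoid. */
    static final class PerCallGraph {
        SketchFeatures features() {
            return new ElasticLikeFeatures();
        }
    }

    /** Keeps one stateless features object in a static field and returns it on every call. */
    static final class SharedGraph {
        private static final SketchFeatures FEATURES = new ElasticLikeFeatures();

        SketchFeatures features() {
            return FEATURES;
        }
    }

    /** Returns how many feature objects were constructed while running the given checks. */
    static int allocationsDuring(Runnable checks) {
        int before = ALLOCATIONS.get();
        checks.run();
        return ALLOCATIONS.get() - before;
    }

    public static void main(String[] args) {
        PerCallGraph perCall = new PerCallGraph();
        // Constructing SharedGraph initializes its static FEATURES field once, before measuring.
        SharedGraph shared = new SharedGraph();

        int perCallCount = allocationsDuring(() -> {
            for (int i = 0; i < 1000; i++) perCall.features().supportsNumericIds();
        });
        int sharedCount = allocationsDuring(() -> {
            for (int i = 0; i < 1000; i++) shared.features().supportsNumericIds();
        });

        System.out.println("per-call allocations: " + perCallCount);
        System.out.println("shared allocations:   " + sharedCount);
    }
}
```

Over 1000 feature checks the per-call variant constructs 1000 objects and the shared variant none, which is the kind of saving the quoted advice is after. Sharing is only safe here because the features object carries no mutable state.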
