GitHub user okram opened a pull request: https://github.com/apache/incubator-tinkerpop/pull/264
TINKERPOP-1209 & TINKERPOP-1210: OrderXXXStep Updates

https://issues.apache.org/jira/browse/TINKERPOP-1209
https://issues.apache.org/jira/browse/TINKERPOP-1210

In OLAP, if you have a pattern like `order()...limit(x)`, `OrderLimitStrategy` will make it so that each partitioned order (split across workers) orders and limits prior to merging. This greatly reduces the amount of data reaching the master traversal, which matters because `order()...limit()` is a common traversal pattern in OLAP.

CHANGELOG

```
* Fixed a hash code bug in `OrderGlobalStep` and `OrderLocalStep`.
* Added `OrderLimitStrategy` which will ensure that partitions are limited before being merged in OLAP.
* `ComparatorHolder` now separates the traversal from the comparator. (*breaking*)
```

UPGRADE

```
ComparatorHolder API Change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Providers that either have their own `ComparatorHolder` implementation or reason on `OrderXXXStep` will need to update their code. `ComparatorHolder` now returns `List<Pair<Traversal,Comparator>>`. This has greatly reduced the complexity of comparison-based steps like `OrderXXXStep`. However, it's a breaking API change. It is trivial to update to; just some awareness is required.
```

VOTE +1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1209

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/264.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #264

----
commit 813ba24da510b3770988d62916c2016265323129
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-15T16:10:26Z

ComparatorHolder now has getComparators() return a List of Pair<Traversal,Comparator>. This was what we needed and this allowed me to gut lots of code and it's so much more intuitive and will make it so pre-order/limits will be possible in OLAP.
Unfortunately, this is breaking for vendors that reason on (or have) an OrderXXXStep. The update is trivial and in fact, a lot less if/else for them.

commit 3ee1548a2b9c0b1c1e00d13a024a388965f5e846
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-15T16:12:59Z

TraversalComparator is gutted -- check out that if/else nest that we no longer have to propagate through. Thank the heavens.

commit a67567e0ce627a5fa313d4be0a0f5b9f6ad65584
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-15T17:57:54Z

Added OrderLimitStrategy which finds order()...limit(x) patterns. It then tells OrderStep to order-then-limit. This is a potentially massive optimization in OLAP where, if you do order().limit(5), the max number of traversers coming to the master traversal is 5 * numberOfWorkers instead of the full set of traversers. Added OrderBiOperator which is a Memory reducer which handles this in OLAP. Added test cases to make this pretty. Added this as a default strategy in the GlobalCache. Currently OrderLimitStrategy is only for OLAP -- we could make it for OLTP, but we would have to write our own custom Collections.sort() that has a size limit.

commit dc0348717115a3572b5b70ef1d8f969c505c2bbf
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date: 2016-03-15T20:18:43Z

Added more test cases. Fixed an old equality issue in OrderGlobalStepTest and OrderLocalStepTest cc/ @dkuppitz. Added more test cases to ensure OrderLimitStrategy is behaving correctly. OrderBiOperator now uses JavaSerializer so Giraph and Spark are happy. I think this is good to go. Perhaps one more test case using GratefulDead graph would be good.

----
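
For readers who want to see why ordering and limiting each partition before merging is safe, here is a minimal, self-contained sketch in plain Java. It does not use TinkerPop's classes: `OrderLimitSketch`, `orderThenLimit`, and `mergePartitions` are hypothetical names invented for illustration, and plain lists stand in for the traversers held by each worker. The idea is that any element in the global top `x` must also be in the top `x` of its own partition, so at most `x * numberOfWorkers` elements need to reach the final merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the TinkerPop API.
public class OrderLimitSketch {

    // Sort one worker's partition and keep only its first `limit` elements.
    public static <T> List<T> orderThenLimit(List<T> partition, Comparator<T> comparator, int limit) {
        List<T> sorted = new ArrayList<>(partition);
        sorted.sort(comparator);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Pre-limit every partition, merge the survivors, then order and limit once more.
    public static <T> List<T> mergePartitions(List<List<T>> partitions, Comparator<T> comparator, int limit) {
        List<T> merged = new ArrayList<>();
        for (List<T> partition : partitions) {
            // At most `limit` elements per worker reach the merge.
            merged.addAll(orderThenLimit(partition, comparator, limit));
        }
        return orderThenLimit(merged, comparator, limit);
    }

    public static void main(String[] args) {
        // Three "workers", each holding an unsorted partition.
        List<List<Integer>> workers = Arrays.asList(
                Arrays.asList(9, 1, 7, 3),
                Arrays.asList(8, 2, 6),
                Arrays.asList(5, 4, 0));
        // Equivalent in spirit to order().limit(2) over all ten values.
        System.out.println(mergePartitions(workers, Comparator.<Integer>naturalOrder(), 2)); // prints [0, 1]
    }
}
```

With a limit of 2 and three workers, only six values reach the merge step instead of all ten, and the result matches sorting the full data set and taking the first two.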