Norio Akagi created TINKERPOP-2980:
--------------------------------------

             Summary: LocalStep's "object-local" behavior is not clearly described in the doc
                 Key: TINKERPOP-2980
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2980
             Project: TinkerPop
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 3.6.5
            Reporter: Norio Akagi

LocalStep is supposed to process solutions locally, but the documentation does not make clear what it actually does. In practice:

* It processes each {{TraverserSet}} as is: traversers stay bulked and are not split.
* So when the previous step's output contains identical elements, and they have been bulked into a {{TraverserSet}}, they are processed in an "object-local" manner.
* How does a {{TraverserSet}} get bulked? That relies on {{LazyBarrierStrategy}}, which inserts a {{noOpBarrierStep}} that performs the bulking.
* Alternatively, we can add an explicit {{barrier()}} step to make the bulking happen.

This creates discrepancies that users may not easily see. As an illustration, this is the regular "object-local" behavior:

{noformat}
gremlin> g.V().in().out()
==>v[3]
==>v[3]
==>v[3]
==>v[2]
==>v[2]
==>v[2]
==>v[4]
==>v[4]
==>v[4]
==>v[5]
==>v[5]
==>v[3]
==>v[3]
==>v[3]
gremlin> g.V().in().out().local(count())
==>6
==>3
==>3
==>2
{noformat}

You can see that identical objects (vertices) are processed together locally. However, there are cases where it does not work this way. For example, you can disable the strategy:

{noformat}
gremlin> g.withoutStrategies(LazyBarrierStrategy.class).V().in().out().local(count())
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
{noformat}

Now we see "solution-local" behavior: each individual solution is processed locally. Likewise, there are cases where {{LazyBarrierStrategy}} does not kick in:

{noformat}
gremlin> g.V(1,1,1).local(count())
==>1
==>1
==>1
{noformat}

The result depends on {{LazyBarrierStrategy}}, but that is not apparent to users. Furthermore, graph providers are free to drop any of TinkerPop's strategies, so if {{LazyBarrierStrategy}} is dropped, {{local}} always works in a solution-local manner.

The documentation says that users may use {{map}} or {{flatMap}} instead. This can work, but many users may already be relying on {{local}} for "solution-local" behavior without noticing. There are also subtle differences among them.

(1) {{map}} emits only one solution per incoming input, while working in a solution-local manner:

{noformat}
gremlin> g.V().map(out()).path()
==>[v[1],v[3]]
==>[v[4],v[5]]
==>[v[6],v[3]]
{noformat}

(2) {{flatMap}} streams all solutions and is solution-local, but unlike {{local}} it keeps only the last element in the Path:

{noformat}
gremlin> g.V().flatMap(out().out()).path()
==>[v[1],v[5]]
==>[v[1],v[3]]
{noformat}

This behavior of {{flatMap}} is not documented, but there are use cases where users intentionally rely on {{flatMap}} for it. So although the documentation recommends using these two steps instead of {{local}}, in some cases migration is not easy.

At this point, I think:

* We should clarify in the doc:
** What {{barrier() / noOpBarrierStep}} does and how it affects {{local()}}
** How {{LazyBarrierStrategy}} is related to {{barrier() / noOpBarrierStep}}
** How {{map}}, {{flatMap}} and {{local}} differ, including Path handling

and, instead of describing {{local()}} as intended for internal use when implementing a Strategy, we should tell users that they may use it as long as they understand how it works and what they are doing with {{local()}}.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
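
As a rough illustration of the dependence on bulking described in this issue, the following Python sketch is a toy model, not TinkerPop code. It assumes that a barrier merges traversers holding the same object into one traverser whose bulk is the sum of the merged bulks, and that {{local(count())}} evaluates {{count()}} once per incoming traverser by summing bulk. The names {{unbulked}}, {{barrier}} and {{local_count}} are hypothetical and exist only for this example.

```python
from typing import Dict, Iterable, List, Tuple

# A toy traverser: (object, bulk). Bulk is how many identical
# solutions this single traverser stands for.
Traverser = Tuple[str, int]


def unbulked(objects: Iterable[str]) -> List[Traverser]:
    """One traverser per solution, each with bulk 1 (no barrier applied)."""
    return [(obj, 1) for obj in objects]


def barrier(traversers: Iterable[Traverser]) -> List[Traverser]:
    """Assumed barrier behavior: merge traversers holding the same object,
    summing their bulks and keeping first-seen order."""
    merged: Dict[str, int] = {}
    for obj, bulk in traversers:
        merged[obj] = merged.get(obj, 0) + bulk
    return list(merged.items())


def local_count(traversers: Iterable[Traverser]) -> List[int]:
    """Toy local(count()): run count() once per incoming traverser.
    The count of a single traverser is its bulk."""
    return [bulk for _, bulk in traversers]


if __name__ == "__main__":
    # The 14 results of g.V().in().out() quoted in this issue.
    results = ["v3"] * 3 + ["v2"] * 3 + ["v4"] * 3 + ["v5"] * 2 + ["v3"] * 3

    # With bulking: one count per distinct object ("object-local").
    print(local_count(barrier(unbulked(results))))  # [6, 3, 3, 2]

    # Without bulking: one count per solution ("solution-local").
    print(local_count(unbulked(results)))  # fourteen 1s

    # Same start vertex three times with no barrier in between.
    print(local_count(unbulked(["v1", "v1", "v1"])))  # [1, 1, 1]
```

In this model the only difference between the "object-local" and "solution-local" results is whether a merge happens before the local step, which is the dependency the issue asks the documentation to spell out.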