Norio Akagi created TINKERPOP-2980:
--------------------------------------

             Summary: LocalStep's "object-local" behavior is not clearly 
described in the doc
                 Key: TINKERPOP-2980
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2980
             Project: TinkerPop
          Issue Type: Improvement
          Components: documentation
    Affects Versions: 3.6.5
            Reporter: Norio Akagi


LocalStep is supposed to handle solutions locally, but what it actually does is 
unclear from the documentation.

What LocalStep actually does:
 * It processes each {{TraverserSet}} as is (traversers stay bulked and are 
not split).
 * So when the previous step's output contains identical elements, and they 
have been bulked into a {{TraverserSet}}, they are processed in an 
"object-local" manner.
 * How does a {{TraverserSet}} get bulked? It relies on 
{{LazyBarrierStrategy}}, which inserts a {{NoOpBarrierStep}} that handles the 
bulking.
 * Alternatively, we can add an explicit {{barrier()}} step to make the 
bulking happen.

This creates some discrepancies that users may not easily see. As an 
illustration, this is the regular "object-local" behavior.
{noformat}
gremlin> g.V().in().out()
==>v[3]
==>v[3]
==>v[3]
==>v[2]
==>v[2]
==>v[2]
==>v[4]
==>v[4]
==>v[4]
==>v[5]
==>v[5]
==>v[3]
==>v[3]
==>v[3]

gremlin> g.V().in().out().local(count())
==>6
==>3
==>3
==>2{noformat}
You can see that identical objects (vertices) are processed together, locally. 
However, there are cases where it does not work this way.

For example, you can disable the strategy:
{noformat}
gremlin> g.withoutStrategies(LazyBarrierStrategy.class).V().in().out().local(count())
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1
==>1{noformat}
Then we see "solution-local" behavior: each solution is processed 
individually. Likewise, there are cases where {{LazyBarrierStrategy}} does not 
kick in:
{noformat}
gremlin> g.V(1,1,1).local(count())
==>1
==>1
==>1{noformat}
The behavior of {{local()}} relies on {{LazyBarrierStrategy}}, but this is not 
apparent to users. Furthermore, graph providers are free to drop any of 
TinkerPop's strategies, so if {{LazyBarrierStrategy}} is dropped, {{local()}} 
always works in a solution-local manner.
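
As a sketch of the explicit-bulking workaround for both cases above, the two 
traversals below add {{barrier()}} by hand. This assumes the examples in this 
issue run on the "modern" toy graph from {{TinkerFactory.createModern()}}, 
which the issue does not state. If the description above holds, each 
{{local(count())}} should then see bulked traversers again and report 
per-object counts instead of a run of 1s.
{noformat}
gremlin> g = TinkerFactory.createModern().traversal()   // assumption: modern toy graph
gremlin> g.withoutStrategies(LazyBarrierStrategy.class).V().in().out().barrier().local(count())   // barrier() bulks by hand
gremlin> g.V(1,1,1).barrier().local(count())   // three traversers for v[1] merged before local()
{noformat}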

The doc says that users may use {{map}} or {{flatMap}} instead. This can work, 
but many users may already be relying on {{local}} for "solution-local" 
behavior without noticing. There are also subtle differences among the three.

(1) {{map}} works in a solution-local manner, but emits only one solution per 
incoming input:
{noformat}
gremlin> g.V().map(out()).path()
==>[v[1],v[3]]
==>[v[4],v[5]]
==>[v[6],v[3]]{noformat}
(2) {{flatMap}} streams all solutions and is solution-local, but unlike 
{{local}} it keeps only the last element of the inner traversal in the Path:
{noformat}
gremlin> g.V().flatMap(out().out()).path()
==>[v[1],v[5]]
==>[v[1],v[3]]{noformat}
This behavior of {{flatMap}} is not documented, but there are use cases where 
users intentionally rely on it.
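
For comparison, a sketch of the same inner traversal wrapped in {{local()}} 
and in {{flatMap()}}, again assuming the modern toy graph (an assumption, as 
the issue does not name the graph). Per the description in (2), the 
{{local()}} form should keep the intermediate vertex in each path while the 
{{flatMap()}} form should not.
{noformat}
gremlin> g = TinkerFactory.createModern().traversal()   // assumption: modern toy graph
gremlin> g.V().local(out().out()).path()     // expected to keep the intermediate vertex in the path
gremlin> g.V().flatMap(out().out()).path()   // same inner traversal; output shown in (2)
{noformat}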

So while the documentation recommends using these two instead of {{local}}, in 
some cases it is not easy to migrate. At this point, I think
 * We should clarify in the doc:
 ** What {{barrier()}} / {{NoOpBarrierStep}} does and how it affects 
{{local()}}
 ** How {{LazyBarrierStrategy}} is related to {{barrier()}} / 
{{NoOpBarrierStep}}
 ** What the differences are between {{map}}, {{flatMap}} and {{local}}, 
including Path handling

and instead of describing {{local()}} as intended for internal use when 
implementing a strategy, we should tell users they can use it as long as they 
understand how it works and what they are doing with {{local()}}.



