Hi All,

I propose that the semantics of the repeat() step should be changed
starting 3.8.x. In particular, the repeat() step uses a hybrid execution
approach that can be unintuitive. Initially, a single traverser is passed
to the repeat traversal, so in its first iteration, there is only a single
starting traverser. But in subsequent iterations there can be multiple
starting traversers, if the first iteration produces multiple traversers.
It may make more sense to allow multiple starting traversers in the first
iteration to make it consistent throughout the traversal. This would make
the entire repeat traversal a "global child" rather than a mix of "global
and local child". The most significant impact would be on barriers, as they
could now pull in everything even in the first iteration.

Current:
gremlin> g.V().repeat(both().order()).times(2).path()
==>[v[1],v[2],v[1]]
==>[v[1],v[3],v[1]]
==>[v[1],v[4],v[1]]
==>[v[1],v[4],v[3]]
==>[v[1],v[3],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[3],v[6]]
==>[v[2],v[1],v[2]]
==>[v[2],v[1],v[3]]
==>[v[2],v[1],v[4]]
==>[v[3],v[4],v[1]]
==>[v[3],v[1],v[2]]
==>[v[3],v[1],v[3]]
==>[v[3],v[4],v[3]]
==>[v[3],v[6],v[3]]
==>[v[3],v[1],v[4]]
==>[v[3],v[4],v[5]]
==>[v[4],v[3],v[1]]
==>[v[4],v[1],v[2]]
==>[v[4],v[1],v[3]]
==>[v[4],v[1],v[4]]
==>[v[4],v[3],v[4]]
==>[v[4],v[5],v[4]]
==>[v[4],v[3],v[6]]
==>[v[5],v[4],v[1]]
==>[v[5],v[4],v[3]]
==>[v[5],v[4],v[5]]
==>[v[6],v[3],v[1]]
==>[v[6],v[3],v[4]]
==>[v[6],v[3],v[6]]

Proposed:
gremlin> g.V().repeat(both().order()).times(2).path()
==>[v[1],v[2],v[1]]
==>[v[1],v[3],v[1]]
==>[v[4],v[3],v[1]]
==>[v[6],v[3],v[1]]
==>[v[1],v[4],v[1]]
==>[v[3],v[4],v[1]]
==>[v[5],v[4],v[1]]
==>[v[2],v[1],v[2]]
==>[v[3],v[1],v[2]]
==>[v[4],v[1],v[2]]
==>[v[2],v[1],v[3]]
==>[v[3],v[1],v[3]]
==>[v[4],v[1],v[3]]
==>[v[1],v[4],v[3]]
==>[v[3],v[4],v[3]]
==>[v[5],v[4],v[3]]
==>[v[3],v[6],v[3]]
==>[v[2],v[1],v[4]]
==>[v[3],v[1],v[4]]
==>[v[4],v[1],v[4]]
==>[v[1],v[3],v[4]]
==>[v[4],v[3],v[4]]
==>[v[6],v[3],v[4]]
==>[v[4],v[5],v[4]]
==>[v[1],v[4],v[5]]
==>[v[3],v[4],v[5]]
==>[v[5],v[4],v[5]]
==>[v[1],v[3],v[6]]
==>[v[4],v[3],v[6]]
==>[v[6],v[3],v[6]]

For comparison, here's what the unrolled query does:
gremlin> g.V().both().order().both().order().path()
==>[v[1],v[2],v[1]]
==>[v[1],v[3],v[1]]
==>[v[4],v[3],v[1]]
==>[v[6],v[3],v[1]]
==>[v[1],v[4],v[1]]
==>[v[3],v[4],v[1]]
==>[v[5],v[4],v[1]]
==>[v[2],v[1],v[2]]
==>[v[3],v[1],v[2]]
==>[v[4],v[1],v[2]]
==>[v[2],v[1],v[3]]
==>[v[3],v[1],v[3]]
==>[v[4],v[1],v[3]]
==>[v[1],v[4],v[3]]
==>[v[3],v[4],v[3]]
==>[v[5],v[4],v[3]]
==>[v[3],v[6],v[3]]
==>[v[2],v[1],v[4]]
==>[v[3],v[1],v[4]]
==>[v[4],v[1],v[4]]
==>[v[1],v[3],v[4]]
==>[v[4],v[3],v[4]]
==>[v[6],v[3],v[4]]
==>[v[4],v[5],v[4]]
==>[v[1],v[4],v[5]]
==>[v[3],v[4],v[5]]
==>[v[5],v[4],v[5]]
==>[v[1],v[3],v[6]]
==>[v[4],v[3],v[6]]
==>[v[6],v[3],v[6]]

Notice how currently, the elements are ordered based on the starting
traverser while in the proposed solution the traversers are globally
ordered per iteration and so the final iteration is sorted. Also, the
unrolled query is generally what I expect the result to be and it matches
the proposed semantics.

Any questions or comments?

Thanks,
Ken

Reply via email to