Hi,
We haven't implemented difference() and intersect() because of the way we
are thinking about the problem.
union() is a global traversal: once you clone the traverser across the
n branches, no local logic is needed to merge them.
This is not the case with intersect() and difference(), as you need bookkeeping
to know which branches each traverser went down, and you need to localize the
traversers to a particular machine (in OLAP). This will still be the case with
split()/merge(), but it takes on a different feel (implementation-wise) when
you think in terms of particles rather than algebra.
*** Bear with the naming/parameterization of merge() as I couldn't come up with
something pretty off the top of my head. ***
split(out('knows'),in('created')).merge(2)
merge(2) is sorta like dedup(): it says, ONLY let the traverser through
if he has come through both (two) branches (where branches are labeled in the
MatchStep fashion based on step-id). So now, the rules of splitting are
separated from the rules of merging, whereas "union/difference/intersect"
conflated the two constructs.
Next, split/merge generalizes intersect(): you can merge(3) and have 4
internal split traversals. Likewise, what is difference()? In the above, it's
merge(1)! That is, only let the traverser through if he has come from one and
only one split.
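
To make the bookkeeping concrete, here is a rough sketch in Python (not
Gremlin, since split()/merge() don't exist). Everything in it is hypothetical:
the toy adjacency data, the function names, and the assumption that merge(n)
means "arrived via exactly n distinct branches". It also collects all
traversers before emitting, which is only one way such a step could work.

```python
from collections import defaultdict

# Hypothetical toy adjacency data, standing in for out('knows') and in('created').
knows = {"a": ["b", "c"]}
created_by = {"a": ["c", "d"]}

def out_knows(v):
    return knows.get(v, [])

def in_created(v):
    return created_by.get(v, [])

def split(start, branches):
    """Clone the traverser at `start` into every branch and tag each
    result with the index of the branch it came through."""
    for branch_id, branch in enumerate(branches):
        for result in branch(start):
            yield branch_id, result

def merge(tagged, n):
    """Let a result through only if it arrived via exactly n distinct
    branches (assumed semantics). Acts as a barrier: all tagged
    traversers are collected before anything is emitted."""
    seen = defaultdict(set)
    for branch_id, result in tagged:
        seen[result].add(branch_id)
    return [result for result, ids in seen.items() if len(ids) == n]

tagged = list(split("a", [out_knows, in_created]))
both = merge(tagged, 2)      # intersect-like: came through both branches
only_one = merge(tagged, 1)  # came through one and only one branch
```

With this toy data, merge(2) keeps only "c" and merge(1) keeps "b" and "d".
The per-result set of branch ids is the bookkeeping mentioned above, and in
OLAP it would have to live on one machine per result.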
Now, you can of course call this intersect()/difference(), but I think we can
get more fancy like we did with choose() instead of ifThenElse().
Marko.
http://markorodriguez.com
On Oct 15, 2015, at 8:21 AM, Daniel Kuppitz <[email protected]> wrote:
> I don't like the idea of renaming it, given that we have a long-term plan to
> implement (you'll find a ticket if you dig deep enough):
>
> - union() (done)
> - difference() (tbd) and
> - intersect() (tbd)
>
> Cheers,
> Daniel
>
>
> On Thu, Oct 15, 2015 at 3:53 PM, Marko Rodriguez <[email protected]>
> wrote:
>
>> Hello,
>>
>> We use the term "union()" to describe the n-furcation of a traverser
>> across n-(nested)traversals. The union()'ing happens at the end when the
>> n-parallel-streams get merged/unioned back to one. Given that Gremlin is
>> read left-to-right, it feels more natural (for me) to say "split()" as that
>> is what is happening on the left of the union(). Yes, like TinkerPop2.
>>
>>
>> g.V.union(out('knows'), in('created')).name
>>
>> VS.
>>
>> g.V.split(out('knows'), in('created')).name
>>
>>
>> The top one you interpret as: union the results of the two internal
>> traversals, where the split at the beginning is assumed.
>> The bottom one you interpret as: split the traverser across the two
>> internal traversals, where the union at the end is assumed.
>>
>> For me the latter gives the more "particle" perspective to Gremlin, while
>> the former gives the more algebraic perspective.
>>
>> Note that this would be a straightforward deprecation as it's a rename with
>> no semantic alteration.
>>
>> Thoughts?
>> Marko.
>>
>> http://markorodriguez.com
>>
>>