[ 
https://issues.apache.org/jira/browse/TINKERPOP3-863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941741#comment-14941741
 ] 

Marko A. Rodriguez commented on TINKERPOP3-863:
-----------------------------------------------

That is a good idea in terms of semantic soundness, but a bad idea in terms of 
efficiency. Suppose you run some massive OLAP job and have been able to bulk 
1001030304930493 traversers into 1 traverser with a sack of 1.345. The last 
thing you want to do is then split it apart into individual traversers, each 
with sack 1.345 / 1001030304930493. What you want is to get 1 traverser 
emitted with sack 1.345.
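
To make the cost concrete, here is a minimal Python sketch of the two ways a 
bulked traverser could be emitted. It is a toy model, not TinkerPop code: the 
{{Traverser}} class and the {{emit_bulked}} / {{emit_unrolled}} helpers are 
hypothetical names, and dividing the sack evenly on unrolling is an assumption 
about what "splitting apart" would mean.

```python
from dataclasses import dataclass


@dataclass
class Traverser:
    location: str  # element the traverser currently sits on
    sack: float    # local data carried by the traverser
    bulk: int      # how many equivalent traversers this object stands for


def emit_bulked(t):
    """Emit the merged traverser as-is: one object, whatever its bulk."""
    return [t]


def emit_unrolled(t):
    """Split into `bulk` objects, dividing the sack evenly (assumed semantics)."""
    return [Traverser(t.location, t.sack / t.bulk, 1) for _ in range(t.bulk)]


# A small bulk so the unrolled case can actually run; with a bulk of
# 1001030304930493 the unrolled list would never fit in memory.
merged = Traverser("v1", 1.345, 1_000)
bulked = emit_bulked(merged)
unrolled = emit_unrolled(merged)

print(len(bulked), "object(s) when emitted bulked")
print(len(unrolled), "object(s) when unrolled")
```

The bulked form stays at one object regardless of the bulk, while the unrolled 
form grows linearly with it.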

Perhaps we have something like:

{code}
g.withSack(1.0,sum).withBulk(1,one).V().out().sack()
{code}

Thus, {{withBulk(initialValue,splitOp,mergeOp)}} can be tailored like sack: 
as with {{withSack}}, if you don't provide a split operator it assumes 
direct memory reference (i.e. 99.999% of use cases). Or we can say "bulks are 
always a direct copy. It's only on merges that you can say whether you merge 
to 1 or sum." Then, if it's that constrained, why not just: 
{{g.withSack(1.0,sum).withBulk(false)}}.

Of course that is super gross API-wise, but then we build this ticket out to 
make it manageable: TINKERPOP3-862
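
For what it's worth, the difference between a {{sum}} and a {{one}} bulk merge 
can be sketched with a toy simulation of the diamond case from the ticket 
description. This is plain Python under stated assumptions, not TinkerPop's 
traverser implementation: {{Traverser}}, {{step_out}}, {{total_energy}} and the 
{{DIAMOND}} adjacency map are hypothetical, the sack is split evenly across 
outgoing edges, sacks are summed on merge, and total energy is taken to be 
sack times bulk.

```python
from dataclasses import dataclass
from operator import add


@dataclass
class Traverser:
    location: str
    sack: float
    bulk: int


# Diamond-shaped graph: a -> b, a -> c, b -> d, c -> d (hypothetical data).
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def one(x, y):
    """Bulk merge operator that always merges to 1."""
    return 1


def step_out(traversers, graph, bulk_merge):
    """Move every traverser along its out-edges, then merge per location."""
    moved = []
    for t in traversers:
        neighbors = graph[t.location]
        for n in neighbors:
            # Assumed split: energy is divided evenly, bulk is copied.
            moved.append(Traverser(n, t.sack / len(neighbors), t.bulk))
    merged = {}
    for t in moved:
        if t.location in merged:
            m = merged[t.location]
            m.sack = m.sack + t.sack              # sack merge: sum
            m.bulk = bulk_merge(m.bulk, t.bulk)   # pluggable bulk merge
        else:
            merged[t.location] = t
    return list(merged.values())


def total_energy(traversers):
    return sum(t.sack * t.bulk for t in traversers)


def run(bulk_merge):
    ts = [Traverser("a", 1.0, 1)]
    for _ in range(2):  # a -> {b, c} -> d
        ts = step_out(ts, DIAMOND, bulk_merge)
    return ts


energy_sum = total_energy(run(add))  # bulks are summed on merge
energy_one = total_energy(run(one))  # bulks always merge to 1
print(energy_sum)  # 2.0
print(energy_one)  # 1.0
```

With the summing merge the single traverser at {{d}} ends up with sack 1.0 and 
bulk 2, so the energy doubles; with {{one}} the bulk stays at 1 and the energy 
is conserved.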

Thoughts?


> [Proposal] Turn off bulking -- or is there something more general? (hope not).
> ------------------------------------------------------------------------------
>
>                 Key: TINKERPOP3-863
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-863
>             Project: TinkerPop 3
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.1.0-incubating
>
>
> I have a general question -- sometimes you want bulking and sometimes you 
> don't. Why would you not want bulking? Well, let's say you have a sack of 1.0 
> and you want to represent energy diffusion and thus, if a traverser splits 
> and goes to two adjacent neighbors, then each sack will be 0.5. Now, let's say 
> those two traversers merge on the next step (a diamond-shaped graph): the 
> merged traverser's sack is 1.0 (excellent!). However, its bulk is 2. 
> Dah... Then the total energy in the graph is 2.0.
> Should we simply have "bulk" and "no bulk" or do we come up with a "bulk 
> merge" model where users can ONLY add bulks (the current default and the only 
> method), multiply bulks, take the min/max of bulks, etc.? I'm scared that the 
> generalization might be overkill.
> The difference is:
> {code}
> g.withBulk(false)... // binary: don't use bulking.
> g.withBulk(true)...  // default behavior: currently just sums the bulks
>                      // together.
> // or do we go with
> g.withBulk(mult)...  // when two traversers merge, multiply their bulks. Why
>                      // would you do that? I have no idea, but it's general.
> g.withBulk(one)...   // would be like binary=false: always merge to 1, and
>                      // thus one is a BinaryOperator (x,y) -> 1
> {code}
> Is this generalization of the bulk merge operator useful? Or do we say -- if 
> you want to do complex functions on "energy" (bulk), you do it via 
> sack?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
