Re: RFR 8075307 Pipeline calculating inconsistent flag state for parallel stateful ops

Paul Sandoz Tue, 24 Mar 2015 00:59:08 -0700

Hi Stuart,

Many thanks for looking at this gnarly code.


On Mar 24, 2015, at 2:41 AM, Stuart Marks <stuart.ma...@oracle.com> wrote:

> Hi Paul,
> 
> After reading your notes here, and in the bug reports, and the comments in 
> the code, and banging my head against the code (before and after) for a 
> while, I'm starting to see that this is on the right track. Sorry to hedge a 
> bit but I have to admit that I don't fully understand the code.
> 
> However, I do see the comment you referred to (old version, line 460) and 
> that the new code is essentially a merge of the old parallelPrepare() code 
> into sourceSpliterator(). And I do see the crucial addition of the flag 
> modification based on the spliterator's SIZED characteristic. So that seems 
> right, and if the tests pass, so much the better.
> 
> A few comments for future maintenance/cleanups in this area.
> 

All good points i will add some of my own below.


> * This package seems like a curious mixture of abstraction of bit-twiddling 
> into small methods (e.g., combineOpFlags), and bit-twiddling in open code 
> (evalation of Spliterator.SIZED into thisOpFlags, new version lines 459ff). 
> While not incorrect, this is jarring.
> 

Yeah, in more common cases i optimized but it is probably not necessary.


> * The loop variables over the pipeline stages are hard to follow. The 
> variable 'p' is used in the loop at line 413 a the "current" pipeline stage, 
> whereas 'p' is used for the "next" stage in the loop at 434, and 'u' is the 
> "current" stage. This is confusing, and I don't know what 'u' means.
> 

"p" is consistent between both loops for initialization and the next stage, 
it's the termination that is different (the first loop never needs to process 
the source stage and keep track of the previous stage). There is a reason for 
this due to the optimisation in toArray.

"u" = I forget :-)


> * Also, in the same loop, 'e' is initialized to 'this' and is checked as the 
> loop exit condition (I guess that's what 'e' stands for) but it's not used 
> elsewhere, so it's not clear to me how much value this adds.

e == "end of pipeline".

Additionally we don't need the perform the first loop if there are no terminal 
flags.


> 
> Stepping back from this a bit, this is clearly an area of some complexity, 
> and it might benefit from some additional testing. The streams library is 
> overall well-tested, but mostly from a functional level, i.e. running a bunch 
> of streams in various combinations to ensure the right output is produced. 
> For this case it might be helpful to assemble (but not execute) a bunch of 
> combinations of pipelines and then make sure that each stage ends up with the 
> right flags. (I didn't find such a test when I went looking, but I might have 
> missed it.)
> 

There are some. Follow the trail of StatelessTestOp. What we lack is a 
systematic way of performing combinatorial testing derived from a more formal 
description of the operations and their flag manipulation.

In this case, as per the comment in the old code, there was a deliberate 
limitation so testing in this case would not have helped.


> In any case I think it's reasonble to proceed with this patch as it stands 
> instead of tinkering with it. Some additional cleanups are warranted at some 
> point but we should keep an eye on these for the future.
> 

Thanks,
Paul.

Re: RFR 8075307 Pipeline calculating inconsistent flag state for parallel stateful ops

Reply via email to