Re: RFR(m) 2: 8072722: add stream support to Scanner

Xueming Shen Thu, 10 Sep 2015 14:08:24 -0700

On 09/10/2015 01:22 PM, Stuart Marks wrote:



On 9/9/15 3:51 PM, Xueming Shen wrote:

One more comment regarding the CME check.

2829                     if (expectedCount != modCount) {
2830                         throw new ConcurrentModificationException();
2831                     }

While it definitely serves the purpose of "fail-fast", but given it's a
ordered/sequential
stream, this condition is always checked in the immediate next round of
tryAdvance().
Just doubt it really benefits anyone. It's true that it almost costs nothing
though.


The stream is initially sequential, but it can be changed to parallel by the 
application. If this occurs, this spliterator would be wrapped by code that 
gathers up elements into batches for parallel processing, making the control 
flow more complicated.

2826                 // assert expectedCount == modCount
2827                 if (findPatternInBuffer(pattern, 0)) { // doesn't 
increment modCount
2828                     cons.accept(matcher.toMatchResult());
2829                     if (expectedCount != modCount) {
2830                         throw new ConcurrentModificationException();
2831                     }

I guess the question is, is it worth having two checks in each call to 
tryAdvance()? The first check has to be there in order to initialize modCount 
the first time (see also below), and since you have modCount, you might as well 
check it if it's nonnegative.

The check after the call to the Consumer might seem unnecessary, but note that there is 
no guarantee that tryAdvance() will be called again if the stream decides to terminate 
early. (Consider short-circuiting behavior of findFirst() and friends.) It's easy to see 
that this "extra" check is a guard against the Consumer modifying the scanner's 
state, which I think is fairly important.

On 09/09/2015 03:35 PM, Xueming Shen wrote:

Hi Stuart,

Can't modCount be increased to negative in extreme case?

2705             if (expectedCount>= 0&&  expectedCount != modCount) {
2706                 throw new ConcurrentModificationException();
2707             }


(This is in TokenSpliterator.)

Yes, as tokens are processed, modCount will be incremented continually and 
overflow will eventually occur, causing modCount to wrap around to negative. 
The expectedCount value is continually reinitialized from modCount, so when 
this happens, expectedCount will go negative as well. This will disable the CME 
check at the top of tryAdvance(). If hasNext() is true, though, expectedCount 
is reinitialized from modCount and it's checked unconditionally after the 
Consumer call, so that CME check won't be affected. Similar things can occur if 
modCount has wrapped around to negative before the stream processing starts.

Bottom line is that when modCount is negative, it will bypass half the CME 
checks in TokenSpliterator. But some checking will still be done.

The situation with FindSpliterator is a bit different. The modCount value isn't 
incremented while advancing the spliterator, but there is still the possibility 
that modCount could be negative by the time stream processing starts. If this 
occurs, each call to tryAdvance() will be treated as the initial case and 
expectedCount will be reinitialized from modCount. This means that, as before, 
some CME checking could be bypassed. The check after the Consumer call will 
still work properly though.

Looks like the apparently redundant CME check after the Consumer call is 
helping out after all. :-)

It would be nice if this were cleaner, but it still meets the criterion of "best 
effort". It doesn't appear to fail spuriously, either. I suppose it would halve the 
number of error cases if expectedCount were compared against its sentinel value of -1 
instead of <= 0, but I haven't thought this through. Is it worth it? As things stand, CME 
checks will be missed only happen after a billion calls on the scanner, and this is only 
significant if the application starts modifying the scanner illegally at that point.


I think it might be a "nice to have" for a "fail-fast" effort after the the 
consumer
consumed/accepted the result (the second check), but isn't it a bug for the 
consumer
to accept any result if there is CME condition occurred already?

It'd be better to initialize expectedCount to modCount in constrocutor?


That's how I had it initially, but at Paul Sandoz' suggestion I delayed the 
initialization to the first call to tryAdvance(). This allows the Scanner's 
state to be modified after stream creation but before stream pipeline 
execution. This is the way that Paul's stream code in Matcher works. I'm not 
sure how important this is. Having Scanner be gratuitously different from 
Matcher seems like it would be irritating though.


I noticed the spec says "Scanning starts upon initiation of the terminal stream 
operation, using the
current state of this scanner..." guess it means the "CME" enforcement starts with 
the "stream
operation" starts (a kinda of later-initialization). But personally feel it may 
create a unnecessary
inconsistent situation, depends on whether or not there is state change between 
the creation of
the Stream object and the starting of the stream operation. But I'm not a 
stream expert :-)

-Sherman

Re: RFR(m) 2: 8072722: add stream support to Scanner

Reply via email to