On 09/10/2015 01:22 PM, Stuart Marks wrote:
On 9/9/15 3:51 PM, Xueming Shen wrote:
One more comment regarding the CME check.
2829 if (expectedCount != modCount) {
2830 throw new ConcurrentModificationException();
2831 }
While it definitely serves the purpose of "fail-fast", but given it's a
ordered/sequential
stream, this condition is always checked in the immediate next round of
tryAdvance().
Just doubt it really benefits anyone. It's true that it almost costs nothing
though.
The stream is initially sequential, but it can be changed to parallel by the
application. If this occurs, this spliterator would be wrapped by code that
gathers up elements into batches for parallel processing, making the control
flow more complicated.
2826 // assert expectedCount == modCount
2827 if (findPatternInBuffer(pattern, 0)) { // doesn't
increment modCount
2828 cons.accept(matcher.toMatchResult());
2829 if (expectedCount != modCount) {
2830 throw new ConcurrentModificationException();
2831 }
I guess the question is, is it worth having two checks in each call to
tryAdvance()? The first check has to be there in order to initialize modCount
the first time (see also below), and since you have modCount, you might as well
check it if it's nonnegative.
The check after the call to the Consumer might seem unnecessary, but note that there is
no guarantee that tryAdvance() will be called again if the stream decides to terminate
early. (Consider short-circuiting behavior of findFirst() and friends.) It's easy to see
that this "extra" check is a guard against the Consumer modifying the scanner's
state, which I think is fairly important.
On 09/09/2015 03:35 PM, Xueming Shen wrote:
Hi Stuart,
Can't modCount be increased to negative in extreme case?
2705 if (expectedCount>= 0&& expectedCount != modCount) {
2706 throw new ConcurrentModificationException();
2707 }
(This is in TokenSpliterator.)
Yes, as tokens are processed, modCount will be incremented continually and
overflow will eventually occur, causing modCount to wrap around to negative.
The expectedCount value is continually reinitialized from modCount, so when
this happens, expectedCount will go negative as well. This will disable the CME
check at the top of tryAdvance(). If hasNext() is true, though, expectedCount
is reinitialized from modCount and it's checked unconditionally after the
Consumer call, so that CME check won't be affected. Similar things can occur if
modCount has wrapped around to negative before the stream processing starts.
Bottom line is that when modCount is negative, it will bypass half the CME
checks in TokenSpliterator. But some checking will still be done.
The situation with FindSpliterator is a bit different. The modCount value isn't
incremented while advancing the spliterator, but there is still the possibility
that modCount could be negative by the time stream processing starts. If this
occurs, each call to tryAdvance() will be treated as the initial case and
expectedCount will be reinitialized from modCount. This means that, as before,
some CME checking could be bypassed. The check after the Consumer call will
still work properly though.
Looks like the apparently redundant CME check after the Consumer call is
helping out after all. :-)
It would be nice if this were cleaner, but it still meets the criterion of "best
effort". It doesn't appear to fail spuriously, either. I suppose it would halve the
number of error cases if expectedCount were compared against its sentinel value of -1
instead of <= 0, but I haven't thought this through. Is it worth it? As things stand, CME
checks will be missed only happen after a billion calls on the scanner, and this is only
significant if the application starts modifying the scanner illegally at that point.
I think it might be a "nice to have" for a "fail-fast" effort after the the
consumer
consumed/accepted the result (the second check), but isn't it a bug for the
consumer
to accept any result if there is CME condition occurred already?
It'd be better to initialize expectedCount to modCount in constrocutor?
That's how I had it initially, but at Paul Sandoz' suggestion I delayed the
initialization to the first call to tryAdvance(). This allows the Scanner's
state to be modified after stream creation but before stream pipeline
execution. This is the way that Paul's stream code in Matcher works. I'm not
sure how important this is. Having Scanner be gratuitously different from
Matcher seems like it would be irritating though.
I noticed the spec says "Scanning starts upon initiation of the terminal stream
operation, using the
current state of this scanner..." guess it means the "CME" enforcement starts with
the "stream
operation" starts (a kinda of later-initialization). But personally feel it may
create a unnecessary
inconsistent situation, depends on whether or not there is state change between
the creation of
the Stream object and the starting of the stream operation. But I'm not a
stream expert :-)
-Sherman