Re: RFR(m) 2: 8072722: add stream support to Scanner

Stuart Marks Tue, 08 Sep 2015 17:54:30 -0700


On 9/4/15 1:35 AM, Paul Sandoz wrote:

Hi Stuart,

This is looking very good.

Just some comments on the tricky aspect related to late-binding of the Stream 
to the scanner state:

2652      * <p>This scanner's state should not be modified during execution of 
the returned
2653      * stream's pipeline. Subsequent calls to any methods on this scanner 
other than
2654      * {@link #close} and {@link #ioException} may return undefined 
results or may cause
2655      * undefined effects on the returned stream. The returned stream's 
source
2656      * {@code Spliterator} is <em>fail-fast</em> and will, on a best-effort
2657      * basis, throw a {@link java.util.ConcurrentModificationException} if 
such
2658      * modification is detected.

“Subsequent” is a little vague here, does it mean before, during or after 
stream pipeline execution?

Before:

Most methods on scanner may be called before stream pipeline execution, they 
just adjust the scanner state from which to start from. If close is called it 
should result in an ISE on pipeline execution.

During:

Either a CME on a best-effort basis if scanner state is modified by a 
behavioural parameter, or an ISE if close is called.

After:

The scanner is in an indeterminate state. For further operations it needs to be 
reset, and if the scanner was closed during execution an ISE will result on 
further operations.

We should try and succinctly talk about all three cases in the specification.

Mmm, yes, can't get anything vague past you. :-) "Subsequent" should meananytime after initiation of the pipeline execution. But it would be better to bea bit more explicit. Note that effects go both ways; during pipeline execution,calls to scanner methods might cause the stream to throw CME, and streamoperations might cause scanner methods to return unspecified results.


I think the following covers all of the before, during, and after cases.

<< Scanning starts upon initiation of the terminal stream operation, using thecurrent state of this scanner. Subsequent calls to any methods on this scannerother than {@link #close} and {@link #ioException} may return undefined resultsor may cause undefined effects on the returned stream. The returned stream'ssource {@code Spliterator} is <em>fail-fast</em> and will, on a best-effortbasis, throw a {@link java.util.ConcurrentModificationException} if any suchcalls are detected during pipeline execution. >>

The reset() method will reset the delimiter, locale, and radix, but it can'treset the position in the input, so the scanner effectively cannot be reusedafter any stream operation has been initiated.


I'll add the following:

<< After stream execution completes, this scanner is left in an indeterminatestate and cannot be reused. >>

The closed behavior is separate from CME, so I'll add the following to the textthat covers the closing behavior.

<< IllegalStateException is thrown if the scanner has been closed when thismethod is called, or if this scanner is closed during pipeline execution. >>


All of the above apply to both the tokens() method and the main findAll() 
method.

You might need to double check FindSpliterator for the before and during cases 
as i don’t think findPatternInBuffer checks if the scanner is closed, i think 
it needs an ensureOpen call in tryAdvance.

Good catch! I've added an ensureOpen() call here and I've also add tests tocover this case.


Updated webrev:

  http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/

Updated specdiff:


http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html

s'marks

Paul.

On 4 Sep 2015, at 08:17, Stuart Marks <stuart.ma...@oracle.com> wrote:

Please review this update to the Scanner enhancement I proposed a while back. 
[1]

I've updated based on some discussions with Paul Sandoz. The updates since the 
previous posting are 1) coordination of spec wording from Matcher; 2) addition 
of ConcurrentModificationException; 3) updating tests to use the streams 
testing framework; 4) some javadoc cleanups.

Bug:

        https://bugs.openjdk.java.net/browse/JDK-8072722

Webrev:

        http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/

Specdiff:

        
http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html


For convenience, I've appended below the description from my earlier post. [1]

s'marks

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html


-------


Scanner is essentially a regular expression matcher that matches over arbitrary 
input (e.g., from a file) instead of a fixed string like Matcher. Scanner will 
read and buffer additional input as necessary when looking for matches.

This change proposes to add two streams methods:

1. tokens(), which returns a stream of tokens delimited by the Scanner's 
delimiter. Scanner's default delimiter is whitespace, so the following will 
collect a list of whitespace-separated words from a file:

    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        List<String> words = sc.tokens().collect(toList());
    }

2. findAll(pattern), which returns a stream of match results generated by searching the 
input for the given pattern (either a Pattern or a String). For example, the following 
will extract from a file all words that are surrounded by "_" characters, such 
as _foo_ :

    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        return sc.findAll("_([\\w]+)_")
                 .map(mr -> mr.group(1))
                 .collect(toList());
    }

Implementation notes. A Scanner is essentially already an iterator of tokens, so tokens() 
pretty much just wraps "this" into a stream. The findAll() methods are a 
wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of refactoring 
to avoid converting the MatchResult to a String prematurely.

The tests are pretty straightforward, with some additional cleanups, such as 
using try-with-resources.

-------

Re: RFR(m) 2: 8072722: add stream support to Scanner

Reply via email to