RFR 8072773 (fs) Files.lines needs a better splitting implementation for stream source

Paul Sandoz Wed, 03 Jun 2015 08:56:55 -0700

Hi,

Please review an optimization for Files.lines for certain charsets:


  http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072773-File-lines/webrev/

If the charset is, say, US-ASCII or UTF-8, it is possible to implement an
efficient splitting Spliterator that scans bytes from a mid-point to search
for line feed characters.
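
As a rough illustration of the splitting idea (a minimal sketch, not the code
in the webrev): in US-ASCII and UTF-8 the byte 0x0A only ever encodes a line
feed, so a split point can be found by scanning raw bytes without decoding.
The class name LineSplitSketch and the method findSplit are hypothetical, and
the heap ByteBuffer stands in for the mapped byte buffer.

```java
import java.nio.ByteBuffer;

public class LineSplitSketch {
    /**
     * Returns an index s with lo < s < hi such that the byte at s - 1 is a
     * line feed, or -1 if the range [lo, hi) cannot be split on a line
     * boundary. Hypothetical helper, for illustration only.
     */
    static int findSplit(ByteBuffer buf, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        // Scan forward from the mid-point for a line feed that still leaves
        // at least one byte in the right-hand part.
        for (int i = mid; i < hi - 1; i++) {
            if (buf.get(i) == '\n') {
                return i + 1;
            }
        }
        // Otherwise scan backward from the mid-point.
        for (int i = mid - 1; i >= lo; i--) {
            if (buf.get(i) == '\n') {
                return i + 1;
            }
        }
        return -1; // no interior line feed: do not split
    }
}
```

A Spliterator.trySplit() built on this would hand [lo, s) to the new
spliterator and keep [s, hi), returning null when findSplit returns -1.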

Splitting uses a mapped byte buffer. Traversal uses FileChannel reads at an
offset. In previous incarnations I tried to use a mapped byte buffer for both,
but for some reason the traversal performance was not good (both on Mac and
x86). In any case I am happy with the current approach, as there is minimal
layering between the FileChannel and the BufferedReader leveraged to read the
lines.
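
The traversal side might look roughly like the following. This is a sketch
under assumptions, not the webrev's implementation: the class name
RangeReaderSketch and the method readerFor are hypothetical. It wraps
positional FileChannel.read(ByteBuffer, long) calls over a byte range
[start, end) in a ReadableByteChannel and layers a BufferedReader directly
on top of it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.Charset;

public class RangeReaderSketch {
    /** Reads lines from bytes [start, end) of fc. Hypothetical helper. */
    static BufferedReader readerFor(FileChannel fc, long start, long end, Charset cs) {
        ReadableByteChannel range = new ReadableByteChannel() {
            private long pos = start; // next file offset to read from

            @Override
            public int read(ByteBuffer dst) throws IOException {
                if (pos >= end) {
                    return -1; // end of this range
                }
                int oldLimit = dst.limit();
                long remaining = end - pos;
                if (dst.remaining() > remaining) {
                    // Never read past the end of the range.
                    dst.limit(dst.position() + (int) remaining);
                }
                try {
                    // Positional read: does not move the channel's own position.
                    int n = fc.read(dst, pos);
                    if (n > 0) {
                        pos += n;
                    }
                    return n;
                } finally {
                    dst.limit(oldLimit);
                }
            }

            @Override
            public boolean isOpen() {
                return fc.isOpen();
            }

            @Override
            public void close() {
                // The underlying FileChannel is owned and closed by the caller.
            }
        };
        return new BufferedReader(Channels.newReader(range, cs.newDecoder(), -1));
    }
}
```

Because the reads are positional, several such readers can share one
FileChannel, which suits parallel traversal of the split ranges.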

Sequential performance is similar to (the same as or better than) the current
approach. Parallel performance is much better than the current approach.

Some advice on two aspects would be most appreciated:

1) Is there an easy way to determine the subset of supported charsets to
which this optimization applies?

2) We should try to explicitly unmap the mapped byte buffer when the stream is
closed, using some sort of shared secret. How can I do that?

Paul.
