Re: RFR 8072773 (fs) Files.lines needs a better splitting implementation for stream source

Alan Bateman Wed, 03 Jun 2015 09:21:18 -0700

On 03/06/2015 16:53, Paul Sandoz wrote:

Hi,


Please review an optimization for Files.lines for certain charsets:

   http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072773-File-lines/webrev/

If the charset is, say, US-ASCII or UTF-8, it is possible to implement an
efficient splitting Spliterator that scans bytes from a mid-point to search for
line feed characters.
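
As a rough illustration of the mid-point scan described above (this is not the code in the webrev), a split point could be located along the following lines. The names LineSplitSketch and findSplit are made up for the example, and the policy of only scanning forward from the mid-point is an assumption. Scanning raw bytes works for US-ASCII and UTF-8 because the byte 0x0A never appears inside a multi-byte UTF-8 sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LineSplitSketch {
    // Hypothetical helper: returns the index just past the first '\n' found at
    // or after the mid-point of [lo, hi), or -1 if no such split would leave
    // a non-empty suffix.
    static int findSplit(ByteBuffer buf, int lo, int hi) {
        int mid = (lo + hi) >>> 1;
        for (int i = mid; i < hi; i++) {
            // Absolute get: does not disturb the buffer's position.
            if (buf.get(i) == '\n') {
                int split = i + 1;
                return split < hi ? split : -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
                "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
        int split = findSplit(buf, 0, buf.limit());
        // The prefix [0, split) and the suffix [split, limit) both start on a
        // line boundary, so each can be handed to a separate task.
        System.out.println(split);
    }
}
```

A spliterator built on this would return the prefix range from trySplit and keep the suffix for itself, declining to split when findSplit returns -1.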

Splitting uses a mapped byte buffer. Traversal uses FileChannel reads at an
offset. In previous incarnations I tried to use a mapped byte buffer for both,
but for some reason the traversal performance was not good (both on Mac and
x86). In any case I am happy with the current approach as there is minimal
layering between the FileChannel and the BufferedReader leveraged to read the lines.
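
To make the traversal side concrete, here is a minimal sketch of reading the lines of one byte range through positional FileChannel reads. It is an assumption about the general shape, not the webrev's implementation: RangeLinesSketch, range and readLines are hypothetical names, and UTF-8 is hard-coded for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class RangeLinesSketch {
    // Hypothetical helper: a view of bytes [start, end) of a FileChannel,
    // served by positional reads so the channel's own position is untouched.
    static ReadableByteChannel range(FileChannel fc, long start, long end) {
        return new ReadableByteChannel() {
            private long pos = start;
            private boolean open = true;

            @Override public int read(ByteBuffer dst) throws IOException {
                if (pos >= end) return -1;
                int oldLimit = dst.limit();
                // Cap the read so it cannot run past the end of the range.
                int max = (int) Math.min(dst.remaining(), end - pos);
                dst.limit(dst.position() + max);
                try {
                    int n = fc.read(dst, pos);
                    if (n > 0) pos += n;
                    return n;
                } finally {
                    dst.limit(oldLimit);
                }
            }
            @Override public boolean isOpen() { return open; }
            // The caller owns the FileChannel, so only the view is closed.
            @Override public void close() { open = false; }
        };
    }

    static List<String> readLines(FileChannel fc, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(Channels.newReader(
                range(fc, start, end), StandardCharsets.UTF_8.newDecoder(), -1))) {
            for (String line; (line = r.readLine()) != null; ) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
            try (FileChannel fc = FileChannel.open(tmp, StandardOpenOption.READ)) {
                System.out.println(readLines(fc, 0, 11));
                System.out.println(readLines(fc, 11, fc.size()));
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because every read names its own file offset, several ranges of the same channel can be traversed concurrently without coordinating on a shared position.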

Sequential performance is similar to (the same as or better than) the current
approach. Parallel performance is much better than the current approach.

Some advice on two aspects would be most appreciated:

1) Is there an easy way to determine the sub-set of supported charsets that are 
applicable?

2) We should try to explicitly unmap the mapped byte buffer when the stream is
closed, using some sort of shared secret. How can I do that?

As this code path is only for the default provider case then there's a good chance that it will be a FileChannelImpl, in which case you can call its unmap method (directly or via a shared secret). It is possible to interpose on the default provider so you can't be guaranteed it is a FileChannelImpl of course.

In passing, you might consider moving ByteBufferLinesSpliterator to its own source file because Files is getting very big.


-Alan.
