Re: Request/discussion: BufferedReader reading using async API while providing sync API

Pavel Rappo Wed, 26 Oct 2016 03:58:07 -0700

I believe I see where you're coming from. Please correct me if I'm wrong.

Your implementation is based on the premise that a call to
ReadableByteChannel.read() _initiates_ the operation and returns immediately.
The OS then keeps filling the buffer while there's free space in it and the
channel hasn't encountered EOF.


Is that right?
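For comparison, the documented contract of ReadableByteChannel.read(ByteBuffer) is synchronous: the call transfers some bytes, returns how many (or -1 at end of stream), and the buffer is not touched again until the next call. Below is a minimal sketch that uses an in-memory channel from Channels.newChannel as a stand-in for a file channel. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class SyncReadDemo {
    /** Performs a single read() and returns {value returned by read(), buffer position afterwards}. */
    static int[] readOnce(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int n = ch.read(buf); // returns only after the bytes have been copied into buf
        return new int[] { n, buf.position() };
    }

    public static void main(String[] args) throws IOException {
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(new byte[10_000]))) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            int[] r = readOnce(ch, buf);
            // The return value accounts for every byte this call put into the buffer;
            // no further data arrives in buf until read() is called again.
            System.out.println("read() returned " + r[0] + ", buffer position is " + r[1]);
        }
    }
}
```

A read may transfer fewer bytes than the buffer can hold. What it does not do is keep filling the buffer after it has returned.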

> On 25 Oct 2016, at 22:16, Brunoais <[email protected]> wrote:
> 
> Thank you for your time. I'll try to explain it. I hope I can clear it up.
> First, I mixed up the meanings of asynchronous and non-blocking.
> This implementation uses a non-blocking algorithm internally while providing
> a blocking-like algorithm on the surface. It is single-threaded. It is not a
> multi-threaded design where one thread fetches data and blocks waiting while
> the other accumulates it and hands it to whoever asks for it.
> 
> Second, I made the mistake of going after BufferedReader instead of
> BufferedInputStream. If you want me to go after BufferedReader, that's fine,
> but when I started the PoC I thought BufferedInputStream would be more
> generically useful than BufferedReader.
> 
> On to my code:
> Short answers:
>       • The Thread.sleep(100) call exists because I don't know how to wait
> until more data is in the buffer, which read()'s contract requires.
>       • The ByteBuffer gives a buffer that is filled by the OS (what I
> believe Channels do) instead of getting data only on demand (what I
> believe Streams do).
> Full answers:
> The blockingFill(boolean) method busy-waits for a fill and is used
> exclusively by read(). All other methods use the non-sleeping version,
> fill(boolean).
> blockingFill(boolean) exists in that form only because read() must not
> return unless either:
> 
>       • The stream ended.
>       • The next byte is ready for reading.
> In practice, that while loop will rarely evaluate to true: reads happen in
> chunks, so readPos will be behind writePos most of the time.
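A minimal sketch of the read() contract described above, assuming a plain blocking ReadableByteChannel as the data source instead of the prototype's OS-filled buffer. Only the readPos/writePos names come from the prototype; ContractSketchStream and everything else here are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical minimal buffered stream illustrating read()'s contract. */
public class ContractSketchStream extends InputStream {
    private final ReadableByteChannel source;
    private final ByteBuffer buffer;
    private int readPos;   // index of the next byte to hand to the caller
    private int writePos;  // one past the last valid byte in the buffer

    public ContractSketchStream(ReadableByteChannel source, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.source = source;
        this.buffer = ByteBuffer.allocate(size);
    }

    @Override
    public int read() throws IOException {
        // Do not return until either a byte is ready or the stream has ended.
        while (readPos == writePos) {
            buffer.clear();
            int n = source.read(buffer); // blocking call; a result of 0 simply loops again
            if (n < 0) {
                return -1;               // end of stream
            }
            readPos = 0;
            writePos = n;
        }
        return buffer.get(readPos++) & 0xFF; // absolute get, independent of buffer position
    }
}
```

Here the wait is simply the blocking channel call itself, so no sleep is needed; the open question in this thread is how to keep that property while the fill happens in the background.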
> To be honest, I have no idea whether an interrupt will ever happen. The main
> reasons I'm using a sleep are that I didn't want to hog the CPU with a
> full-thread busy wait, and that I didn't find any way to put the thread to
> sleep and wake it up later, when the buffer managed by native code has more
> data.
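The usual way to park a thread until an asynchronous read completes, without a polling loop, is to block on the Future returned by AsynchronousFileChannel.read(ByteBuffer, long). A sketch follows; the class and helper names are hypothetical. Note that in OpenJDK on Linux and other Unix platforms, AsynchronousFileChannel is backed by an internal thread pool rather than kernel asynchronous I/O, so a thread is still involved behind the scenes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class FutureWaitSketch {
    /** Starts an asynchronous read, then blocks (without polling) until it completes. */
    static int readBlocking(AsynchronousFileChannel ch, ByteBuffer dst, long filePos)
            throws IOException, InterruptedException {
        Future<Integer> pending = ch.read(dst, filePos); // returns immediately
        // ... the caller could do other work here while the read is in progress ...
        try {
            return pending.get(); // parks the thread until the read finishes; -1 means EOF
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            System.out.println("read " + readBlocking(ch, buf, 0L) + " bytes");
        }
    }
}
```

Unlike Thread.sleep(100), Future.get() wakes the thread as soon as the read completes. It also needs no one to interrupt it.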
> The non-blocking part is handled by the buffer that the OS keeps filling most,
> if not all, of the time. That buffer is the field
> 
> ByteBuffer readBuffer
> That is where the gain over the plain old Buffered classes comes from.
> 
> 
> Did that make sense to you? Feel free to ask anything else you need.
> 
> On 25/10/2016 20:52, Pavel Rappo wrote:
>> I've skimmed through the code and I'm not sure I can see any asynchronicity
>> (you were pointing at the lack of it in BufferedReader).
>> And the mechanics of this are very puzzling to me, to be honest:
>>     void blockingFill(boolean forced) throws IOException {
>>         fill(forced);
>>         while (readPos == writePos) {
>>             try {
>>                 Thread.sleep(100);
>>             } catch (InterruptedException e) {
>>                 // An interrupt may mean more data is available
>>             }
>>             fill(forced);
>>         }
>>     }
>> I thought you were suggesting that we should utilize the tools the OS
>> provides more efficiently. Instead we have something that looks very similar
>> to a "busy loop" and... also, who is supposed to interrupt Thread.sleep(),
>> and when? Sorry, I'm not following. Could you please explain how this is
>> supposed to work?
>> 
>>> On 24 Oct 2016, at 15:59, Brunoais <[email protected]> wrote:
>>> Attached and sending!
>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>> 
>>>> Could you please send a new email on this list with the source attached as
>>>> a text file?
>>>> 
>>>>> On 23 Oct 2016, at 19:14, Brunoais <[email protected]> wrote:
>>>>> Here's my poc/prototype:
>>>>> 
>>>>> http://pastebin.com/WRpYWDJF
>>>>> 
>>>>> I've implemented the bare minimum of the class that follows the same
>>>>> contract as BufferedReader, and I flagged in comments every issue I think
>>>>> it has or may have.
>>>>> I also wrote some javadoc to help guide readers through the class.
>>>>> I could have reused more field names from BufferedReader, but they were so
>>>>> minimalistic that they confused me. I intend to change them before
>>>>> sending this to OpenJDK.
>>>>> One of the major problems it has is overflow of a long value. It is major
>>>>> because it is hidden, extremely rare, and takes a really long time to
>>>>> reproduce. There are different ways of dealing with it, from just
>>>>> documenting it to writing code that actually handles it.
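One common way to make long position counters tolerate overflow, which is not necessarily what the prototype does, is to compare them only through their difference. This is the same idiom the System.nanoTime() javadoc recommends. The class and method names below are hypothetical, and readPos/writePos are assumed to be long byte counters.

```java
public class CounterOverflowSketch {
    /**
     * Number of bytes written but not yet read. Because Java long arithmetic wraps
     * around, the subtraction stays correct even after writePos overflows past
     * Long.MAX_VALUE, as long as the real distance is smaller than 2^63.
     */
    static long available(long readPos, long writePos) {
        return writePos - readPos;
    }

    public static void main(String[] args) {
        long readPos = Long.MAX_VALUE - 5;
        long writePos = readPos + 10;                      // wraps around to a negative value
        System.out.println(writePos > readPos);            // false: a direct comparison gives the wrong answer
        System.out.println(available(readPos, writePos));  // 10: the difference is still right
        // Years needed to overflow a byte counter at 1 GB/s (integer division).
        System.out.println(Long.MAX_VALUE / 1_000_000_000L / (365L * 24 * 3600)); // 292
    }
}
```

Equality checks such as `readPos == writePos` remain safe under this scheme. Only ordered comparisons like `readPos < writePos` need to be rewritten as `writePos - readPos > 0`.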
>>>>> I built some simple test code to get an idea of its performance and
>>>>> correctness.
>>>>> 
>>>>> http://pastebin.com/eh6LFgwT
>>>>> 
>>>>> This isn't a thorough test of whether it actually works correctly, but I
>>>>> see no reason for it not to, now that I've fixed the 2 bugs the test
>>>>> found.
>>>>> I'll also leave here some conclusions I reached about speed and resource
>>>>> consumption.
>>>>> I ran tests with the default buffer size, 5 000 B, 15 000 B and 500 000 B.
>>>>> With my hardware and the 1 530 000 000 B file, I got roughly:
>>>>>       • Every buffer size, with fake work: 10~15 s faster (from 90% of HDD
>>>>> speed to 100% of HDD speed).
>>>>>       • Every buffer size, without fake work: 1~2 s faster (from 90% of HDD
>>>>> speed to 100% of HDD speed).
>>>>> Changing the buffer size changed the read speed, but both implementations
>>>>> changed by about the same amount.
>>>>> Finally, I consistently confirmed that I/O was the slowest part while this
>>>>> code was running.
>>>>> For those wondering about the file size: it is both to avoid the OS cache
>>>>> and to match the main use case these classes are for (large streams of
>>>>> bytes).
>>>>> @Pavel, are you open for discussion now ;)? Need anything else?
>>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>> 
>>>>>> Just to append to my previous email: BufferedReader wraps any Reader out
>>>>>> there, not specifically FileReader, while you're talking about the case
>>>>>> of efficiently reading from a file.
>>>>>> I guess there's one existing possibility to provide exactly what you need
>>>>>> (as I understand it) under this method:
>>>>>> /**
>>>>>>  * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>>>>  * from the file in an efficient manner...
>>>>>>    ...
>>>>>>  */
>>>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>> It can return _anything_ as long as it is a BufferedReader. We can do it,
>>>>>> but it needs to be investigated not only for your favorite OS but for
>>>>>> other OSes as well. Feel free to prototype this and we can discuss it on
>>>>>> the list later.
>>>>>> Thanks,
>>>>>> -Pavel
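For reference, here is a typical caller of the factory method Pavel mentions. This code depends only on the declared BufferedReader type, so it would keep working unchanged if the method returned a differently implemented subclass. The class and helper names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NewBufferedReaderSketch {
    /** Counts lines; the concrete reader class is chosen by Files.newBufferedReader. */
    static long countLines(Path file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (r.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(Paths.get(args[0])));
    }
}
```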
>>>>>> 
>>>>>>> On 21 Oct 2016, at 18:56, Brunoais <[email protected]> wrote:
>>>>>>> Pavel is right.
>>>>>>> In reality, I was expecting such a BufferedReader to use only a single
>>>>>>> buffer and have that buffer filled asynchronously, not in a different
>>>>>>> thread.
>>>>>>> Additionally, I don't intend to use a larger buffer than before unless
>>>>>>> requested through the API (the constructor).
>>>>>>> In my idea, internally, it is supposed to use
>>>>>>> java.nio.channels.AsynchronousFileChannel or equivalent.
>>>>>>> That does not rule out having two buffers, and I do not intend to change
>>>>>>> BufferedReader itself. I'd write a BufferedAsyncReader of sorts (any name
>>>>>>> suggestions are welcome, as I'm an awful namer).
>>>>>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>>>>> 
>>>>>>>> Hi Pavel,
>>>>>>>> I think Brunoais is asking for a double-buffering scheme in which the
>>>>>>>> BufferedReader implementation fills a second buffer in parallel with the
>>>>>>>> application reading from the first buffer, managing the swaps and async
>>>>>>>> reads transparently.
>>>>>>>> It would not change the API, but it would change the interactions
>>>>>>>> between the buffered reader and the underlying stream. It would also
>>>>>>>> increase memory requirements and processing by introducing or using a
>>>>>>>> separate thread and the necessary synchronization.
>>>>>>>> Though I think the formal interface semantics could be maintained, I
>>>>>>>> have doubts about compatibility and unintended consequences for existing
>>>>>>>> subclasses, applications and libraries.
>>>>>>>> $.02, Roger
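The double-buffering scheme Roger describes can be sketched roughly as follows. It rests on several assumptions: it uses AsynchronousFileChannel with a Future instead of an explicitly managed extra thread; the class name DoubleBufferedFileStream is hypothetical; and it is single-caller only, not thread-safe. As Roger notes, it holds two buffers, and on Unix platforms the JDK's AsynchronousFileChannel still performs the reads on an internal thread pool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical double-buffered stream: the back buffer is filled asynchronously
 *  while the caller consumes the front buffer. A sketch only; not thread-safe. */
public class DoubleBufferedFileStream extends InputStream {
    private final AsynchronousFileChannel channel;
    private ByteBuffer front;         // buffer the caller is reading from
    private ByteBuffer back;          // buffer the pending read is filling
    private Future<Integer> pending;  // in-flight read into 'back', or null after EOF
    private long filePos;             // file offset for the next read to issue

    public DoubleBufferedFileStream(Path file, int bufferSize) throws IOException {
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        front = ByteBuffer.allocate(bufferSize);
        front.flip();                 // start empty: position 0, limit 0
        back = ByteBuffer.allocate(bufferSize);
        filePos = 0L;
        pending = channel.read(back, filePos); // prefetch the first chunk
    }

    @Override
    public int read() throws IOException {
        if (!front.hasRemaining() && !swap()) {
            return -1;
        }
        return front.get() & 0xFF;
    }

    /** Waits for the in-flight read, swaps buffers and starts the next read. */
    private boolean swap() throws IOException {
        while (pending != null) {
            int n;
            try {
                n = pending.get();           // blocks only if the prefetch has not finished yet
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for a read", e);
            } catch (ExecutionException e) {
                throw new IOException(e.getCause());
            }
            if (n < 0) {
                pending = null;              // end of file: nothing more to prefetch
                return false;
            }
            filePos += n;
            ByteBuffer filled = back;        // the freshly filled buffer becomes the front
            back = front;                    // the old front is fully consumed, so reuse it
            front = filled;
            front.flip();
            back.clear();
            pending = channel.read(back, filePos); // overlap the next read with consumption
            if (front.hasRemaining()) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The caller only blocks in Future.get() when it consumes data faster than the device delivers it. Otherwise the next chunk is already waiting when the front buffer runs out.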
>>>>>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>>>>>> 
>>>>>>>>> Off the top of my head, I would say it's not possible to change the
>>>>>>>>> design of an _extensible_ type that has been out there for 20 or so
>>>>>>>>> years. All these I/O streams from java.io were designed for the simple
>>>>>>>>> synchronous use case.
>>>>>>>>> It's not that their design is flawed in some way; it's that they don't
>>>>>>>>> seem to suit your needs. Have you considered using
>>>>>>>>> java.nio.channels.AsynchronousFileChannel in your applications?
>>>>>>>>> -Pavel
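For an application that uses AsynchronousFileChannel directly, as Pavel suggests, the callback style looks roughly like this. The sketch reads a whole file with chained CompletionHandler callbacks and reports the total byte count through a CompletableFuture. The class and method names are hypothetical, and the place where real processing would go is only marked with a comment.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

public class CompletionHandlerSketch {
    /** Reads the whole file with chained callbacks; completes with the total byte count. */
    static CompletableFuture<Long> countBytes(AsynchronousFileChannel ch, int bufferSize) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        // The attachment carries the file position at which the read was issued.
        ch.read(buf, 0L, 0L, new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer n, Long pos) {
                if (n < 0) {                        // end of file reached
                    result.complete(pos);
                    return;
                }
                long next = pos + n;
                // Application-specific processing of buf would go here.
                buf.clear();
                ch.read(buf, next, next, this);     // issue the next read from the callback
            }

            @Override
            public void failed(Throwable exc, Long pos) {
                result.completeExceptionally(exc);
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(Paths.get(args[0]), StandardOpenOption.READ)) {
            System.out.println(countBytes(ch, 8192).get() + " bytes");
        }
    }
}
```

This is the programming model Brunoais wants to avoid exposing to callers: the processing logic has to live inside the callback instead of in a plain read loop.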
>>>>>>>>> 
>>>>>>>>>> On 21 Oct 2016, at 17:08, Brunoais <[email protected]> wrote:
>>>>>>>>>> Any feedback on this? I'm really interested in implementing such a
>>>>>>>>>> BufferedReader/BufferedStreamReader so I can speed up my applications
>>>>>>>>>> without having to think asynchronously or about multi-threading while
>>>>>>>>>> programming with it.
>>>>>>>>>> That's why I'm asking this here.
>>>>>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> I looked at the BufferedReader source code for Java 9, along with the
>>>>>>>>>>> source code of the channels/streams it uses. I noticed that, as in
>>>>>>>>>>> Java 7, BufferedReader does not use an async API to load data from
>>>>>>>>>>> files. Instead, all data loading is done synchronously, even when the
>>>>>>>>>>> OS allows requesting a file read and being notified later, once the
>>>>>>>>>>> data has actually been read.
>>>>>>>>>>> Why is BufferedReader not asynchronous internally while providing a
>>>>>>>>>>> synchronous API?
>>>>>>>>>>> 
>>> <BufferedNonBlockStream.java><Tests.java>
>>> 
>
