On Wednesday, October 26, 2016, Brunoais <brunoa...@gmail.com> wrote:
> It is actually based on the premise that:
>
> 1. The first call to ReadableByteChannel.read(ByteBuffer) sets the size
> of the OS buffer to fill to the same size as the ByteBuffer.

Why do you say that? AFAICT, it issues a read syscall and that will block
if the data isn't in page cache.

> 2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
> the JVM to order the OS to execute memcpy() to copy from its memory
> to the shared memory created at ByteBuffer instantiation (in java 8)
> using Unsafe and then for the JVM to update the ByteBuffer fields.

I think subsequent reads just invoke the same read syscall, passing the
current file offset maintained by the file channel instance.

> 3. The call will not block waiting for I/O and it won't take longer
> than the JNI interface if no new data exists. However, it will block
> waiting for the OS to execute memcpy() to the shared memory.

So why do you think it won't block?

>
> Is my premise wrong?
>
> If I read correctly, if I don't use a DirectBuffer, there would be yet
> another intermediate buffer to copy data to before giving it to the
> "user", which would be useless.

If you use a HeapByteBuffer, then there's an extra copy from the native
buffer to the Java buffer.

>
>
> On 26/10/2016 11:57, Pavel Rappo wrote:
>
>> I believe I see where you're coming from. Please correct me if I'm wrong.
>>
>> Your implementation is based on the premise that a call to
>> ReadableByteChannel.read() _initiates_ the operation and returns
>> immediately. The OS then continues to fill the buffer while there's
>> free space in the buffer and the channel hasn't encountered EOF.
>>
>> Is that right?
>>
>> On 25 Oct 2016, at 22:16, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>> Thank you for your time. I'll try to explain it. I hope I can clear it
>>> up.
>>> First of all, I mixed up the meanings of asynchronous and
>>> non-blocking.
>>> This implementation uses a non-blocking algorithm internally
>>> while providing a blocking-like algorithm on the surface. It is
>>> single-threaded, not multi-threaded with one thread that fetches data
>>> and blocks waiting while another accumulates it and provides it to
>>> whoever wants it.
>>>
>>> Second, I had made the mistake of going after BufferedReader instead
>>> of going after BufferedInputStream. If you want me to go after
>>> BufferedReader it's ok, but when I started the poc I thought that
>>> going after BufferedInputStream would be more generically useful than
>>> BufferedReader.
>>>
>>> On to my code:
>>> Short answers:
>>> • The sleep(int) exists because I don't know how to wait until
>>> more data exists in the buffer, which is part of read()'s contract.
>>> • The ByteBuffer gives a buffer that is filled by the OS (what I
>>> believe Channels do) instead of getting data only on demand (what I
>>> believe Streams do).
>>> Full answers:
>>> The blockingFill(boolean) method is a busy wait for a fill and is
>>> used exclusively by the read() method. All other methods use the
>>> version that does not sleep (fill(boolean)).
>>> blockingFill(boolean) exists in that form only because the read()
>>> method must not return unless either:
>>>
>>> • The stream ended.
>>> • The next byte is ready for reading.
>>> Additionally, statistically, that while loop will rarely evaluate to
>>> true as reads are in chunks, so readPos will be behind writePos most of
>>> the time.
>>> I have no idea if an interrupt will ever happen, to be honest. The main
>>> reasons why I'm using a sleep are that I didn't want to hog the CPU
>>> with a full-thread busy wait and that I didn't find any way of putting
>>> the thread to sleep so that it wakes up later when the buffer managed
>>> by native code has more data.
>>> The non-blocking part is managed by the buffer the OS keeps filling
>>> most if not all of the time.
>>> That buffer is the field
>>>
>>> ByteBuffer readBuffer
>>> That's the gain over the plain old Buffered classes.
>>>
>>>
>>> Did that make sense to you? Feel free to ask anything else you need.
>>>
>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>
>>>> I've skimmed through the code and I'm not sure I can see any
>>>> asynchronicity (you were pointing at the lack of it in BufferedReader).
>>>> And the mechanics of this are very puzzling to me, to be honest:
>>>> void blockingFill(boolean forced) throws IOException {
>>>>     fill(forced);
>>>>     while (readPos == writePos) {
>>>>         try {
>>>>             Thread.sleep(100);
>>>>         } catch (InterruptedException e) {
>>>>             // An interrupt may mean more data is available
>>>>         }
>>>>         fill(forced);
>>>>     }
>>>> }
>>>> I thought you were suggesting that we should utilize the tools which
>>>> the OS provides more efficiently. Instead we have something that looks
>>>> very similar to a "busy loop" and... also who and when is supposed to
>>>> interrupt Thread.sleep()?
>>>> Sorry, I'm not following. Could you please explain how this is supposed
>>>> to work?
>>>>
>>>> On 24 Oct 2016, at 15:59, Brunoais <brunoa...@gmail.com> wrote:
>>>>> Attached and sending!
>>>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>>> Could you please send a new email on this list with the source
>>>>>> attached as a text file?
>>>>>>
>>>>>> On 23 Oct 2016, at 19:14, Brunoais <brunoa...@gmail.com> wrote:
>>>>>>> Here's my poc/prototype:
>>>>>>>
>>>>>>> http://pastebin.com/WRpYWDJF
>>>>>>>
>>>>>>> I've implemented the bare minimum of the class that follows the same
>>>>>>> contract as BufferedReader, while flagging in comments all the
>>>>>>> issues I think it has or may have.
>>>>>>> I also wrote some javadoc to help guide through the class.
>>>>>>> I could have used more fields from BufferedReader but the names were
>>>>>>> so minimalistic that they were confusing me.
>>>>>>> I intend to change them before
>>>>>>> sending this to openJDK.
>>>>>>> One of the major problems this has is long overflow. It is major
>>>>>>> because it is hidden, it will be extremely rare and it takes a really
>>>>>>> long time to reproduce. There are different ways of dealing with it,
>>>>>>> from just documenting it to actually writing code that handles it.
>>>>>>> I built some simple test code for it to get an idea of
>>>>>>> performance and correctness.
>>>>>>>
>>>>>>> http://pastebin.com/eh6LFgwT
>>>>>>>
>>>>>>> This doesn't thoroughly test whether it is actually working
>>>>>>> correctly, but I see no reason for it not to work correctly after
>>>>>>> fixing the 2 bugs that test found.
>>>>>>> I'll also leave here some conclusions about speed and resource
>>>>>>> consumption I found.
>>>>>>> I made tests with default buffer sizes, 5000B, 15_000B and 500_000B.
>>>>>>> I noticed that, with my hardware, with the 1 530 000 000B file, I was
>>>>>>> getting around:
>>>>>>> In all buffers and fake work: 10~15s speed improvement (from 90%
>>>>>>> HDD speed to 100% HDD speed)
>>>>>>> In all buffers and no fake work: 1~2s speed improvement (from 90%
>>>>>>> HDD speed to 100% HDD speed)
>>>>>>> Changing the buffer size was giving different reading speeds but
>>>>>>> both were quite equal in how much they would change when changing the
>>>>>>> buffer size.
>>>>>>> Finally, I could always confirm that I/O was the slowest
>>>>>>> thing while this code was running.
>>>>>>> For the ones wondering about the file size: it is both to avoid the
>>>>>>> OS cache and to exercise the main use case these objects are for
>>>>>>> (large streams of bytes).
>>>>>>> @Pavel, are you open for discussion now ;)? Need anything else?
>>>>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>>>
>>>>>>>> Just to append to my previous email. BufferedReader wraps any Reader
>>>>>>>> out there.
>>>>>>>> Not specifically FileReader.
>>>>>>>> While you're talking about the case of
>>>>>>>> effective reading from a file.
>>>>>>>> I guess there's one existing possibility to provide exactly what
>>>>>>>> you need (as I understand it) under this method:
>>>>>>>> /**
>>>>>>>>  * Opens a file for reading, returning a {@code BufferedReader} to
>>>>>>>>  * read text from the file in an efficient manner...
>>>>>>>>  ...
>>>>>>>>  */
>>>>>>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>>>> It can return _anything_ as long as it is a BufferedReader. We can
>>>>>>>> do it, but it needs to be investigated not only for your favorite OS
>>>>>>>> but for other OSes as well. Feel free to prototype this and we can
>>>>>>>> discuss it on the list later.
>>>>>>>> Thanks,
>>>>>>>> -Pavel
>>>>>>>>
>>>>>>>> On 21 Oct 2016, at 18:56, Brunoais <brunoa...@gmail.com> wrote:
>>>>>>>>> Pavel is right.
>>>>>>>>> In reality, I was expecting such a BufferedReader to use only a
>>>>>>>>> single buffer and have that buffer filled asynchronously, not
>>>>>>>>> in a different Thread.
>>>>>>>>> Additionally, I don't have the intention of having a larger buffer
>>>>>>>>> than before unless stated through the API (the constructor).
>>>>>>>>> In my idea, internally, it is supposed to use
>>>>>>>>> java.nio.channels.AsynchronousFileChannel or equivalent.
>>>>>>>>> It does not prevent having two buffers and I do not intend to
>>>>>>>>> change BufferedReader itself. I'd do a BufferedAsyncReader of sorts
>>>>>>>>> (any name suggestion is welcome as I'm an awful namer).
>>>>>>>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>>>>>>>
>>>>>>>>>> Hi Pavel,
>>>>>>>>>> I think Brunoais is asking for a double buffering scheme in which
>>>>>>>>>> the implementation of BufferedReader fills (a second buffer) in
>>>>>>>>>> parallel with the application reading from the 1st buffer
>>>>>>>>>> and managing the swaps and async reads transparently.
>>>>>>>>>> It would not change the API but would change the interactions
>>>>>>>>>> between the buffered reader and the underlying stream. It would
>>>>>>>>>> also increase memory requirements and processing by introducing or
>>>>>>>>>> using a separate thread and the necessary synchronization.
>>>>>>>>>> Though I think the formal interface semantics could be
>>>>>>>>>> maintained, I have doubts about compatibility and its unintended
>>>>>>>>>> consequences on existing subclasses, applications and libraries.
>>>>>>>>>> $.02, Roger
>>>>>>>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>>>>>>>>
>>>>>>>>>>> Off the top of my head, I would say it's not possible to change
>>>>>>>>>>> the design of an _extensible_ type that has been out there for
>>>>>>>>>>> 20 or so years. All these I/O streams from java.io were designed
>>>>>>>>>>> for the simple synchronous use case.
>>>>>>>>>>> It's not that their design is flawed in some way, it's that they
>>>>>>>>>>> don't seem to suit your needs. Have you considered using
>>>>>>>>>>> java.nio.channels.AsynchronousFileChannel
>>>>>>>>>>> in your applications?
>>>>>>>>>>> -Pavel
>>>>>>>>>>>
>>>>>>>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoa...@gmail.com> wrote:
>>>>>>>>>>>> Any feedback on this? I'm really interested in implementing
>>>>>>>>>>>> such a BufferedReader/BufferedStreamReader to allow speeding up
>>>>>>>>>>>> my applications without having to think in an asynchronous way or
>>>>>>>>>>>> about multi-threading while programming with it.
>>>>>>>>>>>> That's why I'm asking this here.
>>>>>>>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I looked at the BufferedReader source code for java 9 along with
>>>>>>>>>>>>> the source code of the channels/streams used.
>>>>>>>>>>>>> I noticed that, like in java 7, BufferedReader does not use an
>>>>>>>>>>>>> Async API to load data from files. Instead, the data loading is
>>>>>>>>>>>>> all done synchronously even when the OS allows requesting a file
>>>>>>>>>>>>> to be read and getting notified later when the file has
>>>>>>>>>>>>> actually been read.
>>>>>>>>>>>>> Why is BufferedReader not async while providing a sync API?
>>>>>>>>>>>>>
>>>>>>>>>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>>
>>
>
-- Sent from my phone
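
For reference, the double-buffering scheme discussed in this thread (fill a
second buffer while the application consumes the first) can be sketched with
java.nio.channels.AsynchronousFileChannel, which Pavel suggested. This is only
a minimal sketch under stated assumptions, not the code from the attached
BufferedNonBlockStream.java: the class name DoubleBufferedRead and the method
countBytes are hypothetical, and counting bytes merely stands in for real
processing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of the JDK or of the prototype in this thread.
public class DoubleBufferedRead {

    /**
     * Reads the whole file with two buffers: while one buffer is being
     * processed, an asynchronous read into the other is already in flight.
     * Returns the total number of bytes seen.
     */
    static long countBytes(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocateDirect(bufferSize);
            ByteBuffer next = ByteBuffer.allocateDirect(bufferSize);
            long position = 0;
            long total = 0;

            // Kick off the first read; the call returns without waiting for the data.
            Future<Integer> pending = channel.read(current, position);
            while (true) {
                // Blocks only if the in-flight read has not completed yet.
                int n = pending.get();
                if (n < 0) {
                    break; // end of file
                }
                position += n;

                // Start filling the other buffer before touching the data just read.
                next.clear();
                pending = channel.read(next, position);

                // "Process" the completed buffer (here: just count its bytes).
                current.flip();
                total += current.remaining();

                // Swap roles: the buffer being filled becomes the current one.
                ByteBuffer tmp = current;
                current = next;
                next = tmp;
            }
            return total;
        }
    }
}
```

Unlike the sleep-based blockingFill loop quoted above, the caller here waits on
Future.get(), so it wakes up as soon as the pending read completes. Whether the
overlap pays off in practice depends on the OS and the storage device.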