On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoa...@gmail.com> wrote:
> Oh... I see. In that case, it means something is terribly wrong. It can be
> my initial tests, though.
>
> I'm testing on both Linux and Windows and I'm getting performance gains
> from using the FileChannel compared to using FileInputStream... The tests
> also make sense based on my predictions O_O...
>
FileInputStream has to copy the native buffer holding the read data into the
Java byte[]. If you're using a direct ByteBuffer with FileChannel, that whole
memcpy is skipped. Try comparing against FileChannel with a HeapByteBuffer
instead.

>
> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>
> On Thursday, October 27, 2016, Brunoais <brunoa...@gmail.com> wrote:
>
>> Did you read the C code?
>
> I looked at the Linux code in the JDK.
>
>> Have you got any idea how many functions Windows or Linux (nearly all
>> flavors) have for the read operation towards a file?
>
> I do.
>
>> I have already done that homework myself. I may not have read the JVM's
>> source code but I know well that there are functions on both Windows and
>> Linux that provide such an interface as I mentioned, although they require
>> slightly different treatment (and different constants).
>
> You should read the JDK (native) source code instead of
> guessing/assuming. On Linux, it doesn't use aio facilities for files. The
> kernel io scheduler may issue readahead behind the scenes, but there's no
> nonblocking file io that's at the heart of your premise.
>
>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>
>>> On Wednesday, October 26, 2016, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     It is actually based on the premise that:
>>>
>>>     1. The first call to ReadableByteChannel.read(ByteBuffer) sets the
>>>        OS buffer size to fill in to the same size as the ByteBuffer.
>>>
>>> Why do you say that? AFAICT, it issues a read syscall and that will
>>> block if the data isn't in page cache.
>>>
>>>     2. The consecutive calls to ReadableByteChannel.read(ByteBuffer)
>>>        order the JVM to order the OS to execute memcpy() to copy from
>>>        its memory to the shared memory created at ByteBuffer
>>>        instantiation (in Java 8) using Unsafe, and then for the JVM to
>>>        update the ByteBuffer fields.
>>>
>>> I think subsequent reads just invoke the same read syscall, passing the
>>> current file offset maintained by the file channel instance.
>>>
>>>     3. The call will not block waiting for I/O and it won't take longer
>>>        than the JNI interface if no new data exists. However, it will
>>>        block waiting for the OS to execute memcpy() to the shared
>>>        memory.
>>>
>>> So why do you think it won't block?
>>>
>>>     Is my premise wrong?
>>>
>>>     If I read correctly, if I don't use a DirectBuffer, there would be
>>>     even another intermediate buffer to copy data to before giving it
>>>     to the "user" which would be useless.
>>>
>>> If you use a HeapByteBuffer, then there's an extra copy from the native
>>> buffer to the Java buffer.
>>>
>>>     On 26/10/2016 11:57, Pavel Rappo wrote:
>>>
>>>     I believe I see where you're coming from. Please correct me if
>>>     I'm wrong.
>>>
>>>     Your implementation is based on the premise that a call to
>>>     ReadableByteChannel.read() _initiates_ the operation and returns
>>>     immediately. The OS then continues to fill the buffer while
>>>     there's free space in the buffer and the channel hasn't
>>>     encountered EOF.
>>>
>>>     Is that right?
>>>
>>>     On 25 Oct 2016, at 22:16, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     Thank you for your time. I'll try to explain it. I hope I can
>>>     clear it up.
>>>
>>>     First of all, I mixed up the meanings of asynchronous and
>>>     non-blocking. This implementation uses a non-blocking
>>>     algorithm internally while providing a blocking-like
>>>     algorithm on the surface.
>>>     It is single-threaded, not multi-threaded where one thread fetches
>>>     data and blocks waiting and the other accumulates it and provides
>>>     it to whoever wants it.
>>>
>>>     Second, I made the mistake of going after BufferedReader instead
>>>     of BufferedInputStream. If you want me to go after BufferedReader
>>>     it's OK, but when I started the PoC I thought that going after
>>>     BufferedInputStream would be more generically useful than
>>>     BufferedReader.
>>>
>>>     On to my code:
>>>
>>>     Short answers:
>>>     • The sleep(int) exists because I don't know how to wait until
>>>       more data exists in the buffer, which is part of read()'s
>>>       contract.
>>>     • The ByteBuffer gives a buffer that is filled by the OS (what I
>>>       believe Channels do) instead of getting data only on demand
>>>       (what I believe Streams do).
>>>
>>>     Full answers:
>>>     The blockingFill(boolean) method is a busy wait for a fill and is
>>>     used exclusively by the read() method. All other methods use the
>>>     version that does not sleep (fill(boolean)).
>>>     blockingFill(boolean) exists in that form only because the read()
>>>     method must not return unless either:
>>>
>>>     • The stream ended.
>>>     • The next byte is ready for reading.
>>>
>>>     Additionally, statistically, that while loop will rarely evaluate
>>>     to true, as reads are in chunks so readPos will be behind writePos
>>>     most of the time.
>>>     I have no idea if an interrupt will ever happen, to be honest. The
>>>     main reasons why I'm using a sleep are that I didn't want to hog
>>>     the CPU with a full-thread busy wait and that I didn't find any
>>>     way of putting the thread to sleep so that it wakes up later when
>>>     the buffer managed by native code has more data.
>>>     The non-blocking part is managed by the buffer the OS keeps
>>>     filling most if not all of the time. That buffer is the field
>>>
>>>         ByteBuffer readBuffer
>>>
>>>     That's the gain over the plain old Buffered classes.
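For what it's worth, one way to wait for data without a sleep loop is
AsynchronousFileChannel, which Pavel mentions further down the thread: read()
returns a Future, and Future.get() parks the caller until that read has
completed. What follows is only a minimal sketch under my own assumptions, not
the code from the pastebin: the class name AsyncReadSketch, the method
countBytes and the two-buffer swap are made up for illustration, and the
"processing" is just a byte count. As far as I know, the default
implementation services these reads on a thread pool rather than through
kernel aio, so this illustrates the API shape and makes no performance claim.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration class; not part of the JDK or of the prototype.
final class AsyncReadSketch {

    /** Reads the whole file, overlapping each read with the handling of the previous chunk. */
    static long countBytes(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocateDirect(bufferSize);
            ByteBuffer next = ByteBuffer.allocateDirect(bufferSize);
            long position = 0;
            long total = 0;
            // Kick off the first read; the call returns immediately.
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                // Blocks until the outstanding read is done: no sleep, no polling.
                int n = pending.get();
                if (n < 0) {
                    return total; // end of file, nothing left in flight
                }
                position += n;
                // Start filling the other buffer before touching this one.
                next.clear();
                pending = ch.read(next, position);
                current.flip();
                total += current.remaining(); // stand-in for real processing
                // Swap roles: the buffer being filled becomes the one to consume.
                ByteBuffer tmp = current;
                current = next;
                next = tmp;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBytes(Paths.get(args[0]), 8192) + " bytes read");
    }
}
```

The caller still sees a blocking-style method, which is the contract read()
needs, but the waiting happens inside Future.get() instead of in
Thread.sleep(100).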
>>>
>>>     Did that make sense to you? Feel free to ask anything else you
>>>     need.
>>>
>>>     On 25/10/2016 20:52, Pavel Rappo wrote:
>>>
>>>     I've skimmed through the code and I'm not sure I can see any
>>>     asynchronicity (you were pointing at the lack of it in
>>>     BufferedReader). And the mechanics of this are very puzzling to
>>>     me, to be honest:
>>>
>>>         void blockingFill(boolean forced) throws IOException {
>>>             fill(forced);
>>>             while (readPos == writePos) {
>>>                 try {
>>>                     Thread.sleep(100);
>>>                 } catch (InterruptedException e) {
>>>                     // An interrupt may mean more data is available
>>>                 }
>>>                 fill(forced);
>>>             }
>>>         }
>>>
>>>     I thought you were suggesting that we should utilize the tools
>>>     which the OS provides more efficiently. Instead we have something
>>>     that looks very similar to a "busy loop" and... also, who is
>>>     supposed to interrupt Thread.sleep(), and when?
>>>     Sorry, I'm not following. Could you please explain how this is
>>>     supposed to work?
>>>
>>>     On 24 Oct 2016, at 15:59, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     Attached and sending!
>>>
>>>     On 24/10/2016 13:48, Pavel Rappo wrote:
>>>
>>>     Could you please send a new email on this list with the source
>>>     attached as a text file?
>>>
>>>     On 23 Oct 2016, at 19:14, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     Here's my poc/prototype:
>>>
>>>     http://pastebin.com/WRpYWDJF
>>>
>>>     I've implemented the bare minimum of the class that follows the
>>>     same contract as BufferedReader while signaling in comments all
>>>     the issues I think it may have or has. I also wrote some javadoc
>>>     to help guide the reader through the class.
>>>     I could have used more fields from BufferedReader but the names
>>>     were so minimalistic that they were confusing me. I intend to
>>>     change them before sending this to OpenJDK.
>>>     One of the major problems this has is long overflow. It is major
>>>     because it is hidden, it will be extremely rare and it takes a
>>>     really long time to reproduce.
>>>     There are different ways of dealing with it, from just
>>>     documenting it to actually writing code that copes with it.
>>>     I built some simple test code for it to get some idea of its
>>>     performance and correctness.
>>>
>>>     http://pastebin.com/eh6LFgwT
>>>
>>>     This doesn't thoroughly test whether it is actually working
>>>     correctly, but I see no reason for it not to work correctly after
>>>     fixing the 2 bugs that the test found.
>>>     I'll also leave here some conclusions about speed and resource
>>>     consumption I found. I made tests with default buffer sizes,
>>>     5000B, 15_000B and 500_000B. I noticed that, with my hardware,
>>>     with the 1 530 000 000B file, I was getting around:
>>>
>>>     In all buffers and fake work: 10~15s speed improvement (from 90%
>>>     HDD speed to 100% HDD speed)
>>>     In all buffers and no fake work: 1~2s speed improvement (from 90%
>>>     HDD speed to 100% HDD speed)
>>>
>>>     Changing the buffer size was giving different reading speeds but
>>>     both were quite equal in how much they would change when changing
>>>     the buffer size.
>>>     Finally, I could always confirm that I/O was always the slowest
>>>     thing while this code was running.
>>>     For the ones wondering about the file size: it is both to avoid
>>>     the OS cache and to exercise the main use case these objects are
>>>     for (large streams of bytes).
>>>     @Pavel, are you open for discussion now ;)? Need anything else?
>>>
>>>     On 21/10/2016 19:21, Pavel Rappo wrote:
>>>
>>>     Just to append to my previous email. BufferedReader wraps any
>>>     Reader out there. Not specifically FileReader. While you're
>>>     talking about the case of effective reading from a file.
>>>     I guess there's one existing possibility to provide exactly what
>>>     you need (as I understand it) under this method:
>>>
>>>         /**
>>>          * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>          * from the file in an efficient manner...
>>>          ...
>>>          */
>>>         java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>
>>>     It can return _anything_ as long as it is a BufferedReader. We can
>>>     do it, but it needs to be investigated not only for your favorite
>>>     OS but for other OSes as well. Feel free to prototype this and we
>>>     can discuss it on the list later.
>>>
>>>     Thanks,
>>>     -Pavel
>>>
>>>     On 21 Oct 2016, at 18:56, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     Pavel is right.
>>>     In reality, I was expecting such a BufferedReader to use only a
>>>     single buffer and have that buffer filled asynchronously, not in a
>>>     different Thread.
>>>     Additionally, I don't intend to have a larger buffer than before
>>>     unless stated through the API (the constructor).
>>>     In my idea, internally, it is supposed to use
>>>     java.nio.channels.AsynchronousFileChannel or equivalent.
>>>     It does not prevent having two buffers and I do not intend to
>>>     change BufferedReader itself. I'd do a BufferedAsyncReader of
>>>     sorts (any name suggestion is welcome as I'm an awful namer).
>>>
>>>     On 21/10/2016 18:38, Roger Riggs wrote:
>>>
>>>     Hi Pavel,
>>>     I think Brunoais is asking for a double buffering scheme in which
>>>     the implementation of BufferedReader fills (a second buffer) in
>>>     parallel with the application reading from the 1st buffer and
>>>     manages the swaps and async reads transparently.
>>>     It would not change the API but would change the interactions
>>>     between the buffered reader and the underlying stream. It would
>>>     also increase memory requirements and processing by introducing or
>>>     using a separate thread and the necessary synchronization.
>>>     Though I think the formal interface semantics could be maintained,
>>>     I have doubts about compatibility and its unintended consequences
>>>     on existing subclasses, applications and libraries.
>>>     $.02, Roger
>>>
>>>     On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>
>>>     Off the top of my head, I would say it's not possible to change
>>>     the design of an _extensible_ type that has been out there for 20
>>>     or so years. All these I/O streams from java.io were designed for
>>>     the simple synchronous use case.
>>>     It's not that their design is flawed in some way, it's that they
>>>     don't seem to suit your needs. Have you considered using
>>>     java.nio.channels.AsynchronousFileChannel in your applications?
>>>
>>>     -Pavel
>>>
>>>     On 21 Oct 2016, at 17:08, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>     Any feedback on this? I'm really interested in implementing such
>>>     a BufferedReader/BufferedStreamReader to allow speeding up my
>>>     applications without having to think in an asynchronous way or
>>>     about multi-threading while programming with it.
>>>     That's why I'm asking this here.
>>>
>>>     On 13/10/2016 14:45, Brunoais wrote:
>>>
>>>     Hi,
>>>     I looked at the BufferedReader source code for Java 9 along with
>>>     the source code of the channels/streams used. I noticed that,
>>>     like in Java 7, BufferedReader does not use an async API to load
>>>     data from files; instead, the data loading is all done
>>>     synchronously even when the OS allows requesting a file to be
>>>     read and getting a notification later when the file has actually
>>>     been read.
>>>     Why is BufferedReader not async while providing a sync API?
>>>
>>>     <BufferedNonBlockStream.java><Tests.java>
>>>
>>> --
>>> Sent from my phone
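
To make the comparison suggested at the top of this message concrete, here is
a rough sketch of the three read paths: FileInputStream into a byte[],
FileChannel into a heap ByteBuffer, and FileChannel into a direct ByteBuffer.
The class and method names (ReadComparisonSketch, readWithStream,
readWithChannel) and the 8192-byte buffer are assumptions made for
illustration. The timing in main is deliberately naive: the first pass warms
the page cache for the later ones, so treat the numbers as a sanity check at
most and use a proper harness for anything you intend to report.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; names are not from the original prototype.
final class ReadComparisonSketch {

    /** Reads the file through FileInputStream into a Java byte[]. */
    static long readWithStream(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads the file through FileChannel into the given (heap or direct) buffer. */
    static long readWithChannel(Path file, ByteBuffer buf) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            buf.clear();
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the chunk; real code would consume it here
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        int size = 8192; // assumed buffer size, same for all three paths

        long t0 = System.nanoTime();
        long viaStream = readWithStream(file, size);
        long t1 = System.nanoTime();
        long viaHeap = readWithChannel(file, ByteBuffer.allocate(size));
        long t2 = System.nanoTime();
        long viaDirect = readWithChannel(file, ByteBuffer.allocateDirect(size));
        long t3 = System.nanoTime();

        System.out.printf("FileInputStream:      %d bytes, %d ms%n", viaStream, (t1 - t0) / 1_000_000);
        System.out.printf("FileChannel (heap):   %d bytes, %d ms%n", viaHeap, (t2 - t1) / 1_000_000);
        System.out.printf("FileChannel (direct): %d bytes, %d ms%n", viaDirect, (t3 - t2) / 1_000_000);
    }
}
```

If the gain you measured disappears once the channel reads into a heap
buffer, that would point to the saved copy rather than to any non-blocking
behaviour in the channel.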