On Thursday, October 27, 2016, Brunoais <brunoa...@gmail.com> wrote:

> You are right. Even on Windows it does not set the flags for async reads.
> It seems that Windows itself decides whether to buffer the contents,
> based on its own heuristics.

You mean nonblocking, not async, right? Those are two different things.

> But... Why? Why won't it be? Why is there no API for it? How am I getting
> 100% HDD use and faster times when I fake work to delay getting more data,
> while I only get a fluctuating 60-90% (always going up and down) when I
> use an InputStream?
> Is it related to how both classes cache, and how frequently and how much
> data each one asks for?
>
> I would really prefer not to have to read the source code because it
> takes a really long time T.T.
>
> I end up reinstating... and wondering...
>
> Why doesn't Java provide a single-threaded non-blocking API for file reads
> on every OS that supports it? I simply cannot find that information no
> matter how much I search on Google, Bing, DuckDuckGo... Can any of you
> point me to whoever knows?

https://lwn.net/Articles/612483/ for Linux. Unfortunately, the nonblocking
file I/O story is complicated and messy.

> On 27/10/2016 14:11, Vitaly Davidovich wrote:
>
> I don't know about Windows specifically, but generally file systems across
> major OSes implement readahead in their I/O scheduler when they detect
> sequential scans.
>
> On Linux, you can also strace your test to confirm which syscalls are
> emitted (you should be seeing plain read() calls there, with both
> FileInputStream and FileChannel).
>
> On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoa...@gmail.com> wrote:
>
>> Thanks for the heads up.
>>
>> I'll try that later. These tests are still useful, then. Meanwhile, I'll
>> also check how FileChannel queries the OS on Windows. I'm getting
>> 100% HDD reads... Could it be that the OS reads the file ahead on its
>> own? Anyway, I'll look into it. Thanks for the heads up.
>>
>> On 27/10/2016 13:53, Vitaly Davidovich wrote:
>>
>> On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoa...@gmail.com> wrote:
>>
>>> Oh... I see. In that case, it means something is terribly wrong.
>>> It can be my initial tests, though.
>>>
>>> I'm testing on both Linux and Windows and I'm getting performance gains
>>> from using FileChannel compared to using FileInputStream... The tests
>>> also make sense based on my predictions O_O...
>>>
>> FileInputStream requires copying the native buffers holding the read data
>> to the Java byte[]. If you're using a direct ByteBuffer with FileChannel,
>> that whole memcpy is skipped. Try comparing FileChannel with a
>> HeapByteBuffer instead.
>>
>>> On 27/10/2016 11:47, Vitaly Davidovich wrote:
>>>
>>> On Thursday, October 27, 2016, Brunoais <brunoa...@gmail.com> wrote:
>>>
>>>> Did you read the C code?
>>>
>>> I looked at the Linux code in the JDK.
>>>
>>>> Have you got any idea how many functions Windows or Linux (nearly all
>>>> flavors) have for the read operation on a file?
>>>
>>> I do.
>>>
>>>> I have already done that homework myself. I may not have read the JVM's
>>>> source code, but I know well that there are functions on both Windows
>>>> and Linux that provide the interface I mentioned, although they require
>>>> slightly different treatment (and different constants).
>>>
>>> You should read the JDK (native) source code instead of
>>> guessing/assuming. On Linux, it doesn't use aio facilities for files. The
>>> kernel I/O scheduler may issue readahead behind the scenes, but there's no
>>> nonblocking file I/O, which is what's at the heart of your premise.
>>>
>>>> On 27/10/2016 00:06, Vitaly Davidovich wrote:
>>>>
>>>>> On Wednesday, October 26, 2016, Brunoais <brunoa...@gmail.com> wrote:
>>>>>
>>>>>     It is actually based on the premise that:
>>>>>
>>>>>     1. The first call to ReadableByteChannel.read(ByteBuffer) sets
>>>>>        the size of the OS buffer to fill to the same size as the
>>>>>        ByteBuffer.
>>>>>
>>>>> Why do you say that?
>>>>> AFAICT, it issues a read syscall and that will block if the data
>>>>> isn't in the page cache.
>>>>>
>>>>>     2. Subsequent calls to ReadableByteChannel.read(ByteBuffer)
>>>>>        make the JVM ask the OS to execute memcpy() to copy from its
>>>>>        memory to the shared memory created at ByteBuffer
>>>>>        instantiation (in Java 8) using Unsafe, and then the JVM
>>>>>        updates the ByteBuffer fields.
>>>>>
>>>>> I think subsequent reads just invoke the same read syscall, passing
>>>>> the current file offset maintained by the file channel instance.
>>>>>
>>>>>     3. The call will not block waiting for I/O and it won't take
>>>>>        longer than the JNI interface if no new data exists. However,
>>>>>        it will block waiting for the OS to execute memcpy() to the
>>>>>        shared memory.
>>>>>
>>>>> So why do you think it won't block?
>>>>>
>>>>>     Is my premise wrong?
>>>>>
>>>>>     If I read correctly, if I don't use a DirectBuffer, there would
>>>>>     be yet another intermediate buffer to copy data to before giving
>>>>>     it to the "user", which would be useless.
>>>>>
>>>>> If you use a HeapByteBuffer, then there's an extra copy from the
>>>>> native buffer to the Java buffer.
>>>>>
>>>>> On 26/10/2016 11:57, Pavel Rappo wrote:
>>>>>
>>>>> I believe I see where you're coming from. Please correct me if
>>>>> I'm wrong.
>>>>>
>>>>> Your implementation is based on the premise that a call to
>>>>> ReadableByteChannel.read() _initiates_ the operation and returns
>>>>> immediately. The OS then continues to fill the buffer while there's
>>>>> free space in the buffer and the channel hasn't encountered EOF.
>>>>>
>>>>> Is that right?
>>>>>
>>>>> On 25 Oct 2016, at 22:16, Brunoais <brunoa...@gmail.com> wrote:
>>>>>
>>>>> Thank you for your time. I'll try to explain it. I hope I
>>>>> can clear it up.
>>>>> First off, I mixed up the meanings of asynchronous and non-blocking.
>>>>> This implementation uses a non-blocking algorithm internally while
>>>>> providing a blocking-like algorithm on the surface. It is
>>>>> single-threaded, not multi-threaded with one thread that fetches
>>>>> data and blocks waiting while another accumulates it and provides it
>>>>> to whoever wants it.
>>>>>
>>>>> Second, I made the mistake of going after BufferedReader instead of
>>>>> BufferedInputStream. If you want me to go after BufferedReader
>>>>> that's OK, but when I started the PoC I thought that going after
>>>>> BufferedInputStream would be more generally useful than
>>>>> BufferedReader.
>>>>>
>>>>> On to my code:
>>>>> Short answers:
>>>>>   • The sleep(int) exists because I don't know how to wait until
>>>>>     more data exists in the buffer, which is part of read()'s
>>>>>     contract.
>>>>>   • The ByteBuffer gives a buffer that is filled by the OS (what I
>>>>>     believe Channels do) instead of getting data only on demand
>>>>>     (what I believe Streams do).
>>>>> Full answers:
>>>>> The blockingFill(boolean) method busy-waits for a fill and is used
>>>>> exclusively by the read() method. All other methods use the version
>>>>> that does not sleep (fill(boolean)).
>>>>> blockingFill(boolean) exists in that form only because the read()
>>>>> method must not return unless either:
>>>>>
>>>>>   • The stream ended.
>>>>>   • The next byte is ready for reading.
>>>>> Additionally, statistically, that while loop will rarely evaluate to
>>>>> true, as reads are in chunks, so readPos will be behind writePos
>>>>> most of the time.
>>>>> I have no idea if an interrupt will ever happen, to be honest. The
>>>>> main reasons why I'm using a sleep are that I didn't want to hog the
>>>>> CPU with a full-thread busy wait and that I didn't find any way of
>>>>> putting the thread to sleep so that it wakes up later, when the
>>>>> buffer managed by native code has more data.
>>>>> The non-blocking part is managed by the buffer the OS keeps filling
>>>>> most, if not all, of the time. That buffer is the field
>>>>>
>>>>>     ByteBuffer readBuffer
>>>>>
>>>>> That's the gain over the plain old Buffered classes.
>>>>>
>>>>> Did that make sense to you? Feel free to ask anything else
>>>>> you need.
>>>>>
>>>>> On 25/10/2016 20:52, Pavel Rappo wrote:
>>>>>
>>>>> I've skimmed through the code and I'm not sure I can see any
>>>>> asynchronicity (you were pointing at the lack of it in
>>>>> BufferedReader).
>>>>> And the mechanics of this are very puzzling to me, to be honest:
>>>>>
>>>>>     void blockingFill(boolean forced) throws IOException {
>>>>>         fill(forced);
>>>>>         while (readPos == writePos) {
>>>>>             try {
>>>>>                 Thread.sleep(100);
>>>>>             } catch (InterruptedException e) {
>>>>>                 // An interrupt may mean more data is available
>>>>>             }
>>>>>             fill(forced);
>>>>>         }
>>>>>     }
>>>>>
>>>>> I thought you were suggesting that we should utilize the tools the
>>>>> OS provides more efficiently. Instead we have something that looks
>>>>> very similar to a "busy loop" and... also, who is supposed to
>>>>> interrupt Thread.sleep(), and when?
>>>>> Sorry, I'm not following. Could you please explain how this is
>>>>> supposed to work?
>>>>>
>>>>> On 24 Oct 2016, at 15:59, Brunoais <brunoa...@gmail.com> wrote:
>>>>> Attached and sending!
>>>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>>>>
>>>>> Could you please send a new email on this list with the source
>>>>> attached as a text file?
>>>>>
>>>>> On 23 Oct 2016, at 19:14, Brunoais <brunoa...@gmail.com> wrote:
>>>>> Here's my poc/prototype:
>>>>>
>>>>> http://pastebin.com/WRpYWDJF
>>>>>
>>>>> I've implemented the bare minimum of the class, following the same
>>>>> contract as BufferedReader, while flagging in comments all the
>>>>> issues I think it has or may have.
>>>>> I also wrote some javadoc to help guide the reader through the
>>>>> class.
>>>>> I could have used more fields from BufferedReader, but the names
>>>>> were so minimalistic that they were confusing me. I intend to
>>>>> change them before sending this to OpenJDK.
>>>>> One of the major problems this has is long overflow. It is major
>>>>> because it is hidden, it will be extremely rare, and it takes a
>>>>> really long time to reproduce. There are different ways of dealing
>>>>> with it, from just documenting it to actually writing code that
>>>>> copes with it.
>>>>> I built some simple test code to get an idea of its performance
>>>>> and correctness.
>>>>>
>>>>> http://pastebin.com/eh6LFgwT
>>>>>
>>>>> This doesn't thoroughly test whether it is actually working
>>>>> correctly, but I see no reason for it not to work correctly after
>>>>> fixing the 2 bugs that test found.
>>>>> I'll also leave here some conclusions about speed and resource
>>>>> consumption.
>>>>> I ran tests with default buffer sizes, 5000B, 15_000B and
>>>>> 500_000B. I noticed that, on my hardware, with the
>>>>> 1 530 000 000B file, I was getting around:
>>>>>   In all buffers and fake work: 10~15s speed improvement
>>>>>   (from 90% HDD speed to 100% HDD speed)
>>>>>   In all buffers and no fake work: 1~2s speed improvement
>>>>>   (from 90% HDD speed to 100% HDD speed)
>>>>> Changing the buffer size gave different reading speeds, but both
>>>>> were about equal in how much they changed when the buffer size
>>>>> changed.
>>>>> Finally, I could always confirm that I/O was the slowest part
>>>>> while this code was running.
>>>>> For those wondering about the file size: it is both to avoid the
>>>>> OS cache and to exercise the main use case these objects are for
>>>>> (large streams of bytes).
>>>>> @Pavel, are you open for discussion now ;)? Need anything else?
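
The comparison suggested earlier in the thread (FileInputStream against FileChannel with a heap buffer and with a direct buffer) could be set up roughly as follows. This is only a sketch and not the code behind the pastebin links: the class name ReadComparison, the helper methods, the 8 MB temporary file and the 15_000-byte buffer are all assumptions made for illustration, and a file this small will mostly be served from the OS cache, unlike the large file used in the measurements above.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical harness; names and sizes are illustrative assumptions.
class ReadComparison {

    /** Reads the whole file through FileInputStream into a byte[]; returns the byte count. */
    static long readWithStream(Path file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Reads the whole file through FileChannel into the given buffer; returns the byte count. */
    static long readWithChannel(Path file, ByteBuffer buffer) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            buffer.clear();
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // discard the data; only the cost of reading is of interest
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-comparison", ".bin");
        try {
            Files.write(file, new byte[8 * 1024 * 1024]);
            int size = 15_000;

            long t0 = System.nanoTime();
            long viaStream = readWithStream(file, size);
            long t1 = System.nanoTime();
            long viaHeap = readWithChannel(file, ByteBuffer.allocate(size));
            long t2 = System.nanoTime();
            long viaDirect = readWithChannel(file, ByteBuffer.allocateDirect(size));
            long t3 = System.nanoTime();

            System.out.printf("FileInputStream:      %d bytes, %d us%n", viaStream, (t1 - t0) / 1_000);
            System.out.printf("FileChannel (heap):   %d bytes, %d us%n", viaHeap, (t2 - t1) / 1_000);
            System.out.printf("FileChannel (direct): %d bytes, %d us%n", viaDirect, (t3 - t2) / 1_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A single timed pass like this is not a reliable benchmark (no warm-up, no control over the page cache); it only shows which three code paths are being compared.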
>>>>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>>>
>>>>> Just to add to my previous email: BufferedReader wraps any Reader
>>>>> out there, not specifically FileReader, while you're talking about
>>>>> the case of effective reading from a file.
>>>>> I guess there's one existing possibility to provide exactly what
>>>>> you need (as I understand it) under this method:
>>>>>
>>>>>     /**
>>>>>      * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>>>      * from the file in an efficient manner...
>>>>>      ...
>>>>>      */
>>>>>     java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>>>
>>>>> It can return _anything_ as long as it is a BufferedReader. We can
>>>>> do it, but it needs to be investigated not only for your favorite
>>>>> OS but for other OSes as well. Feel free to prototype this and we
>>>>> can discuss it on the list later.
>>>>> Thanks,
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at 18:56, Brunoais <brunoa...@gmail.com> wrote:
>>>>> Pavel is right.
>>>>> In reality, I was expecting such a BufferedReader to use only a
>>>>> single buffer and have that buffer filled asynchronously, not in a
>>>>> different thread.
>>>>> Additionally, I don't intend to have a larger buffer than before
>>>>> unless it is requested through the API (the constructor).
>>>>> In my idea, internally, it is supposed to use
>>>>> java.nio.channels.AsynchronousFileChannel or equivalent.
>>>>> It does not prevent having two buffers, and I do not intend to
>>>>> change BufferedReader itself. I'd write a BufferedAsyncReader of
>>>>> sorts (any name suggestion is welcome, as I'm an awful namer).
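
One way the AsynchronousFileChannel idea above might look is sketched below: the next chunk is requested before the current one is handed to the caller, and the caller blocks on the returned Future rather than polling with sleep(). The class PrefetchingFileReader and its nextChunk() method are hypothetical names, not part of the JDK or of the prototype in this thread. Two caveats: the sketch allocates a fresh buffer per chunk instead of sharing a single buffer, and an AsynchronousFileChannel is associated with a thread pool, so it is single-threaded only from the caller's point of view.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical class for illustration only.
class PrefetchingFileReader implements AutoCloseable {
    private final AsynchronousFileChannel channel;
    private final int chunkSize;
    private long position = 0;       // file offset of the read in flight
    private ByteBuffer pendingBuffer; // buffer being filled by the read in flight
    private Future<Integer> pending;  // null once end of file has been reported

    PrefetchingFileReader(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        this.chunkSize = chunkSize;
        startRead();
    }

    private void startRead() {
        pendingBuffer = ByteBuffer.allocate(chunkSize);
        pending = channel.read(pendingBuffer, position); // returns immediately
    }

    /** Returns the next chunk, ready for get(), or null at end of file. */
    ByteBuffer nextChunk() throws IOException, InterruptedException {
        if (pending == null) {
            return null;
        }
        int n;
        try {
            n = pending.get(); // blocks only until the in-flight read completes
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            pending = null;
            return null;
        }
        ByteBuffer ready = pendingBuffer;
        ready.flip();
        position += n;
        startRead(); // request the next chunk before the caller processes this one
        return ready;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("prefetch-demo", ".bin");
        try {
            Files.write(file, new byte[1_000_000]);
            long total = 0;
            try (PrefetchingFileReader reader = new PrefetchingFileReader(file, 64 * 1024)) {
                ByteBuffer chunk;
                while ((chunk = reader.nextChunk()) != null) {
                    total += chunk.remaining(); // stand-in for real work on the chunk
                }
            }
            System.out.println("bytes read: " + total);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Whether this beats a plain buffered read depends on how much work the caller does per chunk; with no work between calls, the Future.get() simply waits for the same I/O.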
>>>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>>>
>>>>> Hi Pavel,
>>>>> I think Brunoais is asking for a double-buffering scheme in which
>>>>> the implementation of BufferedReader fills a second buffer in
>>>>> parallel with the application reading from the first buffer,
>>>>> managing the swaps and async reads transparently.
>>>>> It would not change the API but would change the interactions
>>>>> between the buffered reader and the underlying stream. It would
>>>>> also increase memory requirements and processing by introducing or
>>>>> using a separate thread and the necessary synchronization.
>>>>> Though I think the formal interface semantics could be maintained,
>>>>> I have doubts about compatibility and its unintended consequences
>>>>> on existing subclasses, applications and libraries.
>>>>> $.02, Roger
>>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>>>
>>>>> Off the top of my head, I would say it's not possible to change
>>>>> the design of an _extensible_ type that has been out there for 20
>>>>> or so years. All these I/O streams from java.io were designed for
>>>>> the simple synchronous use case.
>>>>> It's not that their design is flawed in some way, it's that they
>>>>> don't seem to suit your needs. Have you considered using
>>>>> java.nio.channels.AsynchronousFileChannel in your applications?
>>>>> -Pavel
>>>>>
>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoa...@gmail.com> wrote:
>>>>> Any feedback on this? I'm really interested in implementing such
>>>>> a BufferedReader/BufferedStreamReader to allow speeding up my
>>>>> applications without having to think in an asynchronous or
>>>>> multi-threaded way while programming with it.
>>>>> That's why I'm asking this here.
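
The double-buffering scheme Roger describes (a second buffer filled in parallel while the application reads the first, with a separate thread and the synchronization that goes with it) could be sketched as below. DoubleBufferedInputStream is a hypothetical class written for this illustration; it is not a proposal for java.io and only implements the single-byte read(). The extra buffer and the executor thread are exactly the memory and processing costs mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only.
class DoubleBufferedInputStream extends InputStream {
    private final InputStream source;
    private final ExecutorService filler;
    private byte[] front;             // buffer the caller is consuming
    private byte[] back;              // buffer the background task is filling
    private int frontPos = 0;
    private int frontLimit = 0;
    private Future<Integer> backFill; // result of the fill in flight
    private boolean eof = false;

    DoubleBufferedInputStream(InputStream source, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.source = source;
        this.front = new byte[bufferSize];
        this.back = new byte[bufferSize];
        this.filler = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "double-buffer-filler");
            t.setDaemon(true);
            return t;
        });
        startFill();
    }

    private void startFill() {
        final byte[] target = back;
        Callable<Integer> task = () -> source.read(target, 0, target.length);
        backFill = filler.submit(task);
    }

    @Override
    public int read() throws IOException {
        if (frontPos == frontLimit && !swap()) {
            return -1;
        }
        return front[frontPos++] & 0xFF;
    }

    /** Waits for the background fill, swaps the buffers, and starts the next fill. */
    private boolean swap() throws IOException {
        if (eof) {
            return false;
        }
        int n;
        try {
            n = backFill.get(); // Future.get() also publishes the filled bytes to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException();
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (n < 0) {
            eof = true;
            return false;
        }
        byte[] filled = back;
        back = front; // the old front buffer is fully consumed, so it can be refilled
        front = filled;
        frontPos = 0;
        frontLimit = n;
        startFill();
        return true;
    }

    @Override
    public void close() throws IOException {
        filler.shutdownNow();
        source.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        long count = 0;
        try (InputStream in = new DoubleBufferedInputStream(new ByteArrayInputStream(data), 8192)) {
            while (in.read() != -1) {
                count++;
            }
        }
        System.out.println("bytes read: " + count);
    }
}
```

The sketch relies on the source returning at least one byte per non-EOF read, which InputStream.read(byte[], int, int) guarantees when the requested length is positive.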
>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>>>
>>>>> Hi,
>>>>> I looked at the BufferedReader source code for Java 9 along with
>>>>> the source code of the channels/streams used. I noticed that, like
>>>>> in Java 7, BufferedReader does not use an async API to load data
>>>>> from files. Instead, the data loading is all done synchronously,
>>>>> even when the OS allows requesting a file read and being notified
>>>>> later when the file has actually been read.
>>>>> Why is BufferedReader not async while providing a sync API?
>>>>>
>>>>> <BufferedNonBlockStream.java><Tests.java>
>>>>>
>>>>> --
>>>>> Sent from my phone
>>>
>>> --
>>> Sent from my phone

--
Sent from my phone