Hello,

On Linux you could use sysctl to drop the caches, but I guess one option is to have a dataset larger than RAM.

http://www.digitalinternals.com/unix/linux-clear-memory-cache/403/

Regards
Bernd
--
http://bernd.eckenfels.net
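Bernd's sysctl suggestion can be driven from Java so the benchmark harness does it automatically. Below is a hypothetical helper (class name and the choice of `sh -c` are mine, not from the thread) that shells out to drop the Linux page cache; writing 3 to /proc/sys/vm/drop_caches is equivalent to `sysctl -w vm.drop_caches=3` and needs root, so this is a sketch, not a portable solution.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: drops the Linux page cache between benchmark runs.
// Requires root (or a sudoers entry); on other OSes a different mechanism
// (or a dataset larger than RAM) is needed.
public class PageCacheDropper {

    // "sync" flushes dirty pages first; echoing 3 into drop_caches then
    // discards the page cache plus dentries and inodes.
    static List<String> dropCachesCommand() {
        return Arrays.asList("sh", "-c",
                "sync && echo 3 > /proc/sys/vm/drop_caches");
    }

    public static void dropCaches() throws IOException, InterruptedException {
        Process p = new ProcessBuilder(dropCachesCommand())
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("dropping caches failed; are you root?");
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", dropCachesCommand()));
    }
}
```

In a JMH benchmark this would typically be called from a @Setup(Level.Iteration) method so every iteration starts with a cold cache.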
On Wed, Oct 26, 2016 at 10:12 PM +0200, "Brunoais" <brunoa...@gmail.com> wrote:

Thank you. Only one thing left. How can I "burn" the OS' file read cache?
The only way I know is to allocate a very large amount of memory, based on the cached I/O figures I see in the Resource Monitor (Windows) or System Monitor (Linux), and then run the program. In this case, I have no idea how much memory each computer has, so I cannot use the same method. How would you write such a program excerpt?

As for the rest of the pointers: thank you. I'll start building the benchmark code based on that information.

On 26/10/2016 18:24, Peter Levart wrote:
> Hi Brunoais,
>
> I'll try to tell you what I know from my JMH practice:
>
> On 10/26/2016 10:30 AM, Brunoais wrote:
>> Hey guys. Any idea where I can find instructions on how to use JMH to:
>>
>> 1. Clear the OS' file reading cache.
>
> You can create a public void method and have JMH call it before each:
> - trial (a set of iterations)
> - iteration (a set of test method invocations)
> - invocation
>
> ...simply by annotating it with @Setup( [ Level.Trial | Level.Iteration | Level.Invocation ] ).
>
> So create a method that spawns a script that clears the cache.
>
>> 2. Warm up whatever it needs to (maybe reading from a Channel in memory).
>
> JMH already warms up the code and the VM simply by executing "warmup" iterations before starting the real measured iterations. You can control the number of warm-up and measured iterations by annotating either the class or the method(s) with:
>
> @Warmup(iterations = ...)
> @Measurement(iterations = ...)
>
> If you want to warm up resources with code that is not equal to the code in the test method(s), then maybe @Setup methods on different levels could be used for that.
>
>>
>> 3. Create a BufferedInputStream with a FileInputStream inside, with configurable buffer sizes.
>
> You can annotate a field of int, long or String type in a class annotated with @State (it can be the benchmark class itself) with the @Param annotation, enumerating the values this field will take before the @Setup(Level.Trial) method(s) execute. So you enumerate the buffer sizes in the @Param annotation and instantiate the BufferedInputStream using the value in a @Setup method. Voilà.
>
>> 4. Execute iterations to read the file fully.
>
> Then perhaps you could use only one invocation per iteration and measure it using @BenchmarkMode(Mode.SingleShotTime), constructing the loop yourself.
>
>> 1. Allow setting the byte[] size.
>
> Use @Param on a field to hold the byte[] size and create the byte[] in a @Setup method...
>
>> 2. On each iteration, burn a set number of CPU cycles.
>
> Blackhole.consumeCPU(tokens)
>
>> 5. Re-execute 1, 3 and 4 but with a BufferedNonBlockStream and a FileChannel.
>
> If you wrap them all into a common API (by delegation), you can use @Param String implType, with a @Setup method to instantiate the appropriate implementation. Then just invoke the common API in the test method.
>
>> So far I still can't find how to:
>>
>> 1 (clear OS' cache)
>> 3 (the configuration part)
>> 4 (variable number of iterations)
>> 4.1 (the configuration)
>>
>> Can someone please point me in the right direction?
>
> I can create an example test if you like and you can then extend it...
>
> Regards, Peter
>
>> On 26/10/2016 07:57, Brunoais wrote:
>>>
>>> Hey Bernd!
>>>
>>> I don't know how far back you did such a thing, but I'm getting positive results with my non-JMH tests. I do have to evaluate my results against logic. After some reads, the OS starts caching the file, which is not what I want. It's easy to know when that happens, though. The times fall from ~30s to ~5s and the HDD stays near idle while reading (just looking at the LED is enough to tell).
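Peter's advice above can be pulled together into one benchmark class. The following is a minimal sketch only: it assumes JMH is on the classpath, and the file path, parameter values and consumeCPU token count are all made-up placeholders, not values from the thread.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

// Sketch combining the pieces Peter describes: @Param-driven buffer sizes,
// @Setup per iteration, SingleShotTime with a hand-written read loop, and
// Blackhole.consumeCPU as the "fake work" per chunk.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.SingleShotTime)     // one measured invocation per iteration
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2)
@Measurement(iterations = 5)
public class ReadFullyBench {

    @Param({"8192", "65536", "524288"}) // buffer sizes to sweep (placeholders)
    int bufferSize;

    @Param({"262144"})                  // byte[] chunk size for read()
    int chunkSize;

    BufferedInputStream in;
    byte[] chunk;

    @Setup(Level.Iteration)
    public void setup() throws IOException {
        // Re-open the stream each iteration; this is also the place to
        // spawn the cache-clearing script so every iteration starts cold.
        in = new BufferedInputStream(
                new FileInputStream("/tmp/big.file"), bufferSize);
        chunk = new byte[chunkSize];
    }

    @TearDown(Level.Iteration)
    public void tearDown() throws IOException {
        in.close();
    }

    @Benchmark
    public long readFully(Blackhole bh) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
            bh.consumeCPU(64);          // burn a fixed number of CPU tokens
        }
        return total;
    }
}
```

Swapping in the FileChannel-based implementation would then just mean adding a @Param String implType and branching in setup(), as Peter suggests.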
>>>
>>> If you don't test synchronous work and you only run the reads, you will only get marginal results, as the OS has no real time to fill the buffer.
>>> My research shows that the 2 major kernels (Windows' and GNU/Linux's) have non-blocking user-level buffer handling, where I give the OS a buffer to read into and it keeps filling it, sending messages/signals as it writes chunks. Linux has an OS interrupt that only sends the signal after the buffer is full, though. There's also another variant where the kernel uses an internal buffer of the same size as the one you allocate and then internally calls memcpy() into your user-level memory when asked. Tests on the internet show that memcpy is as fast (for 0-1 elements) or faster than System.arraycopy(). I have no idea if they are true.
>>>
>>> All this was for me to add that that code is tuned to copy from the read buffer only when it is at least at half capacity and the internal buffer has enough storage space. The copy is forced only if nothing was read on the previous fill() call. It is built to use JNI as little as possible while providing the major contract BufferedInputStream has.
>>> Finally, I never, ever compact the read buffer. It requires a memcpy which is definitely not necessary.
>>>
>>> Anyway, those timing tests I made were just to get an order of magnitude for the speed difference. I intended to do them differently, but JMH looks good so I'll use JMH to test now.
>>>
>>> Short reads only happen when fill(true) is called. That happens for a desperate get of data.
>>>
>>> I'll look into avoiding the double reading requests. I do think it won't bring significant improvements, if any at all. It only happens when the buffer is nearly empty and any byte of data is welcome "at any cost".
>>> Besides, whoever called read() at that point would also have seen available() return 0 and still called read()/read(byte[]).
>>>
>>> On 26/10/2016 06:14, Bernd Eckenfels wrote:
>>>> Hello Brunoais,
>>>>
>>>> In the past I did some experiments with non-blocking file channels in the hope of increasing throughput in a similar way to your buffered stream. I also used directly allocated buffers. However my results were not that encouraging (especially if an upper layer used larger reads). Back then I thought this was mostly due to the fact that it does NOT map to real async file I/O on most platforms. But maybe I just measured it wrong, so I will have a closer look at your impl.
>>>>
>>>> Generally I would recommend making the benchmark a bit more reliable with JMH, and in order to do this, externalizing the direct buffer allocation (as it is slow if done repeatedly). This also allows you to publish some results with varying workloads (on different machines).
>>>>
>>>> I would also measure the read count to see if short reads happen.
>>>>
>>>> BTW, I might as well try to only read till the end of the buffer in the backfilling-wraps-around case and not issue two requests; that might remove some additional latency.
>>>>
>>>> Regards
>>>> Bernd
>>>> --
>>>> http://bernd.eckenfels.net
>>>>
>>>> _____________________________
>>>> From: Brunoais
>>>> Sent: Monday, October 24, 2016 6:30 PM
>>>> Subject: Re: Request/discussion: BufferedReader reading using async API while providing sync API
>>>> To: Pavel Rappo
>>>> Cc:
>>>>
>>>> Attached and sending!
>>>>
>>>> On 24/10/2016 13:48, Pavel Rappo wrote:
>>>> > Could you please send a new email on this list with the source attached as a
>>>> > text file?
>>>> >
>>>> >> On 23 Oct 2016, at 19:14, Brunoais wrote:
>>>> >>
>>>> >> Here's my poc/prototype:
>>>> >> http://pastebin.com/WRpYWDJF
>>>> >>
>>>> >> I've implemented the bare minimum of the class. It follows the same contract as BufferedReader while signaling, in comments, all the issues I think it has or may have.
>>>> >> I also wrote some javadoc to help guide readers through the class.
>>>> >>
>>>> >> I could have reused more fields from BufferedReader, but the names were so minimalistic that they were confusing me. I intend to change them before sending this to OpenJDK.
>>>> >>
>>>> >> One of the major problems this has is long overflow. It is major because it is hidden, it will be extremely rare, and it takes a really long time to reproduce. There are different ways of dealing with it, from just documenting it to actually making the code handle it.
>>>> >>
>>>> >> I built some simple test code to get an idea of its performance and correctness.
>>>> >>
>>>> >> http://pastebin.com/eh6LFgwT
>>>> >>
>>>> >> This isn't a thorough test of whether it actually works correctly, but I see no reason for it not to after fixing the 2 bugs that test found.
>>>> >>
>>>> >> I'll also leave here some conclusions about speed and resource consumption.
>>>> >>
>>>> >> I ran tests with the default buffer size, 5000B, 15_000B and 500_000B. I noticed that, on my hardware, with the 1 530 000 000B file, I was getting around:
>>>> >>
>>>> >> With all buffer sizes and fake work: 10~15s speed improvement (from 90% HDD speed to 100% HDD speed)
>>>> >> With all buffer sizes and no fake work: 1~2s speed improvement (from 90% HDD speed to 100% HDD speed)
>>>> >>
>>>> >> Changing the buffer size gave different reading speeds, but both implementations changed by roughly the same amount as the buffer size changed.
>>>> >> Finally, I could always confirm that I/O was the slowest thing while this code was running.
>>>> >>
>>>> >> For those wondering about the file size: it is both to avoid the OS cache and to make the reads match the main use case these objects are for (large streams of bytes).
>>>> >>
>>>> >> @Pavel, are you open for discussion now ;)? Need anything else?
>>>> >>
>>>> >> On 21/10/2016 19:21, Pavel Rappo wrote:
>>>> >>> Just to append to my previous email. BufferedReader wraps any Reader out there,
>>>> >>> not specifically FileReader, while you're talking about the case of efficient
>>>> >>> reading from a file.
>>>> >>>
>>>> >>> I guess there's one existing possibility to provide exactly what you need (as I
>>>> >>> understand it) under this method:
>>>> >>>
>>>> >>> /**
>>>> >>>  * Opens a file for reading, returning a {@code BufferedReader} to read text
>>>> >>>  * from the file in an efficient manner...
>>>> >>>  ...
>>>> >>>  */
>>>> >>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>> >>>
>>>> >>> It can return _anything_ as long as it is a BufferedReader. We can do it, but it
>>>> >>> needs to be investigated not only for your favorite OS but for other OSes as
>>>> >>> well. Feel free to prototype this and we can discuss it on the list later.
>>>> >>>
>>>> >>> Thanks,
>>>> >>> -Pavel
>>>> >>>
>>>> >>>> On 21 Oct 2016, at 18:56, Brunoais wrote:
>>>> >>>>
>>>> >>>> Pavel is right.
>>>> >>>>
>>>> >>>> In reality, I was expecting such a BufferedReader to use only a single buffer and have that buffer filled asynchronously, not in a different thread.
>>>> >>>> Additionally, I don't intend to have a larger buffer than before unless stated through the API (the constructor).
>>>> >>>>
>>>> >>>> In my idea, internally, it is supposed to use java.nio.channels.AsynchronousFileChannel or equivalent.
>>>> >>>>
>>>> >>>> It does not prevent having two buffers, and I do not intend to change BufferedReader itself. I'd write a BufferedAsyncReader of sorts (any name suggestion is welcome, as I'm an awful namer).
>>>> >>>>
>>>> >>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>> >>>>> Hi Pavel,
>>>> >>>>>
>>>> >>>>> I think Brunoais is asking for a double buffering scheme in which the implementation of
>>>> >>>>> BufferedReader fills (a second buffer) in parallel with the application reading from the 1st buffer,
>>>> >>>>> managing the swaps and async reads transparently.
>>>> >>>>> It would not change the API but would change the interactions between the buffered reader
>>>> >>>>> and the underlying stream. It would also increase memory requirements and processing
>>>> >>>>> by introducing or using a separate thread and the necessary synchronization.
>>>> >>>>>
>>>> >>>>> Though I think the formal interface semantics could be maintained, I have doubts
>>>> >>>>> about compatibility and its unintended consequences on existing subclasses,
>>>> >>>>> applications and libraries.
>>>> >>>>>
>>>> >>>>> $.02, Roger
>>>> >>>>>
>>>> >>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>> >>>>>> Off the top of my head, I would say it's not possible to change the design of an
>>>> >>>>>> _extensible_ type that has been out there for 20 or so years. All these I/O
>>>> >>>>>> streams from java.io were designed for the simple synchronous use case.
>>>> >>>>>>
>>>> >>>>>> It's not that their design is flawed in some way, it's that they don't seem to
>>>> >>>>>> suit your needs. Have you considered using java.nio.channels.AsynchronousFileChannel
>>>> >>>>>> in your applications?
>>>> >>>>>>
>>>> >>>>>> -Pavel
>>>> >>>>>>
>>>> >>>>>>> On 21 Oct 2016, at 17:08, Brunoais wrote:
>>>> >>>>>>>
>>>> >>>>>>> Any feedback on this?
>>>> >>>>>>> I'm really interested in implementing such a BufferedReader/BufferedStreamReader to allow speeding up my applications without having to think asynchronously or use multi-threading while programming with it.
>>>> >>>>>>>
>>>> >>>>>>> That's why I'm asking this here.
>>>> >>>>>>>
>>>> >>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>> >>>>>>>> Hi,
>>>> >>>>>>>>
>>>> >>>>>>>> I looked at the BufferedReader source code for Java 9 along with the source code of the channels/streams it uses. I noticed that, as in Java 7, BufferedReader does not use an async API to load data from files; instead, the data loading is all done synchronously, even when the OS allows requesting that a file be read and getting a notification later once the read has completed.
>>>> >>>>>>>>
>>>> >>>>>>>> Why is BufferedReader not async while providing a sync API?
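The double-buffering scheme Roger describes, built on the AsynchronousFileChannel Pavel points to, can be sketched with plain JDK classes. This is a hypothetical illustration (class and method names are mine, and it ignores the compaction, half-capacity and JNI concerns discussed above): while the caller consumes one buffer, the read into the other is already in flight.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch of double buffering over AsynchronousFileChannel:
// one buffer is handed to the caller while the channel fills the other.
public class DoubleBufferedFileReader implements AutoCloseable {

    private final AsynchronousFileChannel ch;
    private final ByteBuffer[] bufs;
    private Future<Integer> pending; // read in flight for bufs[next]
    private int next;                // index of the buffer being filled
    private long pos;                // file position of the next read

    public DoubleBufferedFileReader(Path file, int bufferSize) throws IOException {
        ch = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        bufs = new ByteBuffer[] { ByteBuffer.allocateDirect(bufferSize),
                                  ByteBuffer.allocateDirect(bufferSize) };
        pending = ch.read(bufs[0], 0); // prefetch the first chunk
    }

    /**
     * Returns the next filled buffer, or null at end of file.
     * The buffer returned by the PREVIOUS call becomes the fill target,
     * so the caller must be done with it before calling this again.
     */
    public ByteBuffer nextChunk() throws ExecutionException, InterruptedException {
        int n = pending.get();       // wait for the in-flight read
        if (n < 0) return null;      // EOF
        ByteBuffer ready = bufs[next];
        ready.flip();
        pos += n;
        next = 1 - next;             // start filling the other buffer
        bufs[next].clear();
        pending = ch.read(bufs[next], pos);
        return ready;
    }

    @Override
    public void close() throws IOException { ch.close(); }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("dbuf", ".bin");
        Files.write(f, new byte[10_000]);
        long total = 0;
        try (DoubleBufferedFileReader r = new DoubleBufferedFileReader(f, 4096)) {
            ByteBuffer b;
            while ((b = r.nextChunk()) != null) total += b.remaining();
        }
        System.out.println(total); // 10000
        Files.delete(f);
    }
}
```

Note this keeps only one read outstanding at a time, which matches Bernd's suggestion of not issuing two requests in the wrap-around case; a production version would also need to handle short reads' interaction with the caller-visible contract.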