Hi Brunoais,

I'll try to share what I know from my JMH practice:

On 10/26/2016 10:30 AM, Brunoais wrote:
Hey guys. Any idea where I can find instructions on how to use JMH to:

1. Clear OS' file reading cache.

You can create a public void method and have JMH call it before each:
- trial (a set of iterations)
- iteration (a set of test method invocations)
- invocation

...simply by annotating it with @Setup( [ Level.Trial | Level.Iteration | Level.Invocation ] ).

So create a method that spawns a script that clears the cache.
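
For example, a minimal sketch of such a method (the drop_caches command is Linux-specific and needs root, so treat the exact command as an assumption about your environment):

    @Setup(Level.Iteration)
    public void dropOsFileCache() throws Exception {
        // Ask the OS to flush dirty pages and drop the page cache before each iteration.
        // Linux-only and requires root; substitute the appropriate command on other OSes.
        new ProcessBuilder("sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches")
                .inheritIO()
                .start()
                .waitFor();
    }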

2. Warm up whatever it needs to (maybe reading from a Channel in memory).

JMH already warms up the code and the VM simply by executing "warmup" iterations before starting the real measured iterations. You can control the number of warm-up iterations and measured iterations by annotating either the class or the method(s) with:

@Warmup(iterations = ...)
@Measurement(iterations = ...)

If you want to warm up resources with code that is not the same as the code in the test method(s), then maybe @Setup methods on different levels could be used for that.
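
A minimal sketch of how that could look (class and method names are made up):

    @State(Scope.Benchmark)
    @Warmup(iterations = 5)
    @Measurement(iterations = 10)
    public class FileReadBenchmark {

        @Setup(Level.Trial)
        public void warmUpResources() {
            // runs once per trial, before any warmup or measurement iteration,
            // so you can warm up resources the test method itself does not exercise
        }

        @Benchmark
        public void read() {
            // the code being measured
        }
    }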


3. Create a BufferedInputStream with a FileInputStream inside, with
   configurable buffer sizes.

You can annotate a field of int, long or String type in a class annotated with @State (it can be the benchmark class itself) with the @Param annotation, enumerating the values this field will take before the @Setup(Level.Trial) method(s) are executed. So you enumerate the buffer sizes in the @Param annotation and instantiate the BufferedInputStream using that value in a @Setup method. Voilà.
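
Something along these lines (the file path and the buffer sizes are only placeholders):

    import java.io.*;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class StreamState {

        @Param({"8192", "65536", "1048576"})
        int bufferSize;

        BufferedInputStream in;

        @Setup(Level.Iteration)
        public void open() throws IOException {
            // re-open the stream with the buffer size JMH injected for this run
            in = new BufferedInputStream(new FileInputStream("/path/to/big.file"), bufferSize);
        }

        @TearDown(Level.Iteration)
        public void close() throws IOException {
            in.close();
        }
    }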

4. Execute iterations to read the file fully.

Then perhaps you could use only one invocation per iteration and measure it using @BenchmarkMode(Mode.SingleShotTime), constructing the loop yourself.
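
For example, reusing the StreamState sketch from above (again, just a sketch):

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    public long readWholeFile(StreamState state) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = state.in.read(chunk)) != -1) {
            total += n;   // loop until EOF so the whole file is read in one invocation
        }
        return total;     // return the byte count so the reads cannot be optimized away
    }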

1. Allow setting the byte[] size.

Use @Param on a field to hold the byte[] size and create the byte[] in a @Setup method...
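
For instance, inside the same @State class as above:

    @Param({"1024", "8192", "65536"})
    int chunkSize;

    byte[] chunk;

    @Setup(Level.Trial)
    public void allocateChunk() {
        chunk = new byte[chunkSize];   // one array per trial, sized by the @Param value
    }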

2. On each iteration, burn a set number of CPU cycles.

Blackhole.consumeCPU(tokens)
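
For example (Blackhole comes from org.openjdk.jmh.infra; the token count is arbitrary, tune it to the amount of CPU work you want to simulate):

    @Benchmark
    public int readAndBurnCpu(StreamState state) throws IOException {
        int n = state.in.read(state.chunk);
        Blackhole.consumeCPU(1000);   // burn a fixed number of CPU "tokens" per invocation
        return n;
    }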

5. Re-execute 1, 3 and 4 but with a BufferedNonBlockStream and a
   FileChannel.

If you wrap them all behind a common API (by delegation), you can use a @Param String implType together with a @Setup method to instantiate the appropriate implementation. Then just invoke the common API in the test method.
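
A rough sketch (MyChannelBackedStream stands for whatever wrapper you write around the FileChannel-based implementation; it is not an existing class):

    @Param({"buffered", "channel"})
    String implType;

    InputStream in;

    @Setup(Level.Iteration)
    public void createImpl() throws IOException {
        if ("buffered".equals(implType)) {
            in = new BufferedInputStream(new FileInputStream("/path/to/big.file"), bufferSize);
        } else {
            // hypothetical wrapper exposing the FileChannel-based implementation as an InputStream
            in = new MyChannelBackedStream(FileChannel.open(Paths.get("/path/to/big.file")), bufferSize);
        }
    }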


So far I still can't find how to:

1 (clear OS' cache)
3 (the configuration part)
4 (variable number of iterations)
4.1 (the configuration)

Can someone please point me in the right direction?

I can create an example test if you like and you can then extend it...

Regards, Peter



On 26/10/2016 07:57, Brunoais wrote:

Hey Bernd!

I don't know how long ago you did that, but I'm getting positive results with my non-JMH tests. I do have to evaluate my results against logic, though. After some reads, the OS starts caching the file, which is not what I want. It's easy to tell when that happens: the times fall from ~30s to ~5s and the HDD stays near idle while reading (just looking at the LED is enough to tell).

If you don't do synchronous work and you only run the reads, you will only get marginal results, as the OS has no real time to fill the buffer. My research shows the 2 major kernels (Windows' and GNU/Linux's) have non-blocking user-level buffer handling where I give the OS a buffer to read into and it keeps filling it, sending messages/signals as it writes chunks. Linux has an OS interrupt that only sends the signal after the buffer is full, though. There's also another variant where the kernel uses an internal buffer of the same size as the buffer you allocate and then internally calls memcpy() into your user-level memory when asked. Tests on the internet show that memcpy() is as fast (for 0-1 elements) or faster than System.arraycopy(). I have no idea if they are true.

All this was for me to add that that code is tuned to copy from the read buffer only when it is at least at half capacity and the internal buffer has enough storage space. The copy is forced only if nothing was read on the previous fill() call. It is built to use JNI as little as possible while providing the major contract BufferedInputStream has. Finally, I never, ever compact the read buffer. That requires doing a memcpy() which is definitely not necessary.

Anyway, those timing tests I made were just to get an order of magnitude for the speed difference. I intended to do them differently, but JMH looks good so I'll use JMH for the tests now.

Short reads only happen when fill(true) is called. That happens when data is needed desperately.

I'll look into avoiding the double read requests. I do think it won't bring significant improvements, if any at all. It only happens when the buffer is nearly empty and any byte of data is welcome "at any cost". Besides, whoever called read at that point would also have had an availability() of 0 and still called read()/read(byte[]).


On 26/10/2016 06:14, Bernd Eckenfels wrote:
 Hallo Brunoais,

In the past I did some experiments with non-blocking file channels in the hope of increasing throughput in a similar way to your buffered stream. I also used direct allocated buffers. However, my results were not that encouraging (especially if an upper layer used larger reads). I thought back then that this was mostly due to the fact that it does NOT map to real async file I/O on most platforms. But maybe I just measured it wrong, so I will have a closer look at your impl.

Generally I would recommend making the benchmark a bit more reliable with JMH and, in order to do this, externalizing the direct buffer allocation (as it is slow if done repeatedly). This also allows you to publish some results with varying workloads (on different machines).

I would also measure the readCount to see if short reads happen.

BTW, I might as well try to only read till the end of the buffer in the backfilling-wraps-around case and not issue two requests; that might remove some additional latency.

Gruss
Bernd
--
http://bernd.eckenfels.net

_____________________________
From: Brunoais <brunoa...@gmail.com>
Sent: Montag, Oktober 24, 2016 6:30 PM
Subject: Re: Request/discussion: BufferedReader reading using async API while providing sync API
To: Pavel Rappo <pavel.ra...@oracle.com>
Cc: <core-libs-dev@openjdk.java.net>


Attached and sending!


On 24/10/2016 13:48, Pavel Rappo wrote:
> Could you please send a new email on this list with the source attached as a
> text file?
>
>> On 23 Oct 2016, at 19:14, Brunoais <brunoa...@gmail.com> wrote:
>>
>> Here's my poc/prototype:
>> http://pastebin.com/WRpYWDJF
>>
>> I've implemented the bare minimum of the class that follows the same contract as BufferedReader, while signaling in comments all the issues I think it has or may have.
>> I also wrote some javadoc to help guiding through the class.
>>
>> I could have used more fields from BufferedReader but the names were so minimalistic that they were confusing me. I intend to change them before sending this to OpenJDK.
>>
>> One of the major problems this has is long overflow. It is major because it is hidden, it will be extremely rare, and it takes a really long time to reproduce. There are different ways of dealing with it, from just documenting it to actually writing code that handles it.
>>
>> I built a simple test code for it to have some ideas about performance and correctness.
>>
>> http://pastebin.com/eh6LFgwT
>>
>> This doesn't do a thorough test of whether it is actually working correctly, but I see no reason for it not to work correctly after fixing the 2 bugs that test found.
>>
>> I'll also leave here some conclusions about speed and resource consumption I found.
>>
>> I made tests with default buffer sizes, 5000B, 15_000B and 500_000B. I noticed that, with my hardware, with the 1 530 000 000B file, I was getting around:
>>
>> In all buffers and fake work: 10~15s speed improvement (from 90% HDD speed to 100% HDD speed)
>> In all buffers and no fake work: 1~2s speed improvement (from 90% HDD speed to 100% HDD speed)
>>
>> Changing the buffer size was giving different reading speeds but both were quite equal in how much they would change when changing the buffer size.
>> Finally, I could always confirm that I/O was always the slowest thing while this code was running.
>>
>> For the ones wondering about the file size: it is both to avoid the OS cache and to keep the reading at the main use-case these objects are for (large streams of bytes).
>>
>> @Pavel, are you open for discussion now ;)? Need anything else?
>>
>> On 21/10/2016 19:21, Pavel Rappo wrote:
>>> Just to append to my previous email. BufferedReader wraps any Reader out there,
>>> not specifically FileReader, while you're talking about the case of efficient
>>> reading from a file.
>>>
>>> I guess there's one existing possibility to provide exactly what you need (as I
>>> understand it) under this method:
>>>
>>> /**
>>> * Opens a file for reading, returning a {@code BufferedReader} to read text
>>> * from the file in an efficient manner...
>>> ...
>>> */
>>> java.nio.file.Files#newBufferedReader(java.nio.file.Path)
>>>
>>> It can return _anything_ as long as it is a BufferedReader. We can do it, but it
>>> needs to be investigated not only for your favorite OS but for other OSes as
>>> well. Feel free to prototype this and we can discuss it on the list later.
>>>
>>> Thanks,
>>> -Pavel
>>>
>>>> On 21 Oct 2016, at 18:56, Brunoais <brunoa...@gmail.com> wrote:
>>>>
>>>> Pavel is right.
>>>>
>>>> In reality, I was expecting such a BufferedReader to use only a single buffer and have that buffer filled asynchronously, not in a different Thread.
>>>> Additionally, I don't have the intention of having a larger buffer than before unless stated through the API (the constructor).
>>>>
>>>> In my idea, internally, it is supposed to use java.nio.channels.AsynchronousFileChannel or equivalent.
>>>>
>>>> It does not prevent having two buffers and I do not intend to change BufferedReader itself. I'd do a BufferedAsyncReader of sorts (any name suggestion is welcome as I'm an awful namer).
>>>>
>>>>
>>>> On 21/10/2016 18:38, Roger Riggs wrote:
>>>>> Hi Pavel,
>>>>>
>>>>> I think Brunoais is asking for a double buffering scheme in which the implementation of
>>>>> BufferedReader fills a second buffer in parallel with the application reading from the 1st buffer,
>>>>> managing the swaps and async reads transparently.
>>>>> It would not change the API but would change the interactions between the buffered reader
>>>>> and the underlying stream. It would also increase memory requirements and processing
>>>>> by introducing or using a separate thread and the necessary synchronization.
>>>>>
>>>>> Though I think the formal interface semantics could be maintained, I have doubts
>>>>> about compatibility and its unintended consequences on existing subclasses,
>>>>> applications and libraries.
>>>>>
>>>>> $.02, Roger
>>>>>
>>>>> On 10/21/16 1:22 PM, Pavel Rappo wrote:
>>>>>> Off the top of my head, I would say it's not possible to change the design of an
>>>>>> _extensible_ type that has been out there for 20 or so years. All these I/O
>>>>>> streams from java.io were designed for the simple synchronous use case.
>>>>>>
>>>>>> It's not that their design is flawed in some way, it's that they don't seem to
>>>>>> suit your needs. Have you considered using java.nio.channels.AsynchronousFileChannel
>>>>>> in your applications?
>>>>>>
>>>>>> -Pavel
>>>>>>
>>>>>>> On 21 Oct 2016, at 17:08, Brunoais <brunoa...@gmail.com> wrote:
>>>>>>>
>>>>>>> Any feedback on this? I'm really interested in implementing such a BufferedReader/BufferedStreamReader to allow speeding up my applications without having to think in an asynchronous way or about multi-threading while programming with it.
>>>>>>>
>>>>>>> That's why I'm asking this here.
>>>>>>>
>>>>>>>
>>>>>>> On 13/10/2016 14:45, Brunoais wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I looked at the BufferedReader source code for Java 9 along with the source code of the channels/streams it uses. I noticed that, like in Java 7, BufferedReader does not use an async API to load data from files; instead, the data loading is all done synchronously, even when the OS allows requesting a file to be read and getting a notification later when the file has effectively been read.
>>>>>>>>
>>>>>>>> Why is BufferedReader not async while providing a sync API?
>>>>>>>>
>





