Re: Request/discussion: BufferedReader reading using async API while providing sync API

Brunoais Thu, 27 Oct 2016 05:35:01 -0700

Oh... I see. In that case, it means something is terribly wrong. It may be my initial tests, though.

I'm testing on both Linux and Windows and I'm getting performance gains from using the FileChannel compared to using FileInputStream... The tests also make sense based on my predictions O_O...
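For anyone who wants to reproduce that kind of comparison, here is a minimal sketch of a timing harness. It is not the test code from the pastebin links; the class name ReadComparison, the 8192-byte buffer and the command-line argument are assumptions made for illustration only.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical harness: reads the same file once through each API and times it.
class ReadComparison {

    // Reads the whole file through FileInputStream and returns the byte count.
    static long readWithStream(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Reads the whole file through FileChannel into a direct buffer and returns the byte count.
    static long readWithChannel(Path file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data, we only measure the read itself
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ReadComparison <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        int bufferSize = 8192; // assumed size, adjust as needed

        long t0 = System.nanoTime();
        long viaStream = readWithStream(file, bufferSize);
        long t1 = System.nanoTime();
        long viaChannel = readWithChannel(file, bufferSize);
        long t2 = System.nanoTime();

        System.out.printf("FileInputStream: %d bytes in %d ms%n", viaStream, (t1 - t0) / 1_000_000);
        System.out.printf("FileChannel:     %d bytes in %d ms%n", viaChannel, (t2 - t1) / 1_000_000);
    }
}
```

Note that whichever variant runs second may be served from the OS page cache, so a fair comparison needs a file larger than RAM or a cache flush between runs.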



On 27/10/2016 11:47, Vitaly Davidovich wrote:

On Thursday, October 27, 2016, Brunoais <[email protected]> wrote:


    Did you read the C code?

I looked at the Linux code in the JDK.

    Have you got any idea how many functions Windows or Linux (nearly
    all flavors) have for the read operation towards a file?

I do.


    I have already done that homework myself. I may not have read
    the JVM's source code but I know well that there are functions on
    both Windows and Linux that provide the interface I mentioned,
    although they require slightly different treatment (and different
    constants).

You should read the JDK (native) source code instead of guessing/assuming. On Linux, it doesn't use aio facilities for files. The kernel io scheduler may issue readahead behind the scenes, but there's no nonblocking file io, which is at the heart of your premise.
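One part of this can be checked without reading native code: at the Java API level, FileChannel is not a SelectableChannel, so it has no configureBlocking(false) at all. A minimal sketch, where the class name ChannelKinds and the helper isSelectable are made up for illustration; it says nothing about what the native layer does:

```java
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper that inspects the channel type hierarchy only.
class ChannelKinds {

    // True if the channel type supports SelectableChannel.configureBlocking(false)
    // and registration with a Selector.
    static boolean isSelectable(Class<?> channelType) {
        return SelectableChannel.class.isAssignableFrom(channelType);
    }

    public static void main(String[] args) {
        System.out.println("SocketChannel selectable: " + isSelectable(SocketChannel.class));
        System.out.println("FileChannel selectable: " + isSelectable(FileChannel.class));
        System.out.println("AsynchronousFileChannel selectable: "
                + isSelectable(AsynchronousFileChannel.class));
    }
}
```

This prints true for SocketChannel and false for the two file channel types.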




    On 27/10/2016 00:06, Vitaly Davidovich wrote:



        On Wednesday, October 26, 2016, Brunoais <[email protected]> wrote:

            It is actually based on the premise that:

            1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
               buffer size to fill in as the same size as ByteBuffer.

        Why do you say that? AFAICT, it issues a read syscall and that
        will block if the data isn't in page cache.

            2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
               the JVM to order the OS to execute memcpy() to copy from its memory
               to the shared memory created at ByteBuffer instantiation (in java 8)
               using Unsafe and then for the JVM to update the ByteBuffer fields.

        I think subsequent reads just invoke the same read syscall,
        passing the current file offset maintained by the file channel
        instance.

            3. The call will not block waiting for I/O and it won't take longer
               than the JNI interface if no new data exists. However, it will block
               waiting for the OS to execute memcpy() to the shared memory.

        So why do you think it won't block?


            Is my premise wrong?

            If I read correctly, if I don't use a DirectBuffer, there would be
            even another intermediate buffer to copy data to before giving it
            to the "user" which would be useless.

        If you use a HeapByteBuffer, then there's an extra copy from
        the native buffer to the Java buffer.
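A minimal sketch of the two buffer kinds side by side; from the caller's point of view only the allocation differs, and the extra copy for the heap case happens inside the JDK. The class name BufferKinds and its methods are made up for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical example: same read loop, heap vs direct buffer.
class BufferKinds {

    // Reads the whole file through the given buffer and returns a CRC32 of the content.
    static long checksum(Path file, ByteBuffer buf) throws IOException {
        CRC32 crc = new CRC32();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            buf.clear();
            while (ch.read(buf) != -1) {
                buf.flip();      // switch from filling to draining
                crc.update(buf); // consumes everything between position and limit
                buf.clear();     // make the whole buffer writable again
            }
        }
        return crc.getValue();
    }

    // Heap buffer: backed by a byte[] on the Java heap.
    static long checksumHeap(Path file, int size) throws IOException {
        return checksum(file, ByteBuffer.allocate(size));
    }

    // Direct buffer: backed by native memory that the read can target directly.
    static long checksumDirect(Path file, int size) throws IOException {
        return checksum(file, ByteBuffer.allocateDirect(size));
    }
}
```

Both variants see the same bytes; any difference would only show up in timing.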



            On 26/10/2016 11:57, Pavel Rappo wrote:

                I believe I see where you're coming from. Please correct me if
                I'm wrong.

                Your implementation is based on the premise that a call to
                ReadableByteChannel.read() _initiates_ the operation and returns
                immediately. The OS then continues to fill the buffer while
                there's free space in the buffer and the channel hasn't
                encountered EOF.

                Is that right?

                    On 25 Oct 2016, at 22:16, Brunoais <[email protected]> wrote:

                    Thank you for your time. I'll try to explain it. I hope I
                    can clear it up.
                    First of all, I confused the meanings of asynchronous
                    and non-blocking. This implementation uses a non-blocking
                    algorithm internally while providing a blocking-like
                    algorithm on the surface. It is single-threaded, not
                    multi-threaded where one thread fetches data and blocks
                    waiting and the other accumulates it and provides it to
                    whichever wants it.

                    Second, I made the mistake of going after BufferedReader
                    instead of going after BufferedInputStream.
                    If you want me to go after BufferedReader it's ok, but
                    when I started the poc I thought that going after
                    BufferedInputStream would be more generically useful
                    than BufferedReader.

                    On to my code:
                    Short answers:
                            • The sleep(int) exists because I don't know how
                              to wait until more data exists in the buffer,
                              which is part of read()'s contract.
                            • The ByteBuffer gives a buffer that is filled by
                              the OS (what I believe Channels do) instead of
                              getting data only on demand (what I believe
                              Streams do).
                    Full answers:
                    The blockingFill(boolean) method busy-waits for a fill
                    and is used exclusively by the read() method. All other
                    methods use the version that does not sleep
                    (fill(boolean)).
                    blockingFill(boolean) exists in that form only
                    because the read() method must not return unless either:

                            • The stream ended.
                            • The next byte is ready for reading.
                    Additionally, statistically, that while loop will rarely
                    evaluate to true as reads are in chunks, so readPos will
                    be behind writePos most of the time.
                    I have no idea if an interrupt will ever happen, to be
                    honest. The main reasons why I'm using a sleep are that
                    I didn't want to hog the CPU in a full-thread busy wait
                    and that I didn't find any way of putting the thread to
                    sleep so that it wakes up later when the buffer
                    managed by native code has more data.
                    The non-blocking part is managed by the buffer the OS
                    keeps filling most if not all the time. That buffer is
                    the field

                    ByteBuffer readBuffer
                    That's the gain over the plain old Buffered classes.


                    Did that make sense to you? Feel free to ask anything
                    else you need.

                    On 25/10/2016 20:52, Pavel Rappo wrote:

                        I've skimmed through the code and I'm not sure I can
                        see any asynchronicity (you were pointing at the lack
                        of it in BufferedReader).
                        And the mechanics of this is very puzzling to me, to
                        be honest:
                             void blockingFill(boolean forced) throws IOException {
                                 fill(forced);
                                 while (readPos == writePos) {
                                     try {
                                         Thread.sleep(100);
                                     } catch (InterruptedException e) {
                                         // An interrupt may mean more data is available
                                     }
                                     fill(forced);
                                 }
                             }
                        I thought you were suggesting that we should utilize
                        the tools which the OS provides more efficiently.
                        Instead we have something that looks very similar to
                        a "busy loop" and... also who is supposed to
                        interrupt Thread.sleep(), and when?
                        Sorry, I'm not following. Could you please explain
                        how this is supposed to work?

                            On 24 Oct 2016, at 15:59, Brunoais <[email protected]> wrote:
                            Attached and sending!
                            On 24/10/2016 13:48, Pavel Rappo wrote:

                                Could you please send a new email on this list
                                with the source attached as a text file?

                                    On 23 Oct 2016, at 19:14, Brunoais <[email protected]> wrote:
                                    Here's my poc/prototype:

        http://pastebin.com/WRpYWDJF

                                    I've implemented the bare minimum of the
                                    class that follows the same contract as
                                    BufferedReader while signaling in comments
                                    all issues I think it may have or has.
                                    I also wrote some javadoc to help guide
                                    readers through the class.
                                    I could have used more fields from
                                    BufferedReader but the names were so
                                    minimalistic that they were confusing me.
                                    I intend to change them before sending
                                    this to openJDK.
                                    One of the major problems this has is long
                                    overflow. It is major because it is
                                    hidden, it will be extremely rare and it
                                    takes a really long time to reproduce.
                                    There are different ways of dealing with
                                    it, from just documenting it to actually
                                    writing code that handles it.
                                    I built some simple test code for it to
                                    get some ideas about performance and
                                    correctness.

        http://pastebin.com/eh6LFgwT

                                    This doesn't thoroughly test whether it is
                                    actually working correctly, but I see no
                                    reason for it not working correctly after
                                    fixing the 2 bugs that the test found.
                                    I'll also leave here some conclusions
                                    about speed and resource consumption I
                                    found.
                                    I made tests with default buffer sizes,
                                    5000B, 15_000B and 500_000B. I noticed
                                    that, with my hardware, with the
                                    1 530 000 000B file, I was getting around:
                                    In all buffers and fake work: 10~15s speed
                                    improvement (from 90% HDD speed to 100%
                                    HDD speed)
                                    In all buffers and no fake work: 1~2s
                                    speed improvement (from 90% HDD speed to
                                    100% HDD speed)
                                    Changing the buffer size was giving
                                    different reading speeds but both were
                                    quite equal in how much they would change
                                    when changing the buffer size.
                                    Finally, I could always confirm that I/O
                                    was always the slowest thing while this
                                    code was running.
                                    For those wondering about the file size:
                                    it is both to avoid the OS cache and to
                                    test the reading at the main use case
                                    these objects are for (large streams of
                                    bytes).
                                    @Pavel, are you open for discussion now
                                    ;)? Need anything else?
                                    On 21/10/2016 19:21, Pavel Rappo wrote:

                                        Just to append to my previous email.
                                        BufferedReader wraps any Reader out there.
                                        Not specifically FileReader. While
                                        you're talking about the case of effective
                                        reading from a file.
                                        I guess there's one existing
                                        possibility to provide exactly what
                                        you need (as I understand it) under
                                        this method:
                                        /**
                                          * Opens a file for reading, returning a
                                          * {@code BufferedReader} to read text
                                          * from the file in an efficient manner...
                                            ...
                                          */

                                        java.nio.file.Files#newBufferedReader(java.nio.file.Path)

                                        It can return _anything_ as long as it
                                        is a BufferedReader. We can do it, but it
                                        needs to be investigated not only for
                                        your favorite OS but for other OSes as
                                        well. Feel free to prototype this and
                                        we can discuss it on the list later.
                                        Thanks,
                                        -Pavel
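For reference, a minimal sketch of what calling code looks like when it goes through that factory method: the caller only ever sees the BufferedReader type, so whatever sits behind it could in principle change. The class name LineCounter and the method countLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical caller of Files.newBufferedReader(Path).
class LineCounter {

    // Counts the lines of a UTF-8 text file.
    static long countLines(Path file) throws IOException {
        // The static type is BufferedReader; the caller does not depend on
        // how the returned instance fetches its data.
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```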

                                            On 21 Oct 2016, at 18:56, Brunoais <[email protected]> wrote:
                                            Pavel is right.
                                            In reality, I was expecting such a
                                            BufferedReader to use only a
                                            single buffer and have that buffer
                                            being filled asynchronously, not
                                            in a different Thread.
                                            Additionally, I don't have the
                                            intention of having a larger
                                            buffer than before unless stated
                                            through the API (the constructor).
                                            In my idea, internally, it is
                                            supposed to use
                                            java.nio.channels.AsynchronousFileChannel
                                            or equivalent.
                                            It does not prevent having two
                                            buffers and I do not intend to
                                            change BufferedReader itself. I'd
                                            do a BufferedAsyncReader of sorts
                                            (any name suggestion is welcome as
                                            I'm an awful namer).
                                            On 21/10/2016 18:38, Roger Riggs wrote:

                                                Hi Pavel,
                                                I think Brunoais is asking for a
                                                double buffering scheme in
                                                which the implementation of
                                                BufferedReader fills (a second
                                                buffer) in parallel with the
                                                application reading from the
                                                1st buffer
                                                and managing the swaps and
                                                async reads transparently.
                                                It would not change the API
                                                but would change the
                                                interactions between the
                                                buffered reader
                                                and the underlying stream. It
                                                would also increase memory
                                                requirements and processing
                                                by introducing or using a
                                                separate thread and the
                                                necessary synchronization.
                                                Though I think the formal
                                                interface semantics could be
                                                maintained, I have doubts
                                                about compatibility and its
                                                unintended consequences on
                                                existing subclasses,
                                                applications and libraries.
                                                $.02, Roger
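To make the double buffering idea concrete, here is a minimal sketch built on java.nio.channels.AsynchronousFileChannel: one buffer is being filled while the other is consumed, then they swap. It is not the prototype from this thread; the class name DoubleBufferedSum and the byte-summing "work" are assumptions for illustration, and how the JDK services the asynchronous read is platform-dependent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical example of a double buffering read loop.
class DoubleBufferedSum {

    // Sums all bytes of the file (as unsigned values) while the next chunk
    // is being read into the other buffer.
    static long sumBytes(Path file, int bufferSize)
            throws IOException, InterruptedException, ExecutionException {
        ByteBuffer current = ByteBuffer.allocate(bufferSize);
        ByteBuffer next = ByteBuffer.allocate(bufferSize);
        long sum = 0;
        long position = 0;
        try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                int n = pending.get();             // block until the outstanding read completes
                if (n == -1) {
                    break;                         // end of file, nothing else is in flight
                }
                position += n;
                pending = ch.read(next, position); // start filling the other buffer

                current.flip();                    // consume the buffer that just completed
                while (current.hasRemaining()) {
                    sum += current.get() & 0xFF;
                }
                current.clear();

                ByteBuffer tmp = current;          // swap roles for the next round
                current = next;
                next = tmp;
            }
        }
        return sum;
    }
}
```

The caller still sees a plain blocking method, which is the "sync API on top of async reads" shape under discussion; the memory cost is the second buffer mentioned above.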
                                                On 10/21/16 1:22 PM, Pavel Rappo wrote:

                                                    Off the top of my head, I
                                                    would say it's not
                                                    possible to change the
                                                    design of an
                                                    _extensible_ type that has
                                                    been out there for 20 or
                                                    so years. All these I/O
                                                    streams from java.io were
                                                    designed for a simple
                                                    synchronous use case.
                                                    It's not that their design
                                                    is flawed in some way,
                                                    it's that they don't seem
                                                    to suit your needs. Have
                                                    you considered using
                                                    java.nio.channels.AsynchronousFileChannel
                                                    in your applications?
                                                    -Pavel

                                                        On 21 Oct 2016, at 17:08, Brunoais <[email protected]> wrote:
                                                        Any feedback on this?
                                                        I'm really interested in
                                                        implementing such a
                                                        BufferedReader/BufferedStreamReader
                                                        to allow speeding up
                                                        my applications
                                                        without having to
                                                        think in an
                                                        asynchronous way or
                                                        about multi-threading
                                                        while programming with it.
                                                        That's why I'm asking
                                                        this here.
                                                        On 13/10/2016 14:45, Brunoais wrote:

                                                            Hi,
                                                            I looked at the BufferedReader
                                                            source code for java 9 along
                                                            with the source code of the
                                                            channels/streams used. I noticed
                                                            that, like in java 7,
                                                            BufferedReader does not use an
                                                            Async API to load data from
                                                            files; instead, the data loading
                                                            is all done synchronously even
                                                            when the OS allows requesting a
                                                            file to be read and getting a
                                                            notification later when the file
                                                            is effectively read.
                                                            Why is BufferedReader not async
                                                            while providing a sync API?

                            <BufferedNonBlockStream.java><Tests.java>

--Sent from my phone




