Re: Request/discussion: BufferedReader reading using async API while providing sync API

Brunoais Thu, 27 Oct 2016 06:07:55 -0700

Thanks for the heads up.

I'll try that later. These tests are still useful then. Meanwhile, I'll end up also checking how FileChannel queries the OS on windows. I'm getting 100% HDD reads... Could it be that the OS reads the file ahead on its own?... Anyway, I'll look into it. Thanks for the heads up.



On 27/10/2016 13:53, Vitaly Davidovich wrote:

On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <[email protected]> wrote:


    Oh... I see. In that case, it means something is terribly wrong.
    It can be my initial tests, though.

    I'm testing on both Linux and Windows and I'm getting performance
    gains from using FileChannel compared to using
    FileInputStream... The tests also make sense based on my
    predictions O_O...

FileInputStream requires copying native buffers holding the read data to the java byte[]. If you're using direct ByteBuffer for FileChannel, that whole memcpy is skipped. Try comparing FileChannel with HeapByteBuffer instead.
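
A minimal sketch of the comparison suggested above, assuming a temporary file and an 8192-byte buffer (both chosen only for illustration; the class and method names are hypothetical). It reads the same file through a FileChannel once with a direct ByteBuffer and once with a heap ByteBuffer. The timings it prints are only a rough indication: there is no warm-up, and the file will most likely be served from the page cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DirectVsHeapRead {

    /** Reads the whole file through a FileChannel and returns the number of bytes read. */
    static long countBytes(Path file, boolean direct, int bufferSize) throws IOException {
        // Direct buffers live outside the Java heap; heap buffers wrap a byte[].
        ByteBuffer buf = direct
                ? ByteBuffer.allocateDirect(bufferSize)
                : ByteBuffer.allocate(bufferSize);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            // read() returns -1 once the end of the file is reached.
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data; only the byte count matters here
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a 1 MiB temporary file stands in for the real test file.
        Path file = Files.createTempFile("direct-vs-heap", ".bin");
        try {
            Files.write(file, new byte[1 << 20]);
            for (boolean direct : new boolean[] {true, false}) {
                long start = System.nanoTime();
                long bytes = countBytes(file, direct, 8192);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println((direct ? "direct" : "heap") + ": "
                        + bytes + " bytes in " + micros + " us");
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

For a fairer comparison the file should be much larger than RAM-backed caches allow, and each variant should be run several times.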



    On 27/10/2016 11:47, Vitaly Davidovich wrote:



    On Thursday, October 27, 2016, Brunoais <[email protected]> wrote:

        Did you read the C code?

    I looked at the Linux code in the JDK.

        Have you got any idea how many functions Windows or Linux
        (nearly all flavors) have for the read operation towards a file?

    I do.


        I have already done that homework myself. I may not have read
        the JVM's source code, but I know well that there are functions on
        both Windows and Linux that provide the interface I
        mentioned, although they require slightly different
        treatment (and different constants).

    You should read the JDK (native) source code instead of
    guessing/assuming.  On Linux, it doesn't use aio facilities for
    files.  The kernel io scheduler may issue readahead behind the
    scenes, but there's no nonblocking file io that's at the heart of
    your premise.



        On 27/10/2016 00:06, Vitaly Davidovich wrote:



            On Wednesday, October 26, 2016, Brunoais <[email protected]> wrote:

                It is actually based on the premise that:

                1. The first call to ReadableByteChannel.read(ByteBuffer)
                   sets the OS buffer size to fill in as the same size as
                   the ByteBuffer.

            Why do you say that? AFAICT, it issues a read syscall and
            that will block if the data isn't in page cache.

                2. The consecutive calls to
                   ReadableByteChannel.read(ByteBuffer) order the JVM to
                   order the OS to execute memcpy() to copy from its memory
                   to the shared memory created at ByteBuffer instantiation
                   (in java 8) using Unsafe and then for the JVM to update
                   the ByteBuffer fields.

            I think subsequent reads just invoke the same read
            syscall, passing the current file offset maintained by
            the file channel instance.

                3. The call will not block waiting for I/O and it won't take
                   longer than the JNI interface if no new data exists.
                   However, it will block waiting for the OS to execute
                   memcpy() to the shared memory.

            So why do you think it won't block?


                Is my premise wrong?

                If I read correctly, if I don't use a DirectBuffer, there
                would be yet another intermediate buffer to copy data to
                before giving it to the "user", which would be useless.

            If you use a HeapByteBuffer, then there's an extra copy
            from the native buffer to the Java buffer.



                On 26/10/2016 11:57, Pavel Rappo wrote:

                    I believe I see where you're coming from. Please correct me
                    if I'm wrong.

                    Your implementation is based on the premise that a call to
                    ReadableByteChannel.read() _initiates_ the operation and
                    returns immediately. The OS then continues to fill the
                    buffer while there's free space in the buffer and the
                    channel hasn't encountered EOF.

                    Is that right?

                        On 25 Oct 2016, at 22:16, Brunoais <[email protected]> wrote:

                        Thank you for your time. I'll try to explain it. I hope I
                        can clear it up.
                        First of all, I mixed up the meanings of asynchronous
                        and non-blocking. This implementation uses a non-blocking
                        algorithm internally while providing a blocking-like
                        algorithm on the surface. It is single-threaded, not
                        multi-threaded with one thread that fetches data and
                        blocks waiting and another that accumulates it and
                        provides it to whoever wants it.

                        Second, I made the mistake of going after
                        BufferedReader instead of going after BufferedInputStream.
                        If you want me to go after BufferedReader that's ok, but
                        when I started the poc I thought that going after
                        BufferedInputStream would be more generically useful
                        than BufferedReader.

                        On to my code:
                        Short answers:
                                • The sleep(int) exists because I don't know how
                                  to wait until more data exists in the buffer,
                                  which is part of read()'s contract.
                                • The ByteBuffer gives a buffer that is filled by
                                  the OS (what I believe Channels do) instead of
                                  getting data only on demand (what I believe
                                  Streams do).
                        Full answers:
                        The blockingFill(boolean) method is a busy wait for a
                        fill and is used exclusively by the read() method. All
                        other methods use the version that does not sleep
                        (fill(boolean)).
                        blockingFill(boolean) exists in that form only because
                        the read() method must not return unless either:

                                • The stream ended.
                                • The next byte is ready for reading.
                        Additionally, statistically, that while loop will rarely
                        evaluate to true, as reads are in chunks, so readPos will
                        be behind writePos most of the time.
                        I have no idea if an interrupt will ever happen, to be
                        honest. The main reasons why I'm using a sleep are that
                        I didn't want to hog the CPU with a full-thread busy
                        wait, and that I didn't find any way of putting the
                        thread to sleep so that it wakes up later when the
                        buffer managed by native code has more data.
                        The non-blocking part is managed by the buffer the OS
                        keeps filling most if not all the time. That buffer is
                        the field

                        ByteBuffer readBuffer
                        That's the gain over the plain old Buffered classes.


                        Did that make sense to you? Feel free to ask anything else
                        you need.

                        On 25/10/2016 20:52, Pavel Rappo wrote:

                            I've skimmed through the code and I'm not sure I can
                            see any asynchronicity (you were pointing at the lack
                            of it in BufferedReader).
                            And the mechanics of this are very puzzling to me, to
                            be honest:
                                 void blockingFill(boolean forced) throws IOException {
                                     fill(forced);
                                     while (readPos == writePos) {
                                         try {
                                             Thread.sleep(100);
                                         } catch (InterruptedException e) {
                                             // An interrupt may mean more data is available
                                         }
                                         fill(forced);
                                     }
                                 }
                            I thought you were suggesting that we should utilize
                            the tools which the OS provides more efficiently.
                            Instead we have something that looks very similar to a
                            "busy loop" and... also who is supposed to interrupt
                            Thread.sleep(), and when?
                            Sorry, I'm not following. Could you please explain how
                            this is supposed to work?
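
One way to read the objection above, as a sketch rather than a fix for the attached prototype: when a channel is in blocking mode (a FileChannel always is) and the buffer has room, ReadableByteChannel.read(ByteBuffer) does not return until at least one byte has been read or end of stream is reached, so a fill loop needs no Thread.sleep(). The class and method names below are hypothetical and the in-memory channel in main() only stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

class BlockingFillSketch {

    /**
     * Reads until the buffer is full or the channel reaches end of stream.
     * Returns the number of bytes added, or -1 if end of stream was hit
     * before any byte was read.
     * Assumption: the channel is in blocking mode, so read() never keeps
     * returning 0 while the buffer still has room.
     */
    static int fill(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf); // blocks until data arrives or EOF
            if (n == -1) {
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 10 bytes of in-memory data stand in for a file.
        ReadableByteChannel ch =
                Channels.newChannel(new ByteArrayInputStream(new byte[10]));
        ByteBuffer buf = ByteBuffer.allocate(4);
        int n;
        while ((n = fill(ch, buf)) != -1) {
            System.out.println("filled " + n + " bytes");
            buf.clear();
        }
    }
}
```

This keeps the synchronous contract of read() without polling, but it does not overlap I/O with the caller's work, which is the part the thread is actually debating.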

                                On 24 Oct 2016, at 15:59, Brunoais <[email protected]> wrote:
                                Attached and sending!
                                On 24/10/2016 13:48, Pavel Rappo wrote:

                                    Could you please send a new email on this list
                                    with the source attached as a text file?

                                        On 23 Oct 2016, at 19:14, Brunoais <[email protected]> wrote:
                                        Here's my poc/prototype:

            http://pastebin.com/WRpYWDJF

                                        I've implemented the bare minimum of the
                                        class that follows the same contract as
                                        BufferedReader while flagging in comments
                                        all the issues I think it may have or has.
                                        I also wrote some javadoc to help guide
                                        the reader through the class.
                                        I could have used more fields from
                                        BufferedReader but the names were so
                                        minimalistic that they were confusing me.
                                        I intend to change them before sending
                                        this to openJDK.
                                        One of the major problems this has is long
                                        overflow. It is major because it is hidden,
                                        it will be extremely rare and it takes a
                                        really long time to reproduce. There are
                                        different ways of dealing with it, from
                                        just documenting it to actually writing
                                        code that handles it.
                                        I built some simple test code for it to get
                                        some idea of performance and correctness.

            http://pastebin.com/eh6LFgwT

                                        This doesn't do a thorough test of whether
                                        it is actually working correctly, but I see
                                        no reason for it not to work correctly
                                        after fixing the 2 bugs that the test found.
                                        I'll also leave here some conclusions about
                                        speed and resource consumption I found.
                                        I made tests with default buffer sizes,
                                        5000B, 15_000B and 500_000B. I noticed
                                        that, with my hardware, with the
                                        1 530 000 000B file, I was getting around:
                                        In all buffers and fake work: 10~15s speed
                                        improvement (from 90% HDD speed to 100%
                                        HDD speed)
                                        In all buffers and no fake work: 1~2s
                                        speed improvement (from 90% HDD speed to
                                        100% HDD speed)
                                        Changing the buffer size was giving
                                        different reading speeds but both were
                                        quite equal in how much they would change
                                        when changing the buffer size.
                                        Finally, I could always confirm that I/O
                                        was always the slowest thing while this
                                        code was running.
                                        For the ones wondering about the file
                                        size: it is both to avoid the OS cache and
                                        to test reading in the main use-case these
                                        objects are for (large streams of bytes).
                                        @Pavel, are you open for discussion now
                                        ;)? Need anything else?
                                        On 21/10/2016 19:21, Pavel Rappo wrote:

                                            Just to append to my previous email.
                                            BufferedReader wraps any Reader out there,
                                            not specifically FileReader, while you're
                                            talking about the case of effective
                                            reading from a file.
                                            I guess there's one existing possibility
                                            to provide exactly what you need (as I
                                            understand it) under this method:
                                            /**
                                             * Opens a file for reading, returning a {@code BufferedReader} to read text
                                             * from the file in an efficient manner...
                                             *   ...
                                             */
                                            java.nio.file.Files#newBufferedReader(java.nio.file.Path)
                                            It can return _anything_ as long as it
                                            is a BufferedReader. We can do it, but it
                                            needs to be investigated not only for
                                            your favorite OS but for other OSes as
                                            well. Feel free to prototype this and
                                            we can discuss it on the list later.
                                            Thanks,
                                            -Pavel

                                                On 21 Oct 2016, at 18:56, Brunoais <[email protected]> wrote:
                                                Pavel is right.
                                                In reality, I was expecting such a
                                                BufferedReader to use only a single
                                                buffer and have that buffer filled
                                                asynchronously, not in a different
                                                Thread.
                                                Additionally, I don't intend to have
                                                a larger buffer than before unless
                                                stated through the API (the
                                                constructor).
                                                In my idea, internally, it is
                                                supposed to use
                                                java.nio.channels.AsynchronousFileChannel
                                                or equivalent.
                                                It does not prevent having two
                                                buffers and I do not intend to
                                                change BufferedReader itself. I'd
                                                do a BufferedAsyncReader of sorts
                                                (any name suggestion is welcome as
                                                I'm an awful namer).
                                                On 21/10/2016 18:38, Roger Riggs wrote:

                                                    Hi Pavel,
                                                    I think Brunoais is asking for a
                                                    double buffering scheme in which
                                                    the implementation of
                                                    BufferedReader fills (a second
                                                    buffer) in parallel with the
                                                    application reading from the 1st
                                                    buffer and managing the swaps and
                                                    async reads transparently.
                                                    It would not change the API but
                                                    would change the interactions
                                                    between the buffered reader and
                                                    the underlying stream.  It would
                                                    also increase memory requirements
                                                    and processing by introducing or
                                                    using a separate thread and the
                                                    necessary synchronization.
                                                    Though I think the formal
                                                    interface semantics could be
                                                    maintained, I have doubts about
                                                    compatibility and its unintended
                                                    consequences on existing
                                                    subclasses, applications and
                                                    libraries.
                                                    $.02, Roger
                                                    On 10/21/16 1:22 PM, Pavel Rappo wrote:

                                                        Off the top of my head, I would
                                                        say it's not possible to change
                                                        the design of an _extensible_
                                                        type that has been out there for
                                                        20 or so years. All these I/O
                                                        streams from java.io were
                                                        designed for a simple synchronous
                                                        use case. It's not that their
                                                        design is flawed in some way,
                                                        it's that they don't seem to
                                                        suit your needs. Have you
                                                        considered using
                                                        java.nio.channels.AsynchronousFileChannel
                                                        in your applications?
                                                        -Pavel
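
For reference, a minimal sketch of what using java.nio.channels.AsynchronousFileChannel looks like from application code. The class name, the method name and the temporary file are assumptions made for illustration. read(ByteBuffer, long) returns a Future immediately, and the caller decides when to wait for it; how the read is carried out underneath is platform-specific.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

class AsyncReadSketch {

    /**
     * Starts a read at the given file position and waits for it to finish.
     * Returns the number of bytes read, or -1 if the position is at or
     * past the end of the file.
     */
    static int readAt(Path file, long position, ByteBuffer dst) throws Exception {
        try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = ch.read(dst, position); // returns immediately
            // Other work could run here while the read is in flight.
            return pending.get(); // blocks until the read has completed
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a small temporary file stands in for real input.
        Path file = Files.createTempFile("async-read", ".bin");
        try {
            Files.write(file, new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
            ByteBuffer dst = ByteBuffer.allocate(4);
            int n = readAt(file, 0, dst);
            System.out.println("read " + n + " bytes");
        } finally {
            Files.delete(file);
        }
    }
}
```

Unlike FileChannel, this channel keeps no current position, so the caller has to track the offset of each read itself.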

              On 21 Oct 2016, at 17:08, Brunoais <[email protected]> wrote:
              Any feedback on this? I'm really interested in implementing such a
              BufferedReader/BufferedStreamReader to allow speeding up my
              applications without having to think in an asynchronous way or
              about multi-threading while programming with it. That's why I'm
              asking this here.
              On 13/10/2016 14:45, Brunoais wrote:

                  Hi,
                  I looked at the BufferedReader source code for Java 9 along
                  with the source code of the channels/streams used. I noticed
                  that, like in Java 7, BufferedReader does not use an async API
                  to load data from files. Instead, the data loading is all done
                  synchronously, even when the OS allows requesting a file to be
                  read and getting a notification later when the file has
                  actually been read.
                  Why is BufferedReader not async while providing a sync API?

            <BufferedNonBlockStream.java><Tests.java>

--Sent from my phone
