Did you read the C code?
Have you got any idea how many functions Windows or Linux (nearly all flavors) have for reading from a file?

I have already done that homework myself. I may not have read the JVM's source code, but I know well that there are functions on both Windows and Linux that provide the kind of interface I mentioned, although they require slightly different treatment (and different constants).


On 27/10/2016 00:06, Vitaly Davidovich wrote:


On Wednesday, October 26, 2016, Brunoais <brunoa...@gmail.com> wrote:

    It is actually based on the premise that:

    1. The first call to ReadableByteChannel.read(ByteBuffer) sets the OS
       buffer size to fill in as the same size as ByteBuffer.

Why do you say that? AFAICT, it issues a read syscall and that will block if the data isn't in page cache.

    2. The consecutive calls to ReadableByteChannel.read(ByteBuffer) order
       the JVM to order the OS to execute memcpy() to copy from its memory
       to the shared memory created at ByteBuffer instantiation (in Java 8)
       using Unsafe and then for the JVM to update the ByteBuffer fields.

I think subsequent reads just invoke the same read syscall, passing the current file offset maintained by the file channel instance.
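
A minimal sketch of that behaviour, assuming a small temporary file created just for the demonstration (the class and helper names are hypothetical). Each call to read() is an ordinary blocking call, and the channel itself tracks the offset for the next one:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class SequentialReads {
    /** Reads the file in chunkSize pieces and records the channel position after each read. */
    static List<Long> positionsAfterEachRead(Path file, int chunkSize) {
        List<Long> positions = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(chunkSize);
            // Each read() returns only after bytes were transferred into the
            // buffer, or returns -1 at end of file.
            while (channel.read(buffer) != -1) {
                positions.add(channel.position());
                buffer.clear(); // reuse the buffer for the next call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return positions;
    }

    /** Hypothetical helper: creates a temporary file holding `size` zero bytes. */
    static Path tempFileOfSize(int size) {
        try {
            Path file = Files.createTempFile("sequential-reads", ".bin");
            Files.write(file, new byte[size]);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 10-byte file read through a 4-byte buffer.
        System.out.println(positionsAfterEachRead(tempFileOfSize(10), 4));
    }
}
```

Nothing touches the buffer between calls here; it only changes while a read() is executing.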

    3. The call will not block waiting for I/O and it won't take longer
       than the JNI interface if no new data exists. However, it will block
       waiting for the OS to execute memcpy() to the shared memory.

So why do you think it won't block?


    Is my premise wrong?

    If I read correctly, if I don't use a DirectBuffer, there would be
    even another intermediate buffer to copy data to before giving it
    to the "user" which would be useless.

If you use a HeapByteBuffer, then there's an extra copy from the native buffer to the Java buffer.
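
As a small illustration of the two buffer kinds (a sketch only; the class and helper names are hypothetical): both calls below return the same bytes, and the difference is where the data lands first. The extra copy for heap buffers is an implementation detail of the JDK, as described above, and is not visible through the API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class HeapVersusDirect {
    /** Fills the given buffer from the start of the file and decodes it as UTF-8. */
    static String readWith(ByteBuffer buffer, Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.flip();
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    /** Hypothetical helper: creates a temporary file containing the given text. */
    static Path tempFileWith(String text) {
        try {
            Path file = Files.createTempFile("heap-vs-direct", ".txt");
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileWith("hello");
        ByteBuffer heap = ByteBuffer.allocate(16);         // backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // backed by native memory
        System.out.println(heap.isDirect() + " " + readWith(heap, file));
        System.out.println(direct.isDirect() + " " + readWith(direct, file));
    }
}
```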



    On 26/10/2016 11:57, Pavel Rappo wrote:

        I believe I see where you're coming from. Please correct me if
        I'm wrong.

        Your implementation is based on the premise that a call to
        ReadableByteChannel.read()
        _initiates_ the operation and returns immediately. The OS then
        continues to fill
        the buffer while there's free space in the buffer and the
        channel hasn't encountered EOF.

        Is that right?

            On 25 Oct 2016, at 22:16, Brunoais <brunoa...@gmail.com>
            wrote:

            Thank you for your time. I'll try to explain it. I hope I
            can clear it up.
            First of all, I mixed up the meanings of asynchronous
            and non-blocking. This implementation uses a non-blocking
            algorithm internally while providing a blocking-like
            algorithm on the surface. It is single-threaded, not
            multi-threaded with one thread fetching data and blocking
            while another accumulates it and hands it to
            whoever wants it.

            Second, I made the mistake of going after
            BufferedReader instead of BufferedInputStream.
            If you want me to go after BufferedReader, that's OK, but
            when I started the PoC I thought that going after
            BufferedInputStream would be more generically useful than
            BufferedReader.

            On to my code:
            Short answers:
                    • The sleep(int) exists because I don't know how
                      to wait until more data exists in the buffer,
                      which is part of read()'s contract.
                    • The ByteBuffer gives a buffer that is filled by
                      the OS (what I believe Channels do) instead of
                      getting data only on demand (what I believe
                      Streams do).
            Full answers:
            The blockingFill(boolean) method is a busy wait
            for a fill and is used exclusively by the read()
            method. All other methods use the version that does not
            sleep (fill(boolean)).
            blockingFill(boolean) exists in that form only
            because the read() method must not return unless either:

                    • The stream ended.
                    • The next byte is ready for reading.
            Additionally, statistically, that while loop will rarely
            evaluate to true as reads are in chunks so readPos will be
            behind writePos most of the time.
            I have no idea if an interrupt will ever happen, to be
            honest. The main reasons why I'm using a sleep are that
            I didn't want to hog the CPU with a full-thread
            busy wait and that I didn't find any way of putting the
            thread to sleep so that it wakes up later when the buffer
            managed by native code has more data.
            The non-blocking part is managed by the buffer the OS
            keeps filling most if not all of the time. That buffer is the
            field

            ByteBuffer readBuffer
            That's the gain over the plain old Buffered
            classes.


            Did that make sense to you? Feel free to ask anything else
            you need.

            On 25/10/2016 20:52, Pavel Rappo wrote:

                I've skimmed through the code and I'm not sure I can
                see any asynchronicity
                (you were pointing at the lack of it in BufferedReader).
                And the mechanics of this is very puzzling to me, to
                be honest:
                     void blockingFill(boolean forced) throws IOException {
                         fill(forced);
                         while (readPos == writePos) {
                             try {
                                 Thread.sleep(100);
                             } catch (InterruptedException e) {
                                 // An interrupt may mean more data is available
                             }
                             fill(forced);
                         }
                     }
                I thought you were suggesting that we should utilize
                the tools which OS provides
                more efficiently. Instead we have something that looks
                very similar to a
                "busy loop" and... also who and when is supposed to
                interrupt Thread.sleep()?
                Sorry, I'm not following. Could you please explain how
                this is supposed to work?

                    On 24 Oct 2016, at 15:59, Brunoais
                    <brunoa...@gmail.com>
                      wrote:
                    Attached and sending!
                    On 24/10/2016 13:48, Pavel Rappo wrote:

                        Could you please send a new email on this list
                        with the source attached as a
                        text file?

                            On 23 Oct 2016, at 19:14, Brunoais
                            <brunoa...@gmail.com>
                              wrote:
                            Here's my poc/prototype:

                            http://pastebin.com/WRpYWDJF

                            I've implemented the bare minimum of the
                            class that follows the same contract as
                            BufferedReader while flagging in comments
                            all issues I think it may have or has.
                            I also wrote some javadoc to help guide
                            readers through the class.
                            I could have used more fields from
                            BufferedReader but the names were so
                            minimalistic that they were confusing me. I
                            intend to change them before sending this
                            to OpenJDK.
                            One of the major problems this has is long
                            overflowing. It is major because it is
                            hidden, it will be extremely rare and it
                            takes a really long time to reproduce.
                            There are different ways of dealing with
                            it, from just documenting it to actually
                            writing code that handles it.
                            I built a simple test code for it to have
                            some ideas about performance and correctness.

                            http://pastebin.com/eh6LFgwT

                            This doesn't thoroughly test whether it is
                            actually working correctly but I see no
                            reason for it not working correctly after
                            fixing the 2 bugs that test found.
                            I'll also leave here some conclusions
                            about speed and resource consumption I found.
                            I made tests with default buffer sizes,
                            5000B, 15_000B and 500_000B. I noticed
                            that, with my hardware, with the
                            1 530 000 000B file, I was getting around:
                            In all buffers and fake work: 10~15s speed
                            improvement (from 90% HDD speed to 100%
                            HDD speed)
                            In all buffers and no fake work: 1~2s
                            speed improvement (from 90% HDD speed to
                            100% HDD speed)
                            Changing the buffer size was giving
                            different reading speeds but both were
                            quite equal in how much they would change
                            when changing the buffer size.
                            Finally, I could always confirm that I/O
                            was always the slowest thing while this
                            code was running.
                            For those wondering about the file
                            size: it is both to avoid the OS cache and
                            to exercise the main use case these
                            objects are for (large streams of
                            bytes).
                            @Pavel, are you open for discussion now
                            ;)? Need anything else?
                            On 21/10/2016 19:21, Pavel Rappo wrote:

                                Just to append to my previous email.
                                BufferedReader wraps any Reader out there.
                                Not specifically FileReader. While
                                you're talking about the case of effective
                                reading from a file.
                                I guess there's one existing
                                possibility to provide exactly what
                                you need (as I
                                understand it) under this method:
                                /**
                                  * Opens a file for reading, returning a {@code BufferedReader} to read text
                                  * from the file in an efficient manner...
                                    ...
                                  */
                                java.nio.file.Files#newBufferedReader(java.nio.file.Path)
                                It can return _anything_ as long as it
                                is a BufferedReader. We can do it, but it
                                needs to be investigated not only for
                                your favorite OS but for other OSes as
                                well. Feel free to prototype this and
                                we can discuss it on the list later.
                                Thanks,
                                -Pavel
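
A minimal usage sketch of that factory method (the class and helper names are hypothetical). The caller only sees the static type BufferedReader, so which concrete implementation is returned stays an implementation detail:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NewBufferedReaderSketch {
    /** Reads every line of the file through whatever BufferedReader the factory returns. */
    static List<String> readLines(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            List<String> lines = new ArrayList<>();
            for (String line; (line = reader.readLine()) != null; ) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: creates a temporary text file with the given lines. */
    static Path tempFileWithLines(String... lines) {
        try {
            Path file = Files.createTempFile("buffered-reader", ".txt");
            Files.write(file, Arrays.asList(lines));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLines(tempFileWithLines("first", "second")));
    }
}
```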

                                    On 21 Oct 2016, at 18:56, Brunoais
                                    <brunoa...@gmail.com>
                                      wrote:
                                    Pavel is right.
                                    In reality, I was expecting such
                                    BufferedReader to use only a
                                    single buffer and have that Buffer
                                    being filled asynchronously, not
                                    in a different Thread.
                                    Additionally, I don't have the
                                    intention of having a larger
                                    buffer than before unless stated
                                    through the API (the constructor).
                                    In my idea, internally, it is
                                    supposed to use
                                    java.nio.channels.AsynchronousFileChannel
                                    or equivalent.
                                    It does not prevent having two
                                    buffers and I do not intend to
                                    change BufferedReader itself. I'd
                                    do a BufferedAsyncReader of sorts
                                    (any name suggestion is welcome as
                                    I'm an awful namer).
                                    On 21/10/2016 18:38, Roger Riggs
                                    wrote:

                                        Hi Pavel,
                                        I think Brunoais is asking for a
                                        double buffering scheme in
                                        which the implementation of
                                        BufferedReader fills (a second
                                        buffer) in parallel with the
                                        application reading from the
                                        1st buffer
                                        and managing the swaps and
                                        async reads transparently.
                                        It would not change the API
                                        but would change the
                                        interactions between the
                                        buffered reader
                                        and the underlying stream.  It
                                        would also increase memory
                                        requirements and processing
                                        by introducing or using a
                                        separate thread and the
                                        necessary synchronization.
                                        Though I think the formal
                                        interface semantics could be
                                        maintained, I have doubts
                                        about compatibility and its
                                        unintended consequences on
                                        existing subclasses,
                                        applications and libraries.
                                        $.02, Roger
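
A rough sketch of such a double-buffering scheme, written against InputStream rather than BufferedReader to keep it short. The class name and its helper are hypothetical and this is not code from the JDK or from the prototype discussed in this thread. It makes the costs mentioned above visible: a second buffer, a background thread, and a hand-off between the two.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DoubleBufferedInputStream extends InputStream {
    private final InputStream source;
    private final ExecutorService filler = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "buffer-filler");
        thread.setDaemon(true);
        return thread;
    });
    private byte[] current;             // buffer the caller reads from
    private byte[] spare;               // buffer being filled in the background
    private int currentLength;          // valid bytes in current, -1 after end of stream
    private int readPos;
    private Future<Integer> pendingFill;

    public DoubleBufferedInputStream(InputStream source, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.source = source;
        this.current = new byte[bufferSize];
        this.spare = new byte[bufferSize];
        this.pendingFill = startFill(spare);
    }

    private Future<Integer> startFill(byte[] target) {
        Callable<Integer> fill = () -> source.read(target, 0, target.length);
        return filler.submit(fill);
    }

    @Override
    public int read() throws IOException {
        while (readPos >= currentLength) {
            if (currentLength == -1) {
                return -1;
            }
            swapBuffers();
        }
        return current[readPos++] & 0xFF;
    }

    private void swapBuffers() throws IOException {
        int filled;
        try {
            filled = pendingFill.get(); // blocks only if the background fill is not done yet
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for data", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        byte[] consumed = current;
        current = spare;
        spare = consumed;
        currentLength = filled;
        readPos = 0;
        if (filled != -1) {
            pendingFill = startFill(spare); // refill the consumed buffer in parallel
        }
    }

    @Override
    public void close() throws IOException {
        filler.shutdownNow();
        source.close();
    }

    /** Hypothetical demo helper: pushes the data through the wrapper and collects it again. */
    static byte[] copyThrough(byte[] data, int bufferSize) {
        try (InputStream in = new DoubleBufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int b; (b = in.read()) != -1; ) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller still sees a plain blocking read(); it waits only when the background fill has not finished by the time the current buffer is exhausted.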
                                        On 10/21/16 1:22 PM, Pavel
                                        Rappo wrote:

                                            Off the top of my head, I
                                            would say it's not
                                            possible to change the
                                            design of an
                                            _extensible_ type that has
                                            been out there for 20 or
                                            so years. All these I/O
                                            streams from java.io were
                                            designed for a simple
                                            synchronous use case.
                                            It's not that their design
                                            is flawed in some way,
                                            it's that they don't seem to
                                            suit your needs. Have you
                                            considered using
                                            java.nio.channels.AsynchronousFileChannel
                                            in your applications?
                                            -Pavel
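
For reference, a minimal sketch of reading with AsynchronousFileChannel using the Future-returning read(ByteBuffer, long) overload. The class and method names are hypothetical, and a real application would more likely use the CompletionHandler overload or do useful work before calling get():

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class AsyncReadSketch {
    /** Starts an asynchronous read at offset 0 and returns up to maxBytes decoded as UTF-8. */
    static String readStart(Path file, int maxBytes) {
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(maxBytes);
            // The call returns immediately; the read proceeds in the background.
            Future<Integer> pending = channel.read(buffer, 0L);
            // The caller could do other work here, but must not touch `buffer`
            // until the operation has completed.
            int bytesRead = pending.get(); // wait for completion
            if (bytesRead <= 0) {
                return "";
            }
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Hypothetical helper: creates a temporary file containing the given text. */
    static Path tempFileWith(String text) {
        try {
            Path file = Files.createTempFile("async-read", ".txt");
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readStart(tempFileWith("async data"), 64));
    }
}
```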

                                                On 21 Oct 2016, at
                                                17:08, Brunoais
                                                <brunoa...@gmail.com>
                                                  wrote:
                                                Any feedback on this?
                                                I'm really interested
                                                in implementing such a
                                                BufferedReader/BufferedStreamReader
                                                to allow speeding up
                                                my applications
                                                without having to
                                                think in an
                                                asynchronous way or
                                                multi-threading while
                                                programming with it.
                                                That's why I'm asking
                                                this here.
                                                On 13/10/2016 14:45,
                                                Brunoais wrote:

                                                    Hi,
                                                    I looked at
                                                    BufferedReader
                                                    source code for
                                                    Java 9 along with
                                                    the source code of
                                                    the
                                                    channels/streams
                                                    used. I noticed
                                                    that, like in java
                                                    7, BufferedReader
                                                    does not use an
                                                    Async API to load
                                                    data from files,
                                                    instead, the data
                                                    loading is all
                                                    done synchronously
                                                    even when the OS
                                                    allows requesting
                                                    a file to be read
                                                    and getting a
                                                    notification later when
                                                    the file is
                                                    effectively read.
                                                    Why is
                                                    BufferedReader not
                                                    async while
                                                    providing a sync API?

                    <BufferedNonBlockStream.java><Tests.java>





--
Sent from my phone
