On 27/10/2016 22:45, Vitaly Davidovich wrote:


On Thursday, October 27, 2016, Brunoais <brunoa...@gmail.com> wrote:

    You are right. Even on Windows it does not set the flags for async
    reads. It seems that Windows itself decides whether to buffer the
    contents, based on its own heuristics.

You mean nonblocking, not async, right? Two different things.
Oops, mistyped. The Windows docs seem to call it async...

    But... Why? Why won't it be? Why is there no API for it? How am I
    getting 100% HDD use and faster times when I fake work to delay
    getting more data and I only have a fluctuating 60-90% (always
    going up and down) when I use an InputStream?
    Is it related to how both classes cache and how frequently and how
    much each one asks for data?

    I really would prefer not having to read the source code because
    it takes a really long time T.T.

    I end up restating my question... And wondering...

    Why doesn't Java provide a single-threaded non-blocking API for file
    reads on every OS that supports it? I simply cannot find that
    information no matter how much I search on Google, Bing,
    DuckDuckGo... Can any of you point me to whoever knows?

https://lwn.net/Articles/612483/ for Linux. Unfortunately, the nonblocking file io story is complicated and messy.
Both the Windows and the Linux manuals use the term asynchronous I/O for what is, from the perspective of the program running on the OS, non-blocking synchronous I/O.
http://man7.org/linux/man-pages/man3/aio_read.3.html
http://man7.org/linux/man-pages/man7/aio.7.html
http://man7.org/linux/man-pages/man7/sigevent.7.html

This does not block: the OS writes directly to the user buffer, nothing runs on a different user thread, and completion is reported through either a signal or a callback function pointer. Reading the manual, it seems the notification can even go to the calling thread itself. With signals, I do know it is completely non-blocking and single-threaded (from the "user" thread's perspective). I'd like to see this in Java...
I guess that leaves me with NIO.2's AsynchronousFileChannel, then.
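
For reference, a minimal sketch of what a read through NIO.2's AsynchronousFileChannel could look like. The names AsyncReadSketch and readFirstChunk are made up for illustration. How the JDK services the request underneath (a kernel facility or helper threads) is platform-specific and is not something this sketch demonstrates.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical helper class, for illustration only.
class AsyncReadSketch {
    /**
     * Starts a read at file position 0, leaves room for other work, then
     * waits for the result. Returns the number of bytes read, or -1 at EOF.
     */
    static int readFirstChunk(Path file, ByteBuffer dst)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            Future<Integer> pending = ch.read(dst, 0L); // returns without waiting for the data
            // The calling thread is free to do unrelated work here.
            int bytesRead = pending.get(); // blocks only if the read has not completed yet
            dst.flip();                    // make the received bytes readable
            return bytesRead;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async-read", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.US_ASCII));
            ByteBuffer buf = ByteBuffer.allocateDirect(16);
            System.out.println("bytes read: " + readFirstChunk(tmp, buf));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The channel also offers a read overload that takes a CompletionHandler instead of returning a Future, which is closer to the callback style of the aio man pages linked above.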

    On 27/10/2016 14:11, Vitaly Davidovich wrote:
    I don't know about Windows specifically, but generally file
    systems across major OS's will implement readahead in their IO
    scheduler when they detect sequential scans.

    On Linux, you can also strace your test to confirm which syscalls
    are emitted (you should be seeing plain read()'s there, with
    FileInputStream and FileChannel).

    On Thu, Oct 27, 2016 at 9:06 AM, Brunoais <brunoa...@gmail.com> wrote:

        Thanks for the heads up.

        I'll try that later. These tests are still useful then.
        Meanwhile, I'll also check how FileChannel queries
        the OS on Windows. I'm getting 100% HDD reads... Could it be
        that the OS reads the file ahead on its own?... Anyway, I'll
        look into it. Thanks for the heads up.


        On 27/10/2016 13:53, Vitaly Davidovich wrote:


        On Thu, Oct 27, 2016 at 8:34 AM, Brunoais <brunoa...@gmail.com> wrote:

            Oh... I see. In that case, it means something is
            terribly wrong. It can be my initial tests, though.

            I'm testing on both Linux and Windows and I'm getting
            performance gains from using FileChannel compared to
            using FileInputStream... The results also match
            my predictions O_O...

        FileInputStream requires copying native buffers holding the
        read data to the java byte[].  If you're using direct
        ByteBuffer for FileChannel, that whole memcpy is skipped. Try
        comparing FileChannel with HeapByteBuffer instead.
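
To try the comparison suggested above, something along these lines could be used. It is a rough sketch and not a benchmark: BufferKindComparison and readAll are hypothetical names, the timing is naive, and the OS page cache will skew repeated runs over the same file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
class BufferKindComparison {
    /** Drains the file through buf (capacity must be > 0) and returns the bytes read. */
    static long readAll(Path file, ByteBuffer buf) throws IOException {
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data; only the transfer is of interest here
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffer-kind", ".bin");
        try {
            Files.write(file, new byte[8 * 1024 * 1024]); // 8 MiB of zeros as test data
            ByteBuffer[] candidates = {
                ByteBuffer.allocate(64 * 1024),       // heap buffer
                ByteBuffer.allocateDirect(64 * 1024)  // direct buffer
            };
            for (ByteBuffer buf : candidates) {
                long start = System.nanoTime();
                long bytes = readAll(file, buf);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println((buf.isDirect() ? "direct" : "heap")
                        + ": " + bytes + " bytes in " + micros + " us");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

For numbers worth trusting, a file larger than the page cache and a proper harness would be needed; this only shows where the two buffer kinds plug in.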


            On 27/10/2016 11:47, Vitaly Davidovich wrote:


            On Thursday, October 27, 2016, Brunoais <brunoa...@gmail.com> wrote:

                Did you read the C code?

            I looked at the Linux code in the JDK.

                Have you got any idea how many functions Windows or
                Linux (nearly all flavors) have for the read
                operation towards a file?

            I do.


                I have already done that homework myself. I may not
                have read the JVM's source code but I know well that
                there are functions on both Windows and Linux that
                provide the kind of interface I mentioned, although they
                require slightly different treatment (and
                different constants).

            You should read the JDK (native) source code instead of
            guessing/assuming.  On Linux, it doesn't use aio
            facilities for files.  The kernel io scheduler may
            issue readahead behind the scenes, but there's no
            nonblocking file io that's at the heart of your premise.



                On 27/10/2016 00:06, Vitaly Davidovich wrote:



                    On Wednesday, October 26, 2016, Brunoais <brunoa...@gmail.com> wrote:

                        It is actually based on the premise that:

                        1. The first call to
                           ReadableByteChannel.read(ByteBuffer) sets the OS
                           buffer size to fill in to the same size as the
                           ByteBuffer.

                    Why do you say that? AFAICT, it issues a read
                    syscall and that will block if the data isn't
                    in page cache.

                        2. Subsequent calls to
                           ReadableByteChannel.read(ByteBuffer) make the JVM
                           ask the OS to execute memcpy() to copy from its
                           memory to the shared memory created at ByteBuffer
                           instantiation (in Java 8) using Unsafe, and the
                           JVM then updates the ByteBuffer fields.

                    I think subsequent reads just invoke the same
                    read syscall, passing the current file offset
                    maintained by the file channel instance.

                        3. The call will not block waiting for I/O and it
                           won't take longer than the JNI interface if no new
                           data exists. However, it will block waiting for
                           the OS to execute memcpy() to the shared memory.

                    So why do you think it won't block?
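
The replies above describe each FileChannel read as an ordinary read at the channel's current file offset. A small sketch that makes that offset visible after every call (SequentialReadPositions and positionsAfterReads are hypothetical names, and this says nothing about what the kernel does underneath):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class SequentialReadPositions {
    /** Returns the channel position observed after each read that returned data. */
    static List<Long> positionsAfterReads(Path file, int chunkSize) throws IOException {
        List<Long> positions = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(chunkSize); // chunkSize must be > 0
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {      // each call returns only once it has a result
                positions.add(ch.position()); // offset the next read will start from
                buf.clear();
            }
        }
        return positions;
    }
}
```

Each call returns only after the bytes are in the buffer (or EOF is reached), which is the blocking behaviour being pointed out.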


                        Is my premise wrong?

                        If I read correctly, if I don't use a DirectBuffer,
                        there would be even another intermediate buffer to
                        copy data to before giving it to the "user" which
                        would be useless.

                    If you use a HeapByteBuffer, then there's an
                    extra copy from the native buffer to the Java
                    buffer.



                        On 26/10/2016 11:57, Pavel Rappo wrote:

                            I believe I see where you're coming from. Please
                            correct me if I'm wrong.

                            Your implementation is based on the premise that a
                            call to ReadableByteChannel.read() _initiates_ the
                            operation and returns immediately. The OS then
                            continues to fill the buffer while there's free
                            space in the buffer and the channel hasn't
                            encountered EOF.

                            Is that right?

                                On 25 Oct 2016, at 22:16, Brunoais <brunoa...@gmail.com> wrote:

                                Thank you for your time. I'll try to explain it. I hope I
                                can clear it up.
                                First off, I mixed up the meanings of asynchronous and
                                non-blocking. This implementation uses a non-blocking
                                algorithm internally while providing a blocking-like
                                algorithm on the surface. It is single-threaded, not
                                multi-threaded where one thread fetches data and blocks
                                waiting and the other accumulates it and provides it to
                                whoever wants it.

                                Second, I had made the mistake of going after
                                BufferedReader instead of going after BufferedInputStream.
                                If you want me to go after BufferedReader it's OK, but
                                when I started the PoC I thought that going after
                                BufferedInputStream would be more generically useful than
                                BufferedReader.

                                On to my code:
                                Short answers:
                                        • The sleep(int) exists because I don't know how
                                          to wait until more data exists in the buffer,
                                          which is part of read()'s contract.
                                        • The ByteBuffer gives a buffer that is filled by
                                          the OS (what I believe Channels do) instead of
                                          getting data only on demand (what I believe
                                          Streams do).
                                Full answers:
                                The blockingFill(boolean) method is a busy wait for a
                                fill and is used exclusively by the read() method. All
                                other methods use the version that does not sleep
                                (fill(boolean)). blockingFill(boolean) exists in that
                                form only because the read() method must not return
                                unless either:

                                        • The stream ended.
                                        • The next byte is ready for reading.
                                Additionally, statistically, that while loop will rarely
                                evaluate to true, as reads are in chunks so readPos will
                                be behind writePos most of the time.
                                I have no idea if an interrupt will ever happen, to be
                                honest. The main reasons why I'm using a sleep are that
                                I didn't want to hog the CPU with a full-thread busy wait
                                and that I didn't find any way of putting the thread to
                                sleep so that it wakes up later, when the buffer managed
                                by native code has more data.
                                The non-blocking part is managed by the buffer the OS
                                keeps filling most if not all of the time. That buffer is
                                the field

                                ByteBuffer readBuffer
                                That's the gain over the plain old Buffered classes.


                                Did that make sense to you? Feel free to ask anything
                                else you need.

                                On 25/10/2016 20:52, Pavel Rappo wrote:

                                    I've skimmed through the code and I'm not sure I can
                                    see any asynchronicity (you were pointing at the lack
                                    of it in BufferedReader).
                                    And the mechanics of this are very puzzling to me, to
                                    be honest:
                                         void blockingFill(boolean forced) throws IOException {
                                             fill(forced);
                                             while (readPos == writePos) {
                                                 try {
                                                     Thread.sleep(100);
                                                 } catch (InterruptedException e) {
                                                     // An interrupt may mean more data is available
                                                 }
                                                 fill(forced);
                                             }
                                         }
                                    I thought you were suggesting that we should utilize
                                    the tools which the OS provides more efficiently.
                                    Instead we have something that looks very similar to
                                    a "busy loop" and... also who is supposed to interrupt
                                    Thread.sleep(), and when?
                                    Sorry, I'm not following. Could you please explain how
                                    this is supposed to work?
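
One way to avoid the sleep-based polling questioned above is to block on the Future that AsynchronousFileChannel.read returns, so the thread is woken when the read completes. The sketch below is not the code from the attached proof of concept: ReadAheadFileReader and its fields are hypothetical, it assumes a buffer size greater than zero, and it only covers single-byte reads.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical read-ahead reader, for illustration only.
class ReadAheadFileReader implements Closeable {
    private final AsynchronousFileChannel channel;
    private ByteBuffer current;      // buffer the caller drains
    private ByteBuffer filling;      // buffer the pending read writes into
    private Future<Integer> pending; // read in flight, or null once EOF was seen
    private long filePos = 0;        // file offset of the next read request

    ReadAheadFileReader(Path file, int bufferSize) throws IOException {
        channel = AsynchronousFileChannel.open(file, StandardOpenOption.READ);
        current = ByteBuffer.allocateDirect(bufferSize);
        current.limit(0); // nothing to drain yet
        filling = ByteBuffer.allocateDirect(bufferSize);
        pending = channel.read(filling, filePos); // first read starts immediately
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int read() throws IOException {
        while (!current.hasRemaining()) {
            if (pending == null) return -1;
            int n = awaitPending();
            if (n < 0) { pending = null; return -1; }
            filePos += n;
            filling.flip();
            ByteBuffer drained = current; // swap the two buffers
            current = filling;
            filling = drained;
            filling.clear();
            pending = channel.read(filling, filePos); // request the next chunk right away
        }
        return current.get() & 0xFF;
    }

    private int awaitPending() throws IOException {
        try {
            return pending.get(); // waits for completion; no sleep/poll loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for data", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The caller still sees a blocking read(), but the next chunk is already requested while the current one is being consumed.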

                                        On 24 Oct 2016, at 15:59, Brunoais <brunoa...@gmail.com> wrote:
                                        Attached and sending!
                                        On 24/10/2016 13:48, Pavel Rappo wrote:

                                        Could you please send a new email on this list
                                        with the source attached as a text file?

                      On 23 Oct 2016, at 19:14, Brunoais <brunoa...@gmail.com> wrote:
                      Here's my poc/prototype:

                    http://pastebin.com/WRpYWDJF

                      I've implemented the bare minimum of the
                      class that follows the same contract of
                      BufferedReader while signaling all issues
                      I think it may have or has in comments.
                      I also wrote some javadoc to help guiding
                      through the class.
                      I could have used more fields from
                      BufferedReader but the names were so
                      minimalistic that they were confusing me. I
                      intend to change them before sending this
                      to OpenJDK.
                      One of the major problems this has is long
                      overflowing. It is major because it is
                      hidden, it will be extremely rare and it
                      takes a really long time to reproduce.
                      There are different ways of dealing with
                      it. From just documenting to actually
                      making code that works with it.
                      I built a simple test code for it to have
                      some ideas about performance and correctness.

                    http://pastebin.com/eh6LFgwT

                      This doesn't do a thorough test of whether it is
                      actually working correctly but I see no
                      reason for it not working correctly after
                      fixing the 2 bugs that test found.
                      I'll also leave here some conclusions
                      about speed and resource consumption I found.
                      I made tests with default buffer sizes,
                      5000B 15_000B and 500_000B. I noticed
                      that, with my hardware, with the 1 530 000
                      000B file, I was getting around:
                      In all buffers and fake work: 10~15s speed
                      improvement ( from 90% HDD speed to 100%
                      HDD speed)
                      In all buffers and no fake work: 1~2s
                      speed improvement ( from 90% HDD speed to
                      100% HDD speed)
                      Changing the buffer size was giving
                      different reading speeds but both were
                      quite equal in how much they would change
                      when changing the buffer size.
                      Finally, I could always confirm that I/O
                      was always the slowest thing while this
                      code was running.
                      For the ones wondering about the file
                      size: it is both to avoid the OS cache and to
                      exercise the main use case
                      these objects are for (large streams of
                      bytes).
                      @Pavel, are you open for discussion now
                      ;)? Need anything else?
                      On 21/10/2016 19:21, Pavel Rappo wrote:

                          Just to append to my previous email.
                          BufferedReader wraps any Reader out there,
                          not specifically FileReader, while
                          you're talking about the case of reading
                          efficiently from a file.
                          I guess there's one existing
                          possibility to provide exactly what
                          you need (as I
                          understand it) under this method:
                          /**
                            * Opens a file for reading, returning a {@code BufferedReader} to read text
                            * from the file in an efficient manner...
                              ...
                            */
                          java.nio.file.Files#newBufferedReader(java.nio.file.Path)
                          It can return _anything_ as long as it
                          is a BufferedReader. We can do it, but it
                          needs to be investigated not only for
                          your favorite OS but for other OSes as
                          well. Feel free to prototype this and
                          we can discuss it on the list later.
                          Thanks,
                          -Pavel
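
For context, this is how calling code typically uses that factory method; because it depends only on the BufferedReader type, a different implementation could be returned without the caller changing. The class NewBufferedReaderExample and the method countLines are made-up names for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
class NewBufferedReaderExample {
    /** Counts the lines of a UTF-8 text file using whatever reader the factory returns. */
    static long countLines(Path file) throws IOException {
        // The single-argument overload decodes the file as UTF-8.
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```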

                              On 21 Oct 2016, at 18:56, Brunoais <brunoa...@gmail.com> wrote:
                              Pavel is right.
                              In reality, I was expecting such a
                              BufferedReader to use only a
                              single buffer and have that buffer
                              filled asynchronously, not
                              in a different Thread.
                              Additionally, I don't intend to have
                              a larger buffer than before unless stated
                              through the API (the constructor).
                              In my idea, internally, it is
                              supposed to use
                              java.nio.channels.AsynchronousFileChannel
                              or equivalent.
                              It does not prevent having two
                              buffers and I do not intend to
                              change BufferedReader itself. I'd
                              do a BufferedAsyncReader of sorts
                              (any name suggestion is welcome as
                              I'm an awful namer).
                              On 21/10/2016 18:38, Roger Riggs wrote:

                                  Hi Pavel,
                                  I think Brunoais is asking for a
                                  double buffering scheme in
                                  which the implementation of
                                  BufferedReader fills (a second
                                  buffer) in parallel with the
                                  application reading from the
                                  1st buffer
                                  and managing the swaps and
                                  async reads transparently.
                                  It would not change the API
                                  but would change the
                                  interactions between the
                                  buffered reader
                                  and the underlying stream.  It
                                  would also increase memory
                                  requirements and processing
                                  by introducing or using a
                                  separate thread and the
                                  necessary synchronization.
                                  Though I think the formal
                                  interface semantics could be
                                  maintained, I have doubts
                                  about compatibility and its
                                  unintended consequences on
                                  existing subclasses,
                                  applications and libraries.
                                  $.02, Roger
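
A rough sketch of the double-buffering scheme described above, using a helper thread over an arbitrary InputStream. It is illustrative only: DoubleBufferedInputStream is a hypothetical class, it assumes a buffer size greater than zero, it implements only single-byte read(), and it makes the extra memory and thread cost mentioned above visible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical double-buffering wrapper, for illustration only.
class DoubleBufferedInputStream extends InputStream {
    private final InputStream in;
    private final ExecutorService filler = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "double-buffer-filler");
        t.setDaemon(true); // do not keep the JVM alive for the helper thread
        return t;
    });
    private byte[] front;                // buffer the application reads from
    private byte[] back;                 // buffer the filler thread writes into
    private int pos = 0;                 // next byte to hand out from front
    private int limit = 0;               // number of valid bytes in front
    private Future<Integer> pendingFill; // fill in flight, or null after EOF

    DoubleBufferedInputStream(InputStream in, int bufferSize) {
        this.in = in;
        this.front = new byte[bufferSize];
        this.back = new byte[bufferSize];
        this.pendingFill = startFill();
    }

    private Future<Integer> startFill() {
        final byte[] target = back;
        Callable<Integer> task = () -> in.read(target, 0, target.length);
        return filler.submit(task);
    }

    @Override
    public int read() throws IOException {
        while (pos == limit) {
            if (pendingFill == null) return -1;
            int n = await(pendingFill);
            if (n < 0) { pendingFill = null; return -1; }
            byte[] filled = back; // swap: the freshly filled buffer becomes front
            back = front;
            front = filled;
            pos = 0;
            limit = n;
            pendingFill = startFill(); // refill the other buffer in parallel
        }
        return front[pos++] & 0xFF;
    }

    private static int await(Future<Integer> fill) throws IOException {
        try {
            return fill.get(); // also makes the filler thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for data", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    @Override
    public void close() throws IOException {
        filler.shutdownNow();
        in.close();
    }
}
```

The swap happens only after Future.get() returns, so the application never reads a buffer the filler thread is still writing to.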
                                  On 10/21/16 1:22 PM, Pavel Rappo wrote:

                                      Off the top of my head, I would say it's not
                                      possible to change the design of an _extensible_
                                      type that has been out there for 20 or so years.
                                      All these I/O streams from java.io were designed
                                      for a simple synchronous use case. It's not that
                                      their design is flawed in some way, it's that they
                                      don't seem to suit your needs. Have you considered
                                      using java.nio.channels.AsynchronousFileChannel
                                      in your applications?
                                      -Pavel

                                          On 21 Oct 2016, at 17:08, Brunoais <brunoa...@gmail.com> wrote:
                                          Any feedback on this? I'm really interested
                                          in implementing such a
                                          BufferedReader/BufferedStreamReader to allow
                                          speeding up my applications without having to
                                          think in an asynchronous way or about
                                          multi-threading while programming with it.
                                          That's why I'm asking this here.
                                          On 13/10/2016 14:45, Brunoais wrote:

                                              Hi,
                                              I looked at the BufferedReader source code for
                                              Java 9 along with the source code of the
                                              channels/streams used. I noticed that, like in
                                              Java 7, BufferedReader does not use an async API
                                              to load data from files; instead, the data
                                              loading is all done synchronously even when the
                                              OS allows requesting a file to be read and
                                              getting a notification later when the file has
                                              effectively been read.
                                              Why is BufferedReader not async while providing
                                              a sync API?

                    <BufferedNonBlockStream.java><Tests.java>





-- Sent from my phone



