Re: RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Ivan Gerasimov Wed, 24 Jul 2013 15:20:07 -0700

Hello Martin!

On 24.07.2013 23:36, Martin Buchholz wrote:

Use javadoc style even in private methods.
s/Read/Reads/
Use @param initialSize
      /**
+     * Read all the bytes from an input stream. The {@code initialSize}
+     * parameter indicates the initial size of the byte[] to allocate.
+     */
---

Thanks, I'll correct that.

This needs to be tested for an ordinary zero-length file. It lookslike for the zero case
+            int newCapacity = capacity << 1;
will infloop not actually increasing the buffer.

---

Well, no.

First, reading empty file wouldn't go that far. The loop would break inthe preceding line.

if (n < 0 || source.read(b) < 0)
    break;

And second, there is a check below for too small values of newCapacity:
} else if (newCapacity < BUFFER_SIZE) {
    newCapacity = BUFFER_SIZE;
}


You might want to copy this technique from ArrayList et al:


    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

Thanks, I'll do that too.

One thing I was thinking about is that in a bad case, when we needseveral iterations (thus reallocations) to read the file, we would copythe starting bytes of the array over and over.Wouldn't it be better to keep a list of buffers and concatenate themonly once at the end?

What do you think, is it worth implementing that?

Sincerely yours,
Ivan

On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov<[email protected] <mailto:[email protected]>> wrote:


    Would you please take a look at the updated webrev?
    http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
    <http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>

    readAllBytes() was recently (in b93) changed by Alan Bateman to
    fix 8014928.

    Here's what I've done:
    - reverted readAllBytes() to its implementation prior b93.
    - modified it to address both 8020669 and 8014928.

    http://bugs.sun.com/view_bug.do?bug_id=8020669
    <http://bugs.sun.com/view_bug.do?bug_id=8014928>
    http://bugs.sun.com/view_bug.do?bug_id=8014928

    Sincerely yours,
    Ivan


    On 23.07.2013 18:09, David M. Lloyd wrote:

        Here's how it should behave:

        - allocate 'size' byte byte array
        - if size > 0:
          - use simple old I/O to read into the array
        - do a one-byte read(), if not EOF then expand the array,
        using a simple growth pattern like 3/2 (with a special case
        for 0), and continue reading until EOF
        - if the array matches the size of the file, return the array,
        else use copyOf() to shrink it

        This way you only ever copy the array size() was wrong.

        On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:

            Hi Roger!

            This is how my implementation behaves:
            - allocate 'size' bytes in BAOS
            - allocate 8k for temp buffer
            - in cycle read 8k or less bytes from input stream and
            copy them into BAOS
            - if capacity of BAOS isn't sufficient (file had grown),
            its buffer will
            be reallocated
            Thus, 1 reallocation and 1 copying of already read data on
            each 8k piece
            of additional bytes.

            In normal case, i.e. when fc.size() is correct, we have
            overhead of 1
            allocation and copying 'size' bytes in size/8k iterations.

            And this is how current implementation does
            - allocate 'size' bytes
            - allocate 'size' bytes of native memory for temp buffer
            in IOUtil.read()
            - read the whole file into temp buffer
            - copy the temp buffer back into our buffer

            In common when fc.size() is right, we have 1 allocation
            and copying
            'size' bytes from temp buffer back.

            So there is a difference in allocations/copying, but in my
            opinion it's
            not that significant for this particular task.

            Sincerely yours,
            Ivan

            On 22.07.2013 20:03, roger riggs wrote:

                Hi Ivan,

                I'm concerned about the change in behavior for the
                existing working
                cases.

                How many times are the bytes copied in your proposed
                implementation?
                How many arrays are allocated and discarded?
                Files.copy()  uses an extra array for the copies.

                BAOS should only be used for size == 0; that would
                address the issue
                without changing the current behavior or allocations.

                Roger




                On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:

                    Roger, David thanks for suggestions!

                    Would you please take a look at an updated webrev?
                    http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
                    <http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>

                    - File size is used as an initial size of BAOS's
                    buffer.
                    - BAOS avoids copying its buffer in toByteArray(),
                    if size is correct .

                    I don't want to initialize BAOS with a positive
                    number if size
                    happened to be zero.
                    Because zero could indicate that the file is
                    really empty.

                    Sincerely yours,
                    Ivan

                    On 19.07.2013 22:30, David M. Lloyd wrote:

                        My mistake, we're not talking about strings.
                         Still you can subclass
                        and determine whether the buffer size guess
                        was right, and if so
                        return the array as-is (swap in an empty array
                        or something as needed).

                        On 07/19/2013 01:28 PM, David M. Lloyd wrote:

                            It's trivial to subclass
                            ByteArrayOutputStream and add a method which
                            converts its contents to a string using
                            the two protected fields which
                            give you all the info you need to do so.
                             So no extra copy is needed
                            that you aren't already doing.

                            On 07/19/2013 01:06 PM, roger riggs wrote:

                                Hi Ivan,

                                I think this change takes too big a
                                hit for the cases where the
                                size is
                                correct.

                                No real file system can be wrong about
                                the size of a file so this
                                is a
                                problem
                                only for special file systems. If the
                                problem is with size
                                reporting zero
                                then maybe using the incremental read
                                only for size == would be a
                                better
                                fix.

                                At least you could pass the size to
                                the constructor for BAOS and
                                avoid
                                the thrashing for every re-size; but
                                still it will allocate and
                                create
                                an extra copy
                                of the every time.

                                $.02, Roger


                                On 7/19/2013 1:15 PM, Ivan Gerasimov
                                wrote:

                                    Hello everybody!

                                    Would you please review a fix for
                                    the problem with
                                    j.n.f.Files.readAllBytes() function?
                                    The current implementation relies
                                    on FileChannel.size() to
                                    preallocate
                                    a buffer for the whole file's content.
                                    However, some special filesystems
                                    can report a wrong size.
                                    An example is procfs under Linux,
                                    which reports many files under
                                    /proc
                                    to be zero sized.

                                    Thus it is proposed not to rely on
                                    the size() and instead
                                    continuously
                                    read until EOF.

                                    The downside is reallocation and
                                    copying file content between
                                    buffers.
                                    But taking into account that the
                                    doc says: "It is not intended for
                                    reading in large files." it should
                                    not be a big problem.

                                    
http://bugs.sun.com/view_bug.do?bug_id=8020669
                                    
http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
                                    
<http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>

                                    The fix is for JDK8. If it is
                                    approved, it can be applied to JDK7 as
                                    well.

                                    Sincerely yours,
                                    Ivan Gerasimov

Re: RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Reply via email to