Re: RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Ivan Gerasimov Thu, 25 Jul 2013 16:57:49 -0700

Hi Roger!

On 25.07.2013 17:42, roger riggs wrote:

Hi Ivan,


Thank you for your diligence.


Thank you for your patience :)

1) Should the test use an alternate mechanism to read the file (FileInputStream)
and confirm the length of the array?

This file can change from read to read. But it should not be empty, and that's what we check.

I moved the test from a separate file to BytesAndLines.java.

I've also added testing of how Files.readAllLines() reads from procfs files (it already does well.)

2) There is an edge case where the file size is between MAX_ARRAY_SIZE
and Integer.MAX_VALUE that should be avoided.


Yes, you're right.

I rewrote the logic in a simpler way with no overflowing and casting to long.
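
One way to keep the growth computation in int arithmetic without overflow is sketched below. This is only an illustration: the class name CapacityGrowth, the method grow, and the two constant values are assumptions, not taken from the webrev.

```java
final class CapacityGrowth {
    // Assumed values for illustration only; the webrev's constants may differ.
    static final int BUFFER_SIZE = 8192;
    static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;

    /** Returns the next buffer capacity without ever overflowing an int. */
    static int grow(int capacity) {
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            // Doubling stays within MAX_BUFFER_SIZE, so the shift is safe.
            // The floor also handles capacity == 0.
            return Math.max(capacity << 1, BUFFER_SIZE);
        }
        if (capacity == MAX_BUFFER_SIZE) {
            // Already at the limit: nothing left to grow into.
            throw new OutOfMemoryError("Required array size too large");
        }
        // Doubling would exceed the limit, so clamp to it.
        return MAX_BUFFER_SIZE;
    }
}
```

The comparison capacity <= MAX_BUFFER_SIZE - capacity checks whether doubling fits before the shift is performed, so no cast to long is needed.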


Would you please help review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/4/webrev/

Sincerely yours,
Ivan

The lines at L3022 are confusing.
If the max array size is MAX_BUFFER_SIZE then it should be used
also at L3063 in readAllBytes.

L3015-3026: The logic around the 3 nested if's is a bit confusing for the actions being taken.

By switching to a long for newCapacity this can be easier to read.

   if (capacity == MAX_INTEGER) {
       throw ...
   }
   long newCapacity = ((long) capacity) << 1;
   newCapacity = Math.max(newCapacity, BUFFER_SIZE);       // at least BUFFER_SIZE
   newCapacity = Math.min(newCapacity, MAX_BUFFER_SIZE);   // at most MAX_BUFFER_SIZE
   capacity = (int) newCapacity;

   buf = Arrays.copyOf(buf, capacity);
   buf[nread++] = (byte) n;
   rem = buf.length - nread;

Thanks, Roger

BTW, I'm not an official reviewer

On 7/24/2013 7:47 PM, Ivan Gerasimov wrote:

Hello everybody!

Would you please review an updated webrev?

http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/


Thanks in advance,
Ivan


On 24.07.2013 23:36, Martin Buchholz wrote:

Use javadoc style even in private methods.
s/Read/Reads/
Use @param initialSize
      /**
+     * Read all the bytes from an input stream. The {@code initialSize}
+     * parameter indicates the initial size of the byte[] to allocate.
+     */
---

This needs to be tested for an ordinary zero-length file. It looks like for the zero case

+            int newCapacity = capacity << 1;
will infloop not actually increasing the buffer.
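
The zero-length case could be checked with a test along the following lines. This is a minimal sketch, not the test from the webrev; the class name EmptyFileCheck and its method are hypothetical. The underlying arithmetic concern is that 0 << 1 is still 0, so a doubling loop that starts from a capacity of zero never makes progress.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class EmptyFileCheck {
    /**
     * Creates an ordinary empty temporary file, reads it with
     * Files.readAllBytes(), and returns the number of bytes read.
     * The call must return promptly rather than loop forever.
     */
    static int readEmptyTempFile() throws IOException {
        Path tmp = Files.createTempFile("empty", ".tmp");
        try {
            return Files.readAllBytes(tmp).length;
        } finally {
            Files.delete(tmp);
        }
    }
}
```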

---

You might want to copy this technique from ArrayList et al:


    /**
     * The maximum size of array to allocate.
     * Some VMs reserve some header words in an array.
     * Attempts to allocate larger arrays may result in
     * OutOfMemoryError: Requested array size exceeds VM limit
     */
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov wrote:


    Would you please take a look at the updated webrev?
    http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/

    readAllBytes() was recently (in b93) changed by Alan Bateman to
    fix 8014928.

    Here's what I've done:
    - reverted readAllBytes() to its implementation prior to b93.
    - modified it to address both 8020669 and 8014928.

    http://bugs.sun.com/view_bug.do?bug_id=8020669
    http://bugs.sun.com/view_bug.do?bug_id=8014928

    Sincerely yours,
    Ivan


    On 23.07.2013 18:09, David M. Lloyd wrote:

        Here's how it should behave:

        - allocate a 'size'-byte array
        - if size > 0:
          - use simple old I/O to read into the array
        - do a one-byte read(), if not EOF then expand the array,
        using a simple growth pattern like 3/2 (with a special case
        for 0), and continue reading until EOF
        - if the array matches the size of the file, return the
        array, else use copyOf() to shrink it

        This way you only ever copy the array if size() was wrong.
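
The steps listed above can be sketched roughly as follows. This is an illustrative reading of the description, not code from any webrev: the class name ReadWithSizeHint, the method readAll, and the minimum capacity of 16 are assumptions, and overflow handling for arrays near Integer.MAX_VALUE is left out for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class ReadWithSizeHint {
    /**
     * Reads everything from in, using sizeHint (for example the reported
     * file size) as the initial array length.
     */
    static byte[] readAll(InputStream in, int sizeHint) throws IOException {
        byte[] buf = new byte[sizeHint];
        int nread = 0;
        for (;;) {
            // Fill whatever free space the current array has.
            int n;
            while (nread < buf.length
                    && (n = in.read(buf, nread, buf.length - nread)) > 0) {
                nread += n;
            }
            // One-byte probe: tells us whether we are really at EOF.
            int b = in.read();
            if (b < 0) {
                break;
            }
            if (nread == buf.length) {
                // The hint was too small: grow by 3/2, with a floor (assumed
                // value 16) so that a zero-length array can grow at all.
                int newCapacity = Math.max(buf.length + (buf.length >> 1), 16);
                buf = Arrays.copyOf(buf, newCapacity);
            }
            buf[nread++] = (byte) b;
        }
        // No copy when the hint was exact; otherwise shrink to fit.
        return (nread == buf.length) ? buf : Arrays.copyOf(buf, nread);
    }
}
```

When the hint matches the amount of data, the array allocated at the start is returned directly and no copy takes place.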

        On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:

            Hi Roger!

            This is how my implementation behaves:
            - allocate 'size' bytes in BAOS
            - allocate 8k for temp buffer
            - in cycle read 8k or less bytes from input stream and
            copy them into BAOS
            - if capacity of BAOS isn't sufficient (file had grown),
            its buffer will
            be reallocated
            Thus, 1 reallocation and 1 copying of already read data
            on each 8k piece
            of additional bytes.

            In normal case, i.e. when fc.size() is correct, we have
            overhead of 1
            allocation and copying 'size' bytes in size/8k iterations.
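
The BAOS-based approach described above might look roughly like this. It is a minimal sketch under stated assumptions, not the code from the webrev: the class name BaosRead and the method readAll are hypothetical, and the plain toByteArray() call used here always makes one final copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BaosRead {
    /** Reads everything from in, pre-sizing the BAOS with sizeHint. */
    static byte[] readAll(InputStream in, int sizeHint) throws IOException {
        // The reported file size is only the initial capacity of the BAOS.
        ByteArrayOutputStream out = new ByteArrayOutputStream(sizeHint);
        byte[] tmp = new byte[8192];   // 8k temporary buffer
        int n;
        while ((n = in.read(tmp)) != -1) {
            // The BAOS reallocates its buffer by itself if the file had grown.
            out.write(tmp, 0, n);
        }
        return out.toByteArray();
    }
}
```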

            And this is how the current implementation does it:
            - allocate 'size' bytes
            - allocate 'size' bytes of native memory for temp buffer
            in IOUtil.read()
            - read the whole file into temp buffer
            - copy the temp buffer back into our buffer

            In the common case, when fc.size() is right, we have 1 allocation
            and copying
            'size' bytes from temp buffer back.

            So there is a difference in allocations/copying, but in
            my opinion it's
            not that significant for this particular task.

            Sincerely yours,
            Ivan

            On 22.07.2013 20:03, roger riggs wrote:

                Hi Ivan,

                I'm concerned about the change in behavior for the
                existing working
                cases.

                How many times are the bytes copied in your proposed
                implementation?
                How many arrays are allocated and discarded?
                Files.copy()  uses an extra array for the copies.

                BAOS should only be used for size == 0; that would
                address the issue
                without changing the current behavior or allocations.

                Roger




                On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:

                    Roger, David thanks for suggestions!

                    Would you please take a look at an updated webrev?
                    http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/

                    - File size is used as an initial size of BAOS's
                    buffer.
                    - BAOS avoids copying its buffer in
                    toByteArray(), if size is correct.

                    I don't want to initialize BAOS with a positive
                    number if size
                    happened to be zero.
                    Because zero could indicate that the file is
                    really empty.

                    Sincerely yours,
                    Ivan

                    On 19.07.2013 22:30, David M. Lloyd wrote:

                        My mistake, we're not talking about strings.
                         Still you can subclass
                        and determine whether the buffer size guess
                        was right, and if so
                        return the array as-is (swap in an empty
                        array or something as needed).
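
A subclass along the lines suggested above might look like the following sketch. The class name ExposedBaos and the method toByteArrayNoCopy are hypothetical; only the protected fields buf and count come from ByteArrayOutputStream itself.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class ExposedBaos extends ByteArrayOutputStream {
    ExposedBaos(int size) {
        super(size);
    }

    /**
     * Returns the internal buffer itself when the size guess was exact,
     * otherwise a copy trimmed to the number of bytes written.
     */
    synchronized byte[] toByteArrayNoCopy() {
        if (count == buf.length) {
            byte[] result = buf;
            // Swap in an empty array so later writes cannot alias the result.
            buf = new byte[0];
            count = 0;
            return result;
        }
        return Arrays.copyOf(buf, count);
    }
}
```

When the initial capacity equals the number of bytes written, the caller receives the stream's own array and no extra copy is made.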

                        On 07/19/2013 01:28 PM, David M. Lloyd wrote:

                            It's trivial to subclass
                            ByteArrayOutputStream and add a method which
                            converts its contents to a string using
                            the two protected fields which
                            give you all the info you need to do so.
                             So no extra copy is needed
                            that you aren't already doing.

                            On 07/19/2013 01:06 PM, roger riggs wrote:

                                Hi Ivan,

                                I think this change takes too big a
                                hit for the cases where the
                                size is
                                correct.

                                No real file system can be wrong
                                about the size of a file so this
                                is a
                                problem
                                only for special file systems. If
                                the problem is with size
                                reporting zero
                                then maybe using the incremental
                                read only for size == 0 would be a
                                better
                                fix.

                                At least you could pass the size to
                                the constructor for BAOS and
                                avoid
                                the thrashing for every re-size; but
                                still it will allocate and
                                create
                                an extra copy
                                of the data every time.

                                $.02, Roger


                                On 7/19/2013 1:15 PM, Ivan Gerasimov
                                wrote:

                                    Hello everybody!

                                    Would you please review a fix
                                    for the problem with
                                    j.n.f.Files.readAllBytes() function?
                                    The current implementation
                                    relies on FileChannel.size() to
                                    preallocate
                                    a buffer for the whole file's
                                    content.
                                    However, some special
                                    filesystems can report a wrong size.
                                    An example is procfs under
                                    Linux, which reports many files
                                    under
                                    /proc
                                    to be zero sized.

                                    Thus it is proposed not to rely
                                    on the size() and instead
                                    continuously
                                    read until EOF.

                                    The downside is reallocation and
                                    copying file content between
                                    buffers.
                                    But taking into account that the
                                    doc says: "It is not intended for
                                    reading in large files." it
                                    should not be a big problem.

                                    http://bugs.sun.com/view_bug.do?bug_id=8020669
                                    http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/

                                    The fix is for JDK8. If it is
                                    approved, it can be applied to
                                    JDK7 as
                                    well.

                                    Sincerely yours,
                                    Ivan Gerasimov
