Re: RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

David M. Lloyd Wed, 24 Jul 2013 14:24:43 -0700

Yes, you're right, I read this wrong. I agree with Martin then that thenullary read method is a better choice.


On 07/24/2013 03:49 PM, roger riggs wrote:

It looks like under the covers the ChannelInputStream uses a 1-byte buffer
but caches it and re-uses it.  That's slightly better than allocating a
new one every time.


Also, I don't see much difference between the original use (before the
Alan's previous change)
of Files.newInputStream(path) vs the compound try in readAllBytes().
That might also be reverted to its previous state as well.

Roger



On 7/24/2013 4:40 PM, David M. Lloyd wrote:

I suspect the reason this was done is because the stream is actually a
ChannelInputStream.  If you don't do the 1-byte array thing, the
ChannelInputStream will construct an entire ByteBuffer to read into,
which is even worse.  As it is, it'll create a wrapper ByteBuffer to
encapsulate the destination array, which is bad enough.

That's why I was saying that it would be nice if this could get a
FileInputStream, which afaik can read directly without messing around
with buffers.

On 07/24/2013 02:24 PM, Martin Buchholz wrote:

It's wasteful to create a 1-byte array to read into.  Just use the
nullary read method.
http://download.java.net/jdk8/docs/api/java/io/InputStream.html#read()


On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
<[email protected] <mailto:[email protected]>> wrote:

    Would you please take a look at the updated webrev?
    http://cr.openjdk.java.net/~__igerasim/8020669/2/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>

    readAllBytes() was recently (in b93) changed by Alan Bateman to fix
    8014928.

    Here's what I've done:
    - reverted readAllBytes() to its implementation prior b93.
    - modified it to address both 8020669 and 8014928.

    http://bugs.sun.com/view_bug.__do?bug_id=8020669
    <http://bugs.sun.com/view_bug.do?bug_id=8020669>
    <http://bugs.sun.com/view_bug.__do?bug_id=8014928
    <http://bugs.sun.com/view_bug.do?bug_id=8014928>>
    http://bugs.sun.com/view_bug.__do?bug_id=8014928
    <http://bugs.sun.com/view_bug.do?bug_id=8014928>

    Sincerely yours,
    Ivan


    On 23.07.2013 18:09, David M. Lloyd wrote:

        Here's how it should behave:

        - allocate 'size' byte byte array
        - if size > 0:
           - use simple old I/O to read into the array
        - do a one-byte read(), if not EOF then expand the array, using
        a simple growth pattern like 3/2 (with a special case for 0),
        and continue reading until EOF
        - if the array matches the size of the file, return the array,
        else use copyOf() to shrink it

        This way you only ever copy the array size() was wrong.

        On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:

            Hi Roger!

            This is how my implementation behaves:
            - allocate 'size' bytes in BAOS
            - allocate 8k for temp buffer
            - in cycle read 8k or less bytes from input stream and copy
            them into BAOS
            - if capacity of BAOS isn't sufficient (file had grown), its
            buffer will
            be reallocated
            Thus, 1 reallocation and 1 copying of already read data on
            each 8k piece
            of additional bytes.

            In normal case, i.e. when fc.size() is correct, we have
            overhead of 1
            allocation and copying 'size' bytes in size/8k iterations.

            And this is how current implementation does
            - allocate 'size' bytes
            - allocate 'size' bytes of native memory for temp buffer in
            IOUtil.read()
            - read the whole file into temp buffer
            - copy the temp buffer back into our buffer

            In common when fc.size() is right, we have 1 allocation and
            copying
            'size' bytes from temp buffer back.

            So there is a difference in allocations/copying, but in my
            opinion it's
            not that significant for this particular task.

            Sincerely yours,
            Ivan

            On 22.07.2013 20:03, roger riggs wrote:

                Hi Ivan,

                I'm concerned about the change in behavior for the
                existing working
                cases.

                How many times are the bytes copied in your proposed
                implementation?
                How many arrays are allocated and discarded?
                Files.copy()  uses an extra array for the copies.

                BAOS should only be used for size == 0; that would
                address the issue
                without changing the current behavior or allocations.

                Roger




                On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:

                    Roger, David thanks for suggestions!

                    Would you please take a look at an updated webrev?
http://cr.openjdk.java.net/~__igerasim/8020669/1/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>

                    - File size is used as an initial size of BAOS's
buffer.
                    - BAOS avoids copying its buffer in toByteArray(),
                    if size is correct .

                    I don't want to initialize BAOS with a positive
                    number if size
                    happened to be zero.
                    Because zero could indicate that the file is really
                    empty.

                    Sincerely yours,
                    Ivan

                    On 19.07.2013 22:30, David M. Lloyd wrote:

                        My mistake, we're not talking about strings.
                          Still you can subclass
                        and determine whether the buffer size guess was
                        right, and if so
                        return the array as-is (swap in an empty array
                        or something as needed).

                        On 07/19/2013 01:28 PM, David M. Lloyd wrote:

                            It's trivial to subclass
                            ByteArrayOutputStream and add a method which
                            converts its contents to a string using the
                            two protected fields which
                            give you all the info you need to do so.  So
                            no extra copy is needed
                            that you aren't already doing.

                            On 07/19/2013 01:06 PM, roger riggs wrote:

                                Hi Ivan,

                                I think this change takes too big a hit
                                for the cases where the
                                size is
                                correct.

                                No real file system can be wrong about
                                the size of a file so this
                                is a
                                problem
                                only for special file systems. If the
                                problem is with size
                                reporting zero
                                then maybe using the incremental read
                                only for size == would be a
                                better
                                fix.

                                At least you could pass the size to the
                                constructor for BAOS and
                                avoid
                                the thrashing for every re-size; but
                                still it will allocate and
                                create
                                an extra copy
                                of the every time.

                                $.02, Roger


                                On 7/19/2013 1:15 PM, Ivan Gerasimov
wrote:

                                    Hello everybody!

                                    Would you please review a fix for
                                    the problem with
                                    j.n.f.Files.readAllBytes() function?
                                    The current implementation relies on
                                    FileChannel.size() to
                                    preallocate
                                    a buffer for the whole file's
content.
                                    However, some special filesystems
                                    can report a wrong size.
                                    An example is procfs under Linux,
                                    which reports many files under
                                    /proc
                                    to be zero sized.

                                    Thus it is proposed not to rely on
                                    the size() and instead
                                    continuously
                                    read until EOF.

                                    The downside is reallocation and
                                    copying file content between
                                    buffers.
                                    But taking into account that the doc
                                    says: "It is not intended for
                                    reading in large files." it should
                                    not be a big problem.

http://bugs.sun.com/view_bug.__do?bug_id=8020669
<http://bugs.sun.com/view_bug.do?bug_id=8020669>
http://cr.openjdk.java.net/~__igerasim/8020669/0/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>

                                    The fix is for JDK8. If it is
                                    approved, it can be applied to
JDK7 as
                                    well.

                                    Sincerely yours,
                                    Ivan Gerasimov



--
- DML

Re: RFR [8020669] java.nio.file.Files.readAllBytes() does not read any data when Files.size() is 0

Reply via email to