Would you please take a look at the updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
readAllBytes() was recently (in b93) changed by Alan Bateman to fix 8014928.
Here's what I've done:
- reverted readAllBytes() to its implementation prior b93.
- modified it to address both 8020669 and 8014928.
http://bugs.sun.com/view_bug.do?bug_id=8020669
<http://bugs.sun.com/view_bug.do?bug_id=8014928>
http://bugs.sun.com/view_bug.do?bug_id=8014928
Sincerely yours,
Ivan
On 23.07.2013 18:09, David M. Lloyd wrote:
Here's how it should behave:
- allocate 'size' byte byte array
- if size > 0:
- use simple old I/O to read into the array
- do a one-byte read(), if not EOF then expand the array, using a
simple growth pattern like 3/2 (with a special case for 0), and
continue reading until EOF
- if the array matches the size of the file, return the array, else
use copyOf() to shrink it
This way you only ever copy the array size() was wrong.
On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
Hi Roger!
This is how my implementation behaves:
- allocate 'size' bytes in BAOS
- allocate 8k for temp buffer
- in cycle read 8k or less bytes from input stream and copy them into
BAOS
- if capacity of BAOS isn't sufficient (file had grown), its buffer will
be reallocated
Thus, 1 reallocation and 1 copying of already read data on each 8k piece
of additional bytes.
In normal case, i.e. when fc.size() is correct, we have overhead of 1
allocation and copying 'size' bytes in size/8k iterations.
And this is how current implementation does
- allocate 'size' bytes
- allocate 'size' bytes of native memory for temp buffer in
IOUtil.read()
- read the whole file into temp buffer
- copy the temp buffer back into our buffer
In common when fc.size() is right, we have 1 allocation and copying
'size' bytes from temp buffer back.
So there is a difference in allocations/copying, but in my opinion it's
not that significant for this particular task.
Sincerely yours,
Ivan
On 22.07.2013 20:03, roger riggs wrote:
Hi Ivan,
I'm concerned about the change in behavior for the existing working
cases.
How many times are the bytes copied in your proposed implementation?
How many arrays are allocated and discarded?
Files.copy() uses an extra array for the copies.
BAOS should only be used for size == 0; that would address the issue
without changing the current behavior or allocations.
Roger
On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
Roger, David thanks for suggestions!
Would you please take a look at an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
- File size is used as an initial size of BAOS's buffer.
- BAOS avoids copying its buffer in toByteArray(), if size is
correct .
I don't want to initialize BAOS with a positive number if size
happened to be zero.
Because zero could indicate that the file is really empty.
Sincerely yours,
Ivan
On 19.07.2013 22:30, David M. Lloyd wrote:
My mistake, we're not talking about strings. Still you can subclass
and determine whether the buffer size guess was right, and if so
return the array as-is (swap in an empty array or something as
needed).
On 07/19/2013 01:28 PM, David M. Lloyd wrote:
It's trivial to subclass ByteArrayOutputStream and add a method
which
converts its contents to a string using the two protected fields
which
give you all the info you need to do so. So no extra copy is needed
that you aren't already doing.
On 07/19/2013 01:06 PM, roger riggs wrote:
Hi Ivan,
I think this change takes too big a hit for the cases where the
size is
correct.
No real file system can be wrong about the size of a file so this
is a
problem
only for special file systems. If the problem is with size
reporting zero
then maybe using the incremental read only for size == would be a
better
fix.
At least you could pass the size to the constructor for BAOS and
avoid
the thrashing for every re-size; but still it will allocate and
create
an extra copy
of the every time.
$.02, Roger
On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
Hello everybody!
Would you please review a fix for the problem with
j.n.f.Files.readAllBytes() function?
The current implementation relies on FileChannel.size() to
preallocate
a buffer for the whole file's content.
However, some special filesystems can report a wrong size.
An example is procfs under Linux, which reports many files under
/proc
to be zero sized.
Thus it is proposed not to rely on the size() and instead
continuously
read until EOF.
The downside is reallocation and copying file content between
buffers.
But taking into account that the doc says: "It is not intended for
reading in large files." it should not be a big problem.
http://bugs.sun.com/view_bug.do?bug_id=8020669
http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
The fix is for JDK8. If it is approved, it can be applied to
JDK7 as
well.
Sincerely yours,
Ivan Gerasimov