I suspect the reason this was done is because the stream is actually a
ChannelInputStream. If you don't do the 1-byte array thing, the
ChannelInputStream will construct an entire ByteBuffer to read into,
which is even worse. As it is, it'll create a wrapper ByteBuffer to
encapsulate the destination array, which is bad enough.
That's why I was saying that it would be nice if this could get a
FileInputStream, which afaik can read directly without messing around
with buffers.
On 07/24/2013 02:24 PM, Martin Buchholz wrote:
It's wasteful to create a 1-byte array to read into. Just use the
nullary read method.
http://download.java.net/jdk8/docs/api/java/io/InputStream.html#read()
On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
<ivan.gerasi...@oracle.com <mailto:ivan.gerasi...@oracle.com>> wrote:
Would you please take a look at the updated webrev?
http://cr.openjdk.java.net/~__igerasim/8020669/2/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/>
readAllBytes() was recently (in b93) changed by Alan Bateman to fix
8014928.
Here's what I've done:
- reverted readAllBytes() to its implementation prior b93.
- modified it to address both 8020669 and 8014928.
http://bugs.sun.com/view_bug.__do?bug_id=8020669
<http://bugs.sun.com/view_bug.do?bug_id=8020669>
<http://bugs.sun.com/view_bug.__do?bug_id=8014928
<http://bugs.sun.com/view_bug.do?bug_id=8014928>>
http://bugs.sun.com/view_bug.__do?bug_id=8014928
<http://bugs.sun.com/view_bug.do?bug_id=8014928>
Sincerely yours,
Ivan
On 23.07.2013 18:09, David M. Lloyd wrote:
Here's how it should behave:
- allocate 'size' byte byte array
- if size > 0:
- use simple old I/O to read into the array
- do a one-byte read(), if not EOF then expand the array, using
a simple growth pattern like 3/2 (with a special case for 0),
and continue reading until EOF
- if the array matches the size of the file, return the array,
else use copyOf() to shrink it
This way you only ever copy the array size() was wrong.
On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
Hi Roger!
This is how my implementation behaves:
- allocate 'size' bytes in BAOS
- allocate 8k for temp buffer
- in cycle read 8k or less bytes from input stream and copy
them into BAOS
- if capacity of BAOS isn't sufficient (file had grown), its
buffer will
be reallocated
Thus, 1 reallocation and 1 copying of already read data on
each 8k piece
of additional bytes.
In normal case, i.e. when fc.size() is correct, we have
overhead of 1
allocation and copying 'size' bytes in size/8k iterations.
And this is how current implementation does
- allocate 'size' bytes
- allocate 'size' bytes of native memory for temp buffer in
IOUtil.read()
- read the whole file into temp buffer
- copy the temp buffer back into our buffer
In common when fc.size() is right, we have 1 allocation and
copying
'size' bytes from temp buffer back.
So there is a difference in allocations/copying, but in my
opinion it's
not that significant for this particular task.
Sincerely yours,
Ivan
On 22.07.2013 20:03, roger riggs wrote:
Hi Ivan,
I'm concerned about the change in behavior for the
existing working
cases.
How many times are the bytes copied in your proposed
implementation?
How many arrays are allocated and discarded?
Files.copy() uses an extra array for the copies.
BAOS should only be used for size == 0; that would
address the issue
without changing the current behavior or allocations.
Roger
On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
Roger, David thanks for suggestions!
Would you please take a look at an updated webrev?
http://cr.openjdk.java.net/~__igerasim/8020669/1/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/>
- File size is used as an initial size of BAOS's buffer.
- BAOS avoids copying its buffer in toByteArray(),
if size is correct .
I don't want to initialize BAOS with a positive
number if size
happened to be zero.
Because zero could indicate that the file is really
empty.
Sincerely yours,
Ivan
On 19.07.2013 22:30, David M. Lloyd wrote:
My mistake, we're not talking about strings.
Still you can subclass
and determine whether the buffer size guess was
right, and if so
return the array as-is (swap in an empty array
or something as needed).
On 07/19/2013 01:28 PM, David M. Lloyd wrote:
It's trivial to subclass
ByteArrayOutputStream and add a method which
converts its contents to a string using the
two protected fields which
give you all the info you need to do so. So
no extra copy is needed
that you aren't already doing.
On 07/19/2013 01:06 PM, roger riggs wrote:
Hi Ivan,
I think this change takes too big a hit
for the cases where the
size is
correct.
No real file system can be wrong about
the size of a file so this
is a
problem
only for special file systems. If the
problem is with size
reporting zero
then maybe using the incremental read
only for size == would be a
better
fix.
At least you could pass the size to the
constructor for BAOS and
avoid
the thrashing for every re-size; but
still it will allocate and
create
an extra copy
of the every time.
$.02, Roger
On 7/19/2013 1:15 PM, Ivan Gerasimov wrote:
Hello everybody!
Would you please review a fix for
the problem with
j.n.f.Files.readAllBytes() function?
The current implementation relies on
FileChannel.size() to
preallocate
a buffer for the whole file's content.
However, some special filesystems
can report a wrong size.
An example is procfs under Linux,
which reports many files under
/proc
to be zero sized.
Thus it is proposed not to rely on
the size() and instead
continuously
read until EOF.
The downside is reallocation and
copying file content between
buffers.
But taking into account that the doc
says: "It is not intended for
reading in large files." it should
not be a big problem.
http://bugs.sun.com/view_bug.__do?bug_id=8020669
<http://bugs.sun.com/view_bug.do?bug_id=8020669>
http://cr.openjdk.java.net/~__igerasim/8020669/0/webrev/
<http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/>
The fix is for JDK8. If it is
approved, it can be applied to JDK7 as
well.
Sincerely yours,
Ivan Gerasimov
--
- DML