Hi Roger!
On 25.07.2013 17:42, roger riggs wrote:
Hi Ivan,
Thank you for your diligence.
Thank you for your patience :)
1) Should the test use an alternate mechanism to read the file
(FileInputStream)
and confirm the length of the array?
This file can change from read to read. But it should not be empty, and
that's what we check.
I moved the test from a separate file to BytesAndLines.java.
I've also added testing of how Files.readAllLines() reads from procfs
files (it already does well.)
2) There is an edge case where the file size is between MAX_ARRAY_SIZE
and Integer.MAX_VALUE that should be avoided.
Yes, you're right.
I rewrote the logic in a simpler way with no overflowing and casting to
long.
Would you please help review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/4/webrev/
Sincerely yours,
Ivan
The lines at L3022 are confusing.
If the max array size is MAX_BUFFER_SIZE then it should be used
also at L3063 in readAllBytes.
L3015-3026: The logic around the 3 nested if's is a bit confusing for
the actions being taken.
By switching to a long for newCapacity this can be easier to read.
if (capacity == MAX_INTEGER) {
throw ...
}
long newCapacity = ((long)capacity) << 1;
newCapacity = Math.max(newCapacity, BUFFER_SIZE); // at least
BUFFER_SIZE
newCapacity = Math.min(newCapacity, MAX_BUFFER_SIZE); // at most
MAX_BUFFER_SIZE
capacity = (int) newCapacity;
buf = Arrays.copyOf(buf, capacity);
buf[nread++] = (byte)n;
rem = buf.length - nread;
Thanks, Roger
BTW, I'm not an official reviewer
On 7/24/2013 7:47 PM, Ivan Gerasimov wrote:
Hello everybody!
Would you please review an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/3/webrev/
<http://cr.openjdk.java.net/%7Eigerasim/8020669/3/webrev/>
Thanks in advance,
Ivan
On 24.07.2013 23:36, Martin Buchholz wrote:
Use javadoc style even in private methods.
s/Read/Reads/
Use @param initialSize
/**
+ * Read all the bytes from an input stream. The {@code initialSize}
+ * parameter indicates the initial size of the byte[] to allocate.
+ */
---
This needs to be tested for an ordinary zero-length file. It looks
like for the zero case
+ int newCapacity = capacity << 1;
will infloop not actually increasing the buffer.
---
You might want to copy this technique from ArrayList et al:
/**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
On Tue, Jul 23, 2013 at 5:09 PM, Ivan Gerasimov
<ivan.gerasi...@oracle.com <mailto:ivan.gerasi...@oracle.com>> wrote:
Would you please take a look at the updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/2/webrev/
<http://cr.openjdk.java.net/%7Eigerasim/8020669/2/webrev/>
readAllBytes() was recently (in b93) changed by Alan Bateman to
fix 8014928.
Here's what I've done:
- reverted readAllBytes() to its implementation prior b93.
- modified it to address both 8020669 and 8014928.
http://bugs.sun.com/view_bug.do?bug_id=8020669
<http://bugs.sun.com/view_bug.do?bug_id=8014928>
http://bugs.sun.com/view_bug.do?bug_id=8014928
Sincerely yours,
Ivan
On 23.07.2013 18:09, David M. Lloyd wrote:
Here's how it should behave:
- allocate 'size' byte byte array
- if size > 0:
- use simple old I/O to read into the array
- do a one-byte read(), if not EOF then expand the array,
using a simple growth pattern like 3/2 (with a special case
for 0), and continue reading until EOF
- if the array matches the size of the file, return the
array, else use copyOf() to shrink it
This way you only ever copy the array size() was wrong.
On 07/23/2013 05:06 AM, Ivan Gerasimov wrote:
Hi Roger!
This is how my implementation behaves:
- allocate 'size' bytes in BAOS
- allocate 8k for temp buffer
- in cycle read 8k or less bytes from input stream and
copy them into BAOS
- if capacity of BAOS isn't sufficient (file had grown),
its buffer will
be reallocated
Thus, 1 reallocation and 1 copying of already read data
on each 8k piece
of additional bytes.
In normal case, i.e. when fc.size() is correct, we have
overhead of 1
allocation and copying 'size' bytes in size/8k iterations.
And this is how current implementation does
- allocate 'size' bytes
- allocate 'size' bytes of native memory for temp buffer
in IOUtil.read()
- read the whole file into temp buffer
- copy the temp buffer back into our buffer
In common when fc.size() is right, we have 1 allocation
and copying
'size' bytes from temp buffer back.
So there is a difference in allocations/copying, but in
my opinion it's
not that significant for this particular task.
Sincerely yours,
Ivan
On 22.07.2013 20:03, roger riggs wrote:
Hi Ivan,
I'm concerned about the change in behavior for the
existing working
cases.
How many times are the bytes copied in your proposed
implementation?
How many arrays are allocated and discarded?
Files.copy() uses an extra array for the copies.
BAOS should only be used for size == 0; that would
address the issue
without changing the current behavior or allocations.
Roger
On 7/20/2013 6:15 PM, Ivan Gerasimov wrote:
Roger, David thanks for suggestions!
Would you please take a look at an updated webrev?
http://cr.openjdk.java.net/~igerasim/8020669/1/webrev/
<http://cr.openjdk.java.net/%7Eigerasim/8020669/1/webrev/>
- File size is used as an initial size of BAOS's
buffer.
- BAOS avoids copying its buffer in
toByteArray(), if size is correct .
I don't want to initialize BAOS with a positive
number if size
happened to be zero.
Because zero could indicate that the file is
really empty.
Sincerely yours,
Ivan
On 19.07.2013 22:30, David M. Lloyd wrote:
My mistake, we're not talking about strings.
Still you can subclass
and determine whether the buffer size guess
was right, and if so
return the array as-is (swap in an empty
array or something as needed).
On 07/19/2013 01:28 PM, David M. Lloyd wrote:
It's trivial to subclass
ByteArrayOutputStream and add a method which
converts its contents to a string using
the two protected fields which
give you all the info you need to do so.
So no extra copy is needed
that you aren't already doing.
On 07/19/2013 01:06 PM, roger riggs wrote:
Hi Ivan,
I think this change takes too big a
hit for the cases where the
size is
correct.
No real file system can be wrong
about the size of a file so this
is a
problem
only for special file systems. If
the problem is with size
reporting zero
then maybe using the incremental
read only for size == would be a
better
fix.
At least you could pass the size to
the constructor for BAOS and
avoid
the thrashing for every re-size; but
still it will allocate and
create
an extra copy
of the every time.
$.02, Roger
On 7/19/2013 1:15 PM, Ivan Gerasimov
wrote:
Hello everybody!
Would you please review a fix
for the problem with
j.n.f.Files.readAllBytes() function?
The current implementation
relies on FileChannel.size() to
preallocate
a buffer for the whole file's
content.
However, some special
filesystems can report a wrong size.
An example is procfs under
Linux, which reports many files
under
/proc
to be zero sized.
Thus it is proposed not to rely
on the size() and instead
continuously
read until EOF.
The downside is reallocation and
copying file content between
buffers.
But taking into account that the
doc says: "It is not intended for
reading in large files." it
should not be a big problem.
http://bugs.sun.com/view_bug.do?bug_id=8020669
http://cr.openjdk.java.net/~igerasim/8020669/0/webrev/
<http://cr.openjdk.java.net/%7Eigerasim/8020669/0/webrev/>
The fix is for JDK8. If it is
approved, it can be applied to
JDK7 as
well.
Sincerely yours,
Ivan Gerasimov