Daniel John Debrunner wrote:
Kristian Waagan wrote:
Hello,

I just discovered that we are having problems with the length-less
overloads in the embedded driver. Before I file any Jiras, I would like
some feedback from the community. There are definitely problems in
SQLBinary.readFromStream(). I would also appreciate it if someone with
knowledge of the storage layer could tell me whether we are facing
trouble there as well.

SQL layer
=========
SQLBinary.readFromStream()
  1) The method does not support streaming.
     It repeatedly grows the buffer array to twice its size, or possibly
     more if the available() method of the input stream returns a
     non-zero value, until all data has been read. This approach causes
     an OutOfMemoryError if the stream data cannot fit into memory.
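To make the memory problem concrete, here is a rough sketch of that kind of read-everything-then-double strategy (illustrative only, not Derby's actual code; the class and method names are made up for the example). The whole value ends up in one byte array, so heap use is proportional to the data size:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {
    /**
     * Reads the whole stream into a single byte array, doubling the
     * buffer whenever it fills up. Fine for a 255-byte CHAR FOR BIT
     * DATA value, but a multi-gigabyte BLOB stream exhausts the heap.
     */
    static byte[] readFromStream(InputStream in) throws IOException {
        byte[] buf = new byte[32];
        int len = 0;
        while (true) {
            int read = in.read(buf, len, buf.length - len);
            if (read == -1) {
                // End of stream: trim the buffer to the bytes read.
                byte[] result = new byte[len];
                System.arraycopy(buf, 0, result, 0, len);
                return result;
            }
            len += read;
            if (len == buf.length) {
                // Buffer full: double it, copying everything read so far.
                byte[] bigger = new byte[buf.length * 2];
                System.arraycopy(buf, 0, bigger, 0, len);
                buf = bigger;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) ('a' + i % 26);
        }
        byte[] copy = readFromStream(new ByteArrayInputStream(data));
        System.out.println(copy.length); // 1000
    }
}
```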

I think this is because the maximum size for this data type is 255
bytes, so memory usage was not a concern.
SQLBinary corresponds to CHAR FOR BIT DATA, the sub-classes correspond
to the larger data types.

One question that has been nagging me is that the standard response to
why the existing JDBC methods had to declare the length was that the
length was required up-front by most (some?) database engines. Did this
requirement suddenly disappear? I assume it was discussed in the JDBC
4.0 expert group.

I haven't looked at your implementation for this, but the root cause may
be that Derby does need to verify that the supplied value does not
exceed the declared length for the data type. Prior to any change for
length-less overloads, the incoming length was checked before the data
was inserted into the store. I wonder if, with your change, it is still
checking the length prior to storing it, but reading the entire value
into a byte array in order to determine its length.


Hi Dan,

That's true, and this is the approach I'm going for on the client until
we can do something better. On the embedded side, this approach is not
good enough.
I plan to handle the length check by wrapping the application stream in
a LimitReader, as is done for other streams. The length is also checked
elsewhere when the data is inserted.
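The idea behind that kind of limiting wrapper can be sketched in a few lines (this is not Derby's actual LimitReader, just an illustrative byte-stream equivalent): the wrapper counts bytes as they pass through and fails as soon as the declared maximum is exceeded, so the length check costs no extra buffering.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Fails as soon as the wrapped stream yields more than 'limit' bytes. */
public class LimitedStream extends FilterInputStream {
    private long remaining;

    public LimitedStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    public int read() throws IOException {
        int b = in.read();
        if (b != -1 && --remaining < 0) {
            throw new IOException("value exceeds declared maximum length");
        }
        return b;
    }

    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0 && (remaining -= n) < 0) {
            throw new IOException("value exceeds declared maximum length");
        }
        return n;
    }
}
```

A stream that stays within the limit reads through unchanged; one byte too many raises the error mid-stream, before everything has been buffered.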


  2) Might enter an endless loop.
     If the available() method of the input stream returns 0, and the
     data in the stream is larger than the initial buffer array, an
     endless loop is entered. The problem is that the length argument
     of read(byte[],int,int) is set to 0, so we never read any more
     data and the stream is never exhausted.
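The mechanics of the hang are easy to demonstrate: by the InputStream contract, a zero-length read returns 0 without touching the stream, so a loop that waits for -1 while passing available() as the length can spin forever once available() reports 0. A minimal standalone illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class ZeroRead {
    public static void main(String[] args) throws IOException {
        // 100 bytes remain in the stream, yet a zero-length read
        // returns 0 -- never -1 -- so end-of-stream is undetectable
        // this way and the surrounding loop never terminates.
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[100]);
        byte[] buf = new byte[10];
        System.out.println(in.read(buf, 0, 0)); // prints 0, not -1
    }
}
```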

That seems like a bug; available() is basically a useless method.

Added DERBY-1510 for this. The data going through here should be limited
to 32700 bytes. We still need to avoid the possibility of a hang,
though, and I plan to remove the use of 'InputStream.available()'.


To me, relying on available() to determine whether the stream is
exhausted seems wrong. Also, subclasses of InputStream will return 0 if
they don't override the method.
I wrote a simple workaround for 2), but then the OutOfMemoryError
comes into play for large data.
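A loop that treats the return value of read() as the only end-of-stream signal sidesteps available() entirely, and with a fixed-size chunk buffer it avoids the unbounded growth as well (a sketch of the general pattern, not the actual fix):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DrainStream {
    /**
     * Copies the stream to the destination in fixed-size chunks,
     * using only read()'s return value of -1 to detect end-of-stream.
     * Memory use is constant regardless of how large the stream is.
     */
    static long drain(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // fixed-size chunk, never grown
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // -1 is the only EOF signal
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }
}
```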


Store layer
===========
I haven't had time to study the store layer, and know very little about
it. I hope somebody can give me some quick answers here.
  3) Is it possible to stream directly to the store layer if you don't
     know the length of the data?
     Can meta information (page headers, record headers etc.) be updated
     "as we go", or must the size be specified when the insert is
     started?

Yes, the store can handle this.

Good to hear. I might play around a little with this. If people have
thoughts about how to solve this, or know of problems we will run into,
please share :)



BTW: I did a little hacking... By changing one int variable, I was able
to use the length-less overloads (with Runtime.maxMemory = 63 MB and
patch DERBY-1417 applied). I was also able to read back the Blobs, and
the contents were the same as I inserted.
I need to check this out some more and figure out how to fit the change
into the API. Basically, a negative length argument is passed down to
the store layer, and the length check at the higher level is disabled.

When using an inefficient stream (a looping-alphabet stream generating
one byte at a time), I got these times (single run):

Size | Length specified | Length less |
 10M |           14,8 s |      14,0 s |
100M |           49,3 s |      49,2 s |
  2G |          874,1 s |     979,1 s |

The blobs were inserted, then read back and compared with a new instance
of the stream used for the blob (i.e., for the 2G blob, a total of 6G
was generated/read). The database directory clocked in at 4,2G (du -h).
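For reference, the kind of test stream used above can be written in a few lines (a sketch of a looping-alphabet stream; Derby's test utilities have their own version). It generates the requested number of bytes, cycling through a-z, and deliberately reports available() as 0, which makes it a good stress test for the bugs discussed in this thread:

```java
import java.io.InputStream;

/** Generates 'length' bytes, cycling a-z, one byte per read() call. */
public class LoopingAlphabetStream extends InputStream {
    private final long length;
    private long pos;

    public LoopingAlphabetStream(long length) {
        this.length = length;
    }

    public int read() {
        if (pos >= length) {
            return -1; // end of stream
        }
        return 'a' + (int) (pos++ % 26);
    }

    public int available() {
        // Deliberately unhelpful, like many InputStream subclasses
        // that never override the default implementation.
        return 0;
    }
}
```

Because the data is deterministic, a second instance of the same length reproduces the exact byte sequence, which is what makes the insert-then-compare verification above possible.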



--
Kristian


Dan.


