Hello,
A great deal of work has gone into LOBs (large objects; BLOB and CLOB)
since 10.3. I'll try to summarize some of it.
First of all, the situation changed radically between 10.2 and 10.3.
This was when Derby started using locators for LOBs. In short, it means
that instead of transferring the LOB to the client, the LOB is handled
on the server and the client sends commands to the server. If the client
requests data (a part of or the whole content), the data is transferred
from the server to the client.
Below are some results from ClobAccessTest, averaged over 15 runs and
all obtained with the embedded driver:
Test name                                  10.3.3.0   10.4.2.0     740513
-------------------------------------------------------------------------
// The numbers denote throughput (16 clients, selecting small Clobs).
testConcurrency                              ~20000        n/a     ~30000
// Numbers are durations of the tests, in milliseconds.
testFetchLargeClobOneByOneChar                 8159       7437       4521
testFetchLargeClobOneByOneCharBaseline         3771       3562       3825
testFetchLargeClobOneByOneCharModified         8877       8477       6186
testFetchLargeClobPieceByPiece               673707     624639       3370
testFetchLargeClobPieceByPieceBackwards     1138559    1059045       2863
testFetchLargeClobPieceByPieceModified       504400     454054       4520
testFetchLargeClobWithStream                   3162       2900       3181
testFetchLargeClobs                           37521      35350      37424
testFetchLargeClobsModified                   60289      56441      59283
testFetchSmallClobs                           21617       5364       6121
testFetchSmallClobsInaccurateLength           20810       4412       4466
testLargeClobGetLength                        49944      62602         46
testModifySmallClobs                          32218      16077      16286
Except for testConcurrency, which reports throughput, the numbers are
durations (in milliseconds) of the test methods. The source data is
15000 small Clobs consisting of 1 to 5 characters, and 10 large Clobs
each with 15M characters. Since the source data is the modern Latin
alphabet, each character is stored as a single byte, so each large Clob
equals 15 MB. The execution is single-threaded, with the exception of
the concurrency test. To fully understand the numbers, you have to look
at the tests and how the test framework reports the duration.
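To make the "15M characters equals 15 MB" arithmetic concrete, here is a
minimal sketch of how many bytes a string takes in modified UTF-8 (the
encoding described for java.io.DataInput). The class and method names
are made up for illustration and are not Derby code.

```java
/** Hypothetical helper for illustration only; not part of Derby. */
final class ModifiedUtf8 {
    private ModifiedUtf8() {}

    /** Returns the number of bytes needed to encode s in modified UTF-8. */
    static long encodedLength(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1; // ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2; // NUL and U+0080..U+07FF
            } else {
                bytes += 3; // U+0800..U+FFFF, counted per UTF-16 code unit
            }
        }
        return bytes;
    }
}
```

For plain Latin letters the byte count equals the character count,
while other data can take up to three times as many bytes for the same
number of characters.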
I think the test framework has some flaws in the way it runs the tests;
for instance, they are always run in the same order. The fact that
there are so many permutations to try (cache size, large Clobs, small
Clobs, modified vs non-modified, encrypted vs non-encrypted, embedded vs
client, method arguments, number of access methods, data value; 1, 2, or
3 bytes per character, etc) complicates the picture. Also, Derby and/or
Java themselves contribute some instability to the results.
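One way to reduce the effect of a fixed execution order would be to
shuffle the tests on every round and average per test. The sketch
below is a hypothetical mini-harness (all names are invented); it is
not the framework used to produce the numbers above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical mini-harness for illustration only. */
final class ShuffledRuns {
    private ShuffledRuns() {}

    /**
     * Runs every task 'runs' times, in a fresh random order each round,
     * and returns the mean duration per task in milliseconds.
     */
    static Map<String, Double> averageMillis(
            Map<String, Runnable> tasks, int runs, long seed) {
        Map<String, Long> totalNanos = new LinkedHashMap<>();
        for (String name : tasks.keySet()) {
            totalNanos.put(name, 0L);
        }
        List<String> order = new ArrayList<>(tasks.keySet());
        Random random = new Random(seed); // fixed seed keeps orders repeatable
        for (int run = 0; run < runs; run++) {
            Collections.shuffle(order, random);
            for (String name : order) {
                long start = System.nanoTime();
                tasks.get(name).run();
                totalNanos.merge(name, System.nanoTime() - start, Long::sum);
            }
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : totalNanos.entrySet()) {
            means.put(e.getKey(), e.getValue() / 1e6 / runs);
        }
        return means;
    }
}
```

This does not remove JIT or cache warm-up effects; it only spreads them
more evenly across the tests.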
As you can see, the performance of most operations tested has been
improved, some of them significantly. The most interesting result is the
one for 'testFetchLargeClobPieceByPiece', because it represents the code
path taken when a Clob is accessed through the client driver. On the
other hand, the performance boost is a lot smaller for smaller Clobs. We
have had some reports from users about problems with larger Clobs
(> 5 MB).
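For readers unfamiliar with the access pattern, this is roughly what
fetching a Clob piece by piece looks like through the standard JDBC
API. It is a minimal sketch, assuming "piece by piece" means repeated
Clob.getSubString calls; the class name and the piece size are mine,
and the actual test may differ in its details.

```java
import java.sql.Clob;
import java.sql.SQLException;

/** Hypothetical helper for illustration only; not the test code. */
final class ClobPieces {
    private ClobPieces() {}

    /** Reads the whole Clob using getSubString calls of at most pieceSize chars. */
    static String readPieceByPiece(Clob clob, int pieceSize) throws SQLException {
        if (pieceSize <= 0) {
            throw new IllegalArgumentException("pieceSize must be positive");
        }
        long length = clob.length();
        StringBuilder out = new StringBuilder();
        long pos = 1; // JDBC Clob positions are 1-based
        while (pos <= length) {
            // Never ask for more characters than remain in the Clob.
            int want = (int) Math.min(pieceSize, length - pos + 1);
            String piece = clob.getSubString(pos, want);
            out.append(piece);
            pos += piece.length();
        }
        return out.toString();
    }
}
```

Each call names an arbitrary position, so the cost depends heavily on
how cheaply the implementation can reach that position in the
underlying data.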
I'm not sure if the increased times for trunk versus 10.4 are
significant or not. It is not unreasonable that the following
capabilities have added a little to the overhead:
o the repositioning functionality of UTF8Reader, which in turn is based
on PositionedStoreStream.
o the fact that streams are able to detect that the underlying data has
changed.
o the introduction of two header formats (for small Clobs, 3 bytes
extra per Clob and a required check).
Do you think the numbers look acceptable?
Is there anything we should investigate further?
Are there any specific performance problems that are still unresolved?
When it comes to Blobs, operations should be faster than for Clobs
because Derby doesn't have to decode the data (modified UTF-8).
I haven't looked much at the client driver, but I know there are
performance issues to be solved there. Some of them are related to
having to send a message in a separate round-trip to the server to get
something done, for instance closing a LOB that hasn't been accessed by
the user. Some code paths also do argument checking where the length is
required, and since it isn't available on the client we have to ask the
server. Options here are to let the operation fail on the server, or to
somehow transfer the length to the client up front.
Also, I don't know if the transfer mechanism is ideal, as it requires
the execution of a callable statement for each chunk of data to transfer
(max chunk size is around 32 KB). Implementing something new here may be
time-consuming.
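To give a feel for the cost, here is a small sketch of the number of
statement executions implied by a per-chunk transfer. The chunk size of
exactly 32 * 1024 bytes is my assumption (I only know it is around
32 KB), and the names are made up for illustration.

```java
/** Hypothetical helper for illustration only; not part of Derby. */
final class ChunkMath {
    private ChunkMath() {}

    /** Assumed chunk size; the real limit is only known to be around 32 KB. */
    static final int ASSUMED_CHUNK_BYTES = 32 * 1024;

    /** Returns how many chunks are needed to transfer totalBytes. */
    static long roundTrips(long totalBytes, int chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        // Ceiling division: a partial last chunk still costs one execution.
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }
}
```

Taking 15 MB as 15,000,000 bytes,
ChunkMath.roundTrips(15_000_000L, ChunkMath.ASSUMED_CHUNK_BYTES) gives
458 executions for a single large Clob.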
Below I'm listing some of the issues that have been worked on (I don't
think the list is complete).
Regards,
--
Kristian
DERBY-2822 Add caching of store stream length in StoreStreamClob, if
appropriate
DERBY-3571 LOB locators are not released if the LOB columns are not
accessed by the client
DERBY-3658 LOBStateTracker should not use SYSIBM.CLOBRELEASELOCATOR when
the database is soft-upgraded from 10.2
DERBY-3766 EmbedBlob.setPosition is highly ineffective for streams
DERBY-3768 Make EmbedBlob.length use skip instead of read
DERBY-3769 Make LOBStoredProcedure on the server side smarter about the
read buffer size
DERBY-3791 Excessive memory usage when fetching small Clobs
DERBY-3793 Remove unnecessary methods from InternalClob interface
DERBY-3799 NullPointerException when accessing a clob through a pooled
connection
DERBY-3818 client Insert/retrieval of 18MB Clob is extremely slow in
MultiByteClobTest
DERBY-3825 StoreStreamClob.getReader(charPos) performs poorly
DERBY-3871 EmbedBlob.setBytes returns incorrect insertion count
DERBY-3889 LOBStreamControl.truncate() doesn't delete temporary files
DERBY-3907 Save useful length information for Clobs in store
DERBY-3934 Improve performance of reading modified Clobs
DERBY-3935 Introduce interface for a position aware stream
DERBY-3936 Add CharacterStreamDescriptor
DERBY-3970 PositionedStoreStream doesn't initialize itself properly
DERBY-3977 Clob.truncate with a value greater than the Clob length
raises different exceptions in embedded and client driver
DERBY-3978 Clob.truncate(long) in the client driver doesn't update the
cached Clob length