Hello,

A great deal of work has been going on for LOBs (large objects; BLOB and CLOB) since 10.3. I'll try to summarize some of it.

First of all, the situation changed radically between 10.2 and 10.3. This was when Derby started using locators for LOBs. In short, it means that instead of transferring the LOB to the client, the LOB is handled on the server and the client sends commands to the server. If the client requests data (a part of or the whole content), the data is transferred from the server to the client.

Below are some results from ClobAccessTest, averaged over 15 runs, all run with the embedded driver:
Test name                                  10.3.3.0   10.4.2.0    740513
-------------------------------------------------------------------------
// The numbers denote throughput (16 clients, selecting small Clobs).
testConcurrency                              ~20000        n/a    ~30000
// Numbers are durations of the tests, in milliseconds.
testFetchLargeClobOneByOneChar                 8159       7437      4521
testFetchLargeClobOneByOneCharBaseline         3771       3562      3825
testFetchLargeClobOneByOneCharModified         8877       8477      6186
testFetchLargeClobPieceByPiece               673707     624639      3370
testFetchLargeClobPieceByPieceBackwards     1138559    1059045      2863
testFetchLargeClobPieceByPieceModified       504400     454054      4520
testFetchLargeClobWithStream                   3162       2900      3181
testFetchLargeClobs                           37521      35350     37424
testFetchLargeClobsModified                   60289      56441     59283
testFetchSmallClobs                           21617       5364      6121
testFetchSmallClobsInaccurateLength           20810       4412      4466
testLargeClobGetLength                        49944      62602        46
testModifySmallClobs                          32218      16077     16286


Except for testConcurrency, the numbers are durations (in milliseconds) of the test methods. The source data is 15000 small Clobs consisting of 1 to 5 characters each, and 10 large Clobs with 15M characters each. Since the source data uses the modern Latin alphabet, each character is stored as a single byte, so each large Clob is about 15 MB. The execution is single-threaded, with the exception of the concurrency test. To fully understand the numbers, you have to look at the tests and at how the test framework reports the duration.

I think the test framework has some flaws in the way it runs the tests, i.e. they are always run in the same order. The fact that there are so many permutations to try (cache size, large Clobs, small Clobs, modified vs non-modified, encrypted vs non-encrypted, embedded vs client, method arguments, number of access methods, data value with 1, 2, or 3 bytes per character, etc.) complicates the picture. Also, Derby and/or Java themselves contribute some instability to the results.

As you can see, the performance of most of the tested operations has improved, some of them significantly. The most interesting result is the one for 'testFetchLargeClobPieceByPiece', because it represents the code path taken when a Clob is accessed through the client driver. On the other hand, the performance boost is a lot smaller for smaller Clobs. Some users have reported problems with larger Clobs (> 5 MB).
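To make the piece-by-piece access pattern concrete, here is a minimal sketch using only the standard java.sql.Clob interface. It is not the code from ClobAccessTest; the class and method names are hypothetical, and javax.sql.rowset.serial.SerialClob is used as an in-memory stand-in so the example runs without a database.

```java
import java.sql.Clob;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.rowset.serial.SerialClob;

// Hypothetical illustration, not part of Derby or ClobAccessTest.
class PieceByPieceSketch {

    /** Reads the whole Clob in pieces of at most pieceSize characters. */
    static long fetchPieceByPiece(Clob clob, int pieceSize) throws SQLException {
        long length = clob.length();
        long charsRead = 0;
        // JDBC Clob positions are 1-based.
        for (long pos = 1; pos <= length; pos += pieceSize) {
            int want = (int) Math.min(pieceSize, length - pos + 1);
            // Every call names an absolute position, so the implementation
            // has to locate that position again for each piece.
            String piece = clob.getSubString(pos, want);
            charsRead += piece.length();
        }
        return charsRead;
    }

    /** Runs the loop against an in-memory Clob of totalChars characters. */
    static long runOnInMemoryClob(int totalChars, int pieceSize) {
        char[] data = new char[totalChars];
        Arrays.fill(data, 'a');
        try {
            return fetchPieceByPiece(new SerialClob(data), pieceSize);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnInMemoryClob(100_000, 4096));
    }
}
```

With a real Derby connection the Clob would come from ResultSet.getClob instead. The point of the sketch is only that each getSubString call is positioned independently, which is why the cost of repositioning inside a large value matters so much for this test.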

I'm not sure whether the increased times for trunk versus 10.4 are significant. It is not unreasonable that the following capabilities have added a little to the overhead:
 o the repositioning functionality of UTF8Reader, which in turn is based
   on PositionedStoreStream.
 o the fact that streams are able to detect that the underlying data has
   changed.
 o the introduction of two header formats (for small Clobs, 3 extra
   bytes per Clob and a required check)

Do you think the numbers look acceptable?
Is there anything we should investigate further?
Are there any specific performance problems that are still unresolved?

When it comes to Blobs, operations should be faster than for Clobs because Derby doesn't have to decode the data (modified UTF-8).
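As a rough illustration of why the decoding matters, the sketch below computes the byte offset of a character position in modified UTF-8, the encoding defined for java.io.DataInput. The class and method names are hypothetical and this is not Derby's UTF8Reader; it only shows that a character position cannot be turned into a byte position without looking at the preceding characters, whereas for a Blob the two are the same thing.

```java
// Hypothetical illustration, not Derby code.
class ModifiedUtf8Sketch {

    /** Number of bytes one Java char occupies in modified UTF-8. */
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers U+0000, which is encoded in two bytes
        }
        return 3;
    }

    /**
     * Byte offset of the 0-based character index charPos.
     * All preceding characters have to be inspected.
     */
    static long byteOffsetOf(CharSequence text, int charPos) {
        long offset = 0;
        for (int i = 0; i < charPos; i++) {
            offset += encodedSize(text.charAt(i));
        }
        return offset;
    }

    public static void main(String[] args) {
        // 'a' takes 1 byte, U+00E9 takes 2 bytes, U+20AC takes 3 bytes.
        System.out.println(byteOffsetOf("a\u00e9\u20ac", 3));
    }
}
```

For pure Latin-alphabet data like the test data above, every character is one byte, so the offsets happen to coincide, but the reader cannot know that without decoding.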

I haven't looked much at the client driver, but I know there are performance issues to be solved there. Some of them are related to having to send a message in a separate round-trip to the server to get something done, for instance closing a LOB that hasn't been accessed by the user. Some code paths also do argument checking where the length is required, and since it isn't available on the client, we have to ask the server. The options here are to let the operation fail on the server, or to somehow transfer the length to the client up front. Also, I don't know if the transfer mechanism is ideal, as it requires the execution of a callable statement for each chunk of data to transfer (the max chunk size is around 32 KB). Implementing something new here may be time-consuming.
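To give a feel for what one callable statement per chunk means, here is a back-of-the-envelope calculation. It assumes that "15M" means 15,000,000 units, that the chunk size is exactly 32 * 1024, and that both are counted in the same unit; the class and method names are hypothetical.

```java
// Hypothetical illustration, not Derby code.
class ChunkRoundTripSketch {

    /** Number of chunked transfers needed to move totalUnits, rounding up. */
    static long roundTrips(long totalUnits, int chunkSize) {
        if (totalUnits < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("totalUnits must be >= 0 and chunkSize > 0");
        }
        return (totalUnits + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        // One large test Clob under the assumptions stated above.
        System.out.println(roundTrips(15_000_000L, 32 * 1024)); // prints 458
    }
}
```

Under these assumptions, fetching one of the large test Clobs through the client would take several hundred statement executions, before counting any extra round-trips for length checks or for releasing the locator.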


Below I'm listing some of the issues that have been worked on (I don't think the list is complete).


Regards,
--
Kristian


DERBY-2822 Add caching of store stream length in StoreStreamClob, if appropriate
DERBY-3571 LOB locators are not released if the LOB columns are not accessed by the client
DERBY-3658 LOBStateTracker should not use SYSIBM.CLOBRELEASELOCATOR when the database is soft-upgraded from 10.2
DERBY-3766 EmbedBlob.setPosition is highly ineffective for streams
DERBY-3768 Make EmbedBlob.length use skip instead of read
DERBY-3769 Make LOBStoredProcedure on the server side smarter about the read buffer size
DERBY-3791 Excessive memory usage when fetching small Clobs
DERBY-3793 Remove unnecessary methods from InternalClob interface
DERBY-3799 NullPointerException when accessing a clob through a pooled connection
DERBY-3818 client Insert/retrieval of 18MB Clob is extremely slow in MultiByteClobTest
DERBY-3825 StoreStreamClob.getReader(charPos) performs poorly
DERBY-3871 EmbedBlob.setBytes returns incorrect insertion count
DERBY-3889 LOBStreamControl.truncate() doesn't delete temporary files
DERBY-3907 Save useful length information for Clobs in store
DERBY-3934 Improve performance of reading modified Clobs
DERBY-3935 Introduce interface for a position aware stream
DERBY-3936 Add CharacterStreamDescriptor
DERBY-3970 PositionedStoreStream doesn't initialize itself properly
DERBY-3977 Clob.truncate with a value greater than the Clob length raises different exceptions in embedded and client driver
DERBY-3978 Clob.truncate(long) in the client driver doesn't update the cached Clob length
