Hello,
It is apparent that we would benefit from extending the length
information stored about CLOBs in the Derby on-disk format.
Currently we set aside two bytes at the head of the data stream for
storing the byte length of the CLOB. This is not sufficient, and can
only be used for small CLOBs. For larger CLOBs we have to read through
all the data to find the length.
I see three major steps in this process:
a) Determine which information we want to store.
b) Determine how and when the information is obtained and stored.
c) Specify the actual on-disk format.
Regarding a), we have a few options:
1) Store byte length only.
2) Store character length only.
3) Store both byte length and character length.
4) In addition to 1, 2, or 3, store distribution data for the one-,
   two-, and three-byte character representations?
From a JDBC API point of view, the character length is the most
relevant one. Internally, Derby might benefit from knowing the byte
length as well, since it can't be easily calculated from the character
length. The reason for this is that we encode data using the modified
UTF-8 format, where a single character can take up from one to three bytes.
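To illustrate why the byte length can't be derived from the character
length alone, here is a small sketch of how the modified UTF-8 size of
a string could be computed (the same scheme java.io.DataOutputStream.writeUTF
uses; the class and method names are mine, not Derby's):

```java
public class ModifiedUtf8Length {
    // Byte length of a string in modified UTF-8: ASCII (except NUL)
    // takes one byte, NUL and U+0080..U+07FF take two, the rest of the
    // BMP takes three (supplementary characters are two surrogate
    // chars, three bytes each).
    static long utfLength(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;
            } else if (c <= 0x07FF) {
                len += 2; // includes NUL, encoded as 0xC0 0x80
            } else {
                len += 3;
            }
        }
        return len;
    }
}
```

So two strings with the same character length can have very different
byte lengths, which is why storing only one of the two loses information.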
Note that there are three aspects of finding the character length from
the raw bytes:
- reading all the raw bytes
- decoding the raw bytes
- potential page cache "pollution"
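The first two aspects can be sketched as follows: because the encoding
is variable-width, counting characters means inspecting the lead byte
of every encoded character in the stream. This is an illustrative
scanner, not Derby's actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CharLengthScan {
    // Count characters by reading every byte of a modified UTF-8 stream;
    // each lead byte tells us how many continuation bytes to consume.
    static long charLength(InputStream in) throws IOException {
        long chars = 0;
        int b;
        while ((b = in.read()) != -1) {
            int trailing;
            if ((b & 0x80) == 0x00)      trailing = 0; // one-byte form
            else if ((b & 0xE0) == 0xC0) trailing = 1; // two-byte form
            else                         trailing = 2; // three-byte form
            for (int i = 0; i < trailing; i++) {
                if (in.read() == -1) throw new EOFException("truncated character");
            }
            chars++;
        }
        return chars;
    }

    public static void main(String[] args) throws IOException {
        // For these characters, standard UTF-8 bytes match modified UTF-8.
        byte[] data = "a\u00E9\u4E2D".getBytes(StandardCharsets.UTF_8);
        System.out.println(charLength(new ByteArrayInputStream(data)));
    }
}
```

Even this cheap scan still pulls every page of the CLOB through the
page cache, which is the "pollution" aspect above.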
The length often has to be obtained just to validate user-supplied
arguments (positions and lengths), which means Derby currently does a
lot of unnecessary work.
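For example, a Clob.getSubString(pos, len) style call can only be
validated against the character length. With a stored length the check
is trivial; the helper below is hypothetical and only shows the shape
of the check (JDBC positions are 1-based):

```java
public class ClobBounds {
    // Hypothetical argument check for a getSubString(pos, len) style
    // call, assuming the character length is stored and cheap to read.
    // Without it, rejecting a bad position can require decoding the
    // whole CLOB.
    static boolean validBounds(long pos, int len, long storedCharLength) {
        return pos >= 1 && len >= 0 && pos - 1 <= storedCharLength;
    }
}
```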
I'm bringing up b) because JDBC has methods for inserting data without
specifying the length. Throwing out some ideas:
- obtain required information on the fly, update immediately after
insert is done
- insert data without length information, then
* update on first subsequent request
* use a background task to update information
* add a "maintenance routine" for updating the information
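The first idea (obtain on the fly) could be as simple as wrapping the
insert stream so both counts are known the moment the insert finishes.
A sketch, with hypothetical names:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Counts characters and modified UTF-8 bytes as the insert stream is
// consumed, so the header can be updated immediately after the insert.
public class CountingReader extends FilterReader {
    long charCount;
    long byteCount;

    public CountingReader(Reader in) {
        super(in);
    }

    private void tally(char c) {
        charCount++;
        if (c >= 0x0001 && c <= 0x007F) byteCount += 1;
        else if (c <= 0x07FF)           byteCount += 2; // includes NUL
        else                            byteCount += 3;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) tally((char) c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) tally(buf[off + i]);
        return n;
    }
}
```

The attraction is that the data is only traversed once; the cost is
that the length fields must be written after the data itself.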
I'm asking people to share their opinions on this matter, and I'll try
to answer any questions you might have to move the discussion forward.
--
Kristian