[
https://issues.apache.org/jira/browse/DERBY-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristian Waagan updated DERBY-3769:
-----------------------------------
Attachment: derby-3769-2a-clob_buffer_size_adjustment.diff
Patch 2a adjusts the maximum return size in characters for the CLOB stored
procedure to 10890 (DB2_VARCHAR_MAXWIDTH / 3). This means that anything from
10890 to 10890*3 bytes may be returned to the client in one round-trip,
depending on the bytes-per-char ratio (determined by the modified UTF-8
encoding).
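To make the arithmetic concrete, here is a small stand-alone sketch. It is not Derby code: the class and method names are made up for illustration, and the constant is simply the value quoted in the issue description. The length calculation follows the modified UTF-8 rules documented for java.io.DataInput.

```java
public class ClobChunkSize {
    // Value quoted in the issue description (Limits.DB2_VARCHAR_MAXWIDTH).
    static final int DB2_VARCHAR_MAXWIDTH = 32672;
    // Patch 2a: maximum number of characters returned per call.
    static final int MAX_CHARS_PER_CALL = DB2_VARCHAR_MAXWIDTH / 3;

    /** Bytes needed to encode s in modified UTF-8 (hypothetical helper). */
    static int modifiedUtf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // plain ASCII
            } else if (c <= 0x07FF) {
                bytes += 2;          // includes U+0000, which takes two bytes
            } else {
                bytes += 3;          // e.g. CJK chars and surrogate halves
            }
        }
        return bytes;
    }

    /** Builds a string of n copies of c (hypothetical helper). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat('a', MAX_CHARS_PER_CALL);
        String cjk = repeat('\u4e2d', MAX_CHARS_PER_CALL);
        System.out.println("max chars per call: " + MAX_CHARS_PER_CALL);
        System.out.println("ASCII chunk bytes:  " + modifiedUtf8Length(ascii));
        System.out.println("CJK chunk bytes:    " + modifiedUtf8Length(cjk));
    }
}
```

With 3 bytes per char a full chunk is 32670 bytes, which stays just under the 32672 limit; a pure ASCII chunk uses only about a third of it.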
Even though this fix isn't optimal, the advantages outweigh the disadvantages
in my opinion.
I did a simple test, where I used a 32K buffer size in the client code to
retrieve a 32M chars long CLOB consisting of CJK chars (3 bytes per char).
With the fix it took around 17 seconds; without it, almost 3400 seconds!
In both cases a patch for DERBY-3825 was applied.
I also did a test with a 32MB CLOB containing ASCII characters, where I saw a
performance reduction of around 3% (test run on a LAN, performance reduction
will increase with higher latency networks).
If you want to test performance yourself, you must first apply the patch for
DERBY-3825 (2a). The problems are described under DERBY-3766.
Patch ready for review.
> Make LOBStoredProcedure on the server side smarter about the read buffer size
> -----------------------------------------------------------------------------
>
> Key: DERBY-3769
> URL: https://issues.apache.org/jira/browse/DERBY-3769
> Project: Derby
> Issue Type: Improvement
> Components: Network Server
> Affects Versions: 10.3.3.0, 10.4.1.3, 10.5.0.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Fix For: 10.4.2.1, 10.5.0.0
>
> Attachments: derby-3769-1a-buffer_size_adjustment.diff,
> derby-3769-1b-buffer_size_adjustment.diff,
> derby-3769-2a-clob_buffer_size_adjustment.diff
>
>
> Derby has a max length for VARBINARY and VARCHAR, which is 32'672 bytes or
> characters (see Limits.DB2_VARCHAR_MAXWIDTH).
> When working with LOBs represented by locators, using a read buffer larger
> than the max value causes the server to process far more data than necessary.
> Say the read buffer is 33'000 bytes, and these bytes are requested by the
> client. This request ends up in LOBStoredProcedure.BLOBGETBYTES.
> Assume the stream position is 64'000, and this is where we want to read from.
> The following happens:
> a) BLOBGETBYTES instructs EmbedBlob to read 33'000 bytes, advancing the
> stream position to 97'000.
> b) Derby fetches/receives the 33'000 bytes, but can only send 32'672. The
> rest of the data (328 bytes) is discarded.
> c) The client receives the 32'672 bytes, recalculates the position and
> length arguments and sends another request.
> d) BLOBGETBYTES(locator, 96672, 328) is executed. EmbedBlob detects that the
> stream position has advanced too far, so it resets the stream to position
> zero and skips/reads until position 96'672 has been reached.
> e) The remaining 328 bytes are sent to the client.
> This issue deals with points b) and d), by avoiding the need to reset the
> stream.
> Points a) and e) are also problematic if a large number of bytes are going to
> be read, say hundreds of megabytes, but that's another issue.
> It is unfortunate that using 32 K (32 * 1024) as the buffer size is almost
> the worst case; 32'768 - 32'672 = 96 bytes.
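The steps a) to e) in the description can be replayed with a toy model. This is only a sketch, not Derby's implementation: the class, its fields and the "capRead" switch are hypothetical, and capping the server-side read at the maximum return size is assumed here as one way of avoiding the reset. The model counts how many bytes the server reads or skips for the 33'000 byte request starting at position 64'000.

```java
public class LobReadSimulation {
    // Maximum number of bytes that can be sent per call (from the issue).
    static final int MAX_RETURN = 32672;

    long streamPos;       // current position of the forward-only stream
    long bytesProcessed;  // bytes read or skipped on the server
    int resets;           // number of times the stream was reset to zero

    LobReadSimulation(long initialPos) {
        streamPos = initialPos;
    }

    /** Serves one request and returns the number of bytes sent. */
    int getBytes(long pos, int len, boolean capRead) {
        int toRead = capRead ? Math.min(len, MAX_RETURN) : len;
        if (pos < streamPos) {
            // Stream has advanced too far: reset and skip from zero.
            resets++;
            bytesProcessed += pos;
            streamPos = pos;
        } else if (pos > streamPos) {
            // Skip forward to the requested position.
            bytesProcessed += pos - streamPos;
            streamPos = pos;
        }
        streamPos += toRead;
        bytesProcessed += toRead;
        // Anything read beyond MAX_RETURN is discarded.
        return Math.min(toRead, MAX_RETURN);
    }

    /** Client loop: re-requests until 'total' bytes have been received. */
    static LobReadSimulation fetch(long startPos, int total, boolean capRead) {
        LobReadSimulation server = new LobReadSimulation(startPos);
        long pos = startPos;
        int remaining = total;
        while (remaining > 0) {
            int sent = server.getBytes(pos, remaining, capRead);
            pos += sent;
            remaining -= sent;
        }
        return server;
    }

    public static void main(String[] args) {
        LobReadSimulation uncapped = fetch(64000, 33000, false);
        LobReadSimulation capped = fetch(64000, 33000, true);
        System.out.println("uncapped: resets=" + uncapped.resets
                + ", bytes processed=" + uncapped.bytesProcessed);
        System.out.println("capped:   resets=" + capped.resets
                + ", bytes processed=" + capped.bytesProcessed);
    }
}
```

In this model the uncapped read processes 130'000 bytes and resets the stream once to deliver 33'000 bytes, while the capped read processes exactly 33'000 bytes with no reset. The cost of the reset grows with the stream position, so the gap widens further into a large LOB.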