[
https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803533#action_12803533
]
Dag H. Wanvik commented on DERBY-4477:
--------------------------------------
The repro for DERBY-3646 fails because it is coded incorrectly: per JDBC, the
stream should be read to the end before a new get* method is called, cf.
http://java.sun.com/j2se/1.5.0/docs/api/java/sql/ResultSet.html#getBinaryStream(int).
When I fix that error in the repro, both derby-4477-partial and
derby-4477-0a-prototype (with the 64K limit) pass. When above the limit with
derby-4477-0a-prototype, materialization is not done; the new copyForRead
method is used instead. This will eventually return a wrapped stream,
BinaryToRawStream, which extends java.io.FilterInputStream. Strangely,
FilterInputStream does not give an error on read even after it has been closed
(by EmbedResultSet#closeCurrentStream). That is why the repro passed with the
original derby-4477-0a-prototype: the wrong usage in the repro was not caught.
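
To illustrate the two points above, here is a minimal sketch using only java.io
classes. It is not the DERBY-3646 repro and does not touch Derby: the class and
method names are made up for illustration, and a ByteArrayInputStream stands in
for whatever stream BinaryToRawStream actually wraps (an assumption). drain()
shows the usage the repro should follow, reading a stream to the end before
moving on to the next get* call. readAfterClose() shows that a plain
FilterInputStream keeps no "closed" state of its own, so a read after close()
only fails if the wrapped stream makes it fail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUsageSketch {

    /** Reads the stream to the end, as should be done before the next get* call. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Wraps an in-memory stream in a FilterInputStream, closes the wrapper,
     * and then reads from it. Returns whatever read() yields after close().
     */
    static int readAfterClose(byte[] data) throws IOException {
        // FilterInputStream's constructor is protected, hence the anonymous subclass.
        InputStream wrapped = new FilterInputStream(new ByteArrayInputStream(data)) { };
        wrapped.close();       // delegates to ByteArrayInputStream.close(), which has no effect
        return wrapped.read(); // delegates to the wrapped stream; no IOException is thrown here
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {10, 20, 30};
        System.out.println("drained bytes: " + drain(new ByteArrayInputStream(data)).length);
        System.out.println("read after close: " + readAfterClose(data));
    }
}
```

In JDBC terms, the corrected repro would call something like drain() on the
stream returned for column 1 before calling any getter for column 2.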
> Selecting / projecting a column whose value is represented by a stream more
> than once fails
> -------------------------------------------------------------------------------------------
>
> Key: DERBY-4477
> URL: https://issues.apache.org/jira/browse/DERBY-4477
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.3.3.0, 10.4.2.0, 10.5.3.0
> Reporter: Kristian Waagan
> Assignee: Kristian Waagan
> Attachments: derby-4477-0a-prototype.diff, derby-4477-partial.diff,
> derby-4477-partial.stat
>
>
> Selecting / projecting a column whose value is represented as a stream more
> than once crashes Derby, i.e.:
> ResultSet rs = stmt.executeQuery(
>     "SELECT clobValue AS clobOne, clobValue AS clobTwo FROM mytable");
> rs.next(); // position the cursor on the first row
> rs.getString(1);
> rs.getString(2);
> After having looked at the class of bugs having to do with reuse of stream
> data types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and
> DERBY-2349 (there may be more Jiras).
> The core of the fix is cloning certain DVDs being selected/projected in
> multiple columns. There are two types of cloning:
> A) materializing clone
> B) stream clone
> (A) can be implemented already; (B) requires code to clone a stream without
> materializing it. Note that the streams I'm talking about are streams
> originating from the store.
> Testing revealed the following:
> - the cost of the checks performed to figure out if cloning is required
> seems acceptable (negligible?)
> - in some cases (A) has better performance than (B) because the raw data
> only has to be decoded once
> - stream clones are preferred when the data value is above a certain size
> for several reasons:
> * avoids potential out-of-memory errors (and in case of a server
> environment, it lowers the memory pressure)
> * avoids decoding the whole value if the JDBC streaming APIs are used to
> access only parts of the value
> * avoids decoding overall in cases where the value isn't accessed by the
> client / user
> (this statement conflicts with the performance observation above)
> We don't always know the size of a value, and since the fix code deals with
> all kinds of data types, it is slightly more costly to try to obtain the size.
> What do people think about the following goal statement?
> Goals:
> ----- Phase 1
> 1) No crashes or wrong results due to stream reuse when executing duplicate
> column selections (minus goal 4)
> 2) Minimal performance degradation for non-duplicate column selections
> 3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR
> BIT DATA] column selections
> ----- Phase 2
> 4) No out-of-memory exceptions during execution of duplicate column
> selections of BLOB/CLOB
> 5) Optimize BLOB/CLOB cloning
> I think phase 1 can proceed by reviewing and discussing the prototype patch.
> Phase 2 requires more discussion and work (see DERBY-3650).
> A note about the bug behavior facts:
> Since this issue is the underlying cause for several other reported issues, I
> have decided to be liberal when setting the bug behavior facts. Depending on
> where the duplicate column selection is used, it can cause crashes,
> wrong results, and data corruption.
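
The size-based choice between a materializing clone (A) and a stream clone (B)
described in the issue could be sketched roughly as follows. This is not code
from either patch: CloneStrategySketch, CloneKind, choose() and the constants
are hypothetical names. The 64K figure is taken from the limit mentioned in
the comment, but its unit and whether the boundary is inclusive are
assumptions, as is the choice to fall back to a stream clone when the length
is unknown.

```java
public class CloneStrategySketch {

    /** The two kinds of cloning named in the issue description. */
    enum CloneKind { MATERIALIZE, STREAM }

    /** Assumed limit: 64K, unit unspecified in the discussion. */
    static final long MATERIALIZE_LIMIT = 64L * 1024;

    /** Hypothetical marker for "the length of the value is not known". */
    static final long UNKNOWN_LENGTH = -1L;

    /**
     * Picks a clone kind for a value selected in more than one column.
     * Small values are materialized (decoded once); large or unknown-length
     * values get a stream clone to limit memory use.
     */
    static CloneKind choose(long length) {
        if (length == UNKNOWN_LENGTH) {
            // Assumption: without a known size, avoid the risk of materializing a huge value.
            return CloneKind.STREAM;
        }
        // Assumption: a value exactly at the limit is still materialized.
        return length <= MATERIALIZE_LIMIT ? CloneKind.MATERIALIZE : CloneKind.STREAM;
    }

    public static void main(String[] args) {
        System.out.println(choose(100));
        System.out.println(choose(10L * 1024 * 1024));
        System.out.println(choose(UNKNOWN_LENGTH));
    }
}
```

Treating an unknown length as "large" trades some performance for safety; the
opposite default would favor phase 1 goal 3 at the expense of phase 2 goal 4.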
--
This message is automatically generated by JIRA.