[jira] Created: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

Kristian Waagan (JIRA) Tue, 15 Dec 2009 05:15:44 -0800

Selecting / projecting a column whose value is represented by a stream more 
than once fails
-------------------------------------------------------------------------------------------


                 Key: DERBY-4477
                 URL: https://issues.apache.org/jira/browse/DERBY-4477
             Project: Derby
          Issue Type: Bug
          Components: Store
    Affects Versions: 10.5.3.0, 10.4.2.0, 10.3.3.0
            Reporter: Kristian Waagan
            Assignee: Kristian Waagan


Selecting / projecting a column whose value is represented as a stream more 
than once crashes Derby, i.e.:
ResultSet rs = stmt.executeQuery("SELECT clobValue AS clobOne, clobValue AS 
clobTwo FROM mytable");
rs.getString(1);
rs.getString(2);

After having looked at the class of bugs having to do with reuse of stream data 
types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and 
DERBY-2349 (there may be more Jiras).
The core of the fix is cloning certain DVDs being selected/projected in 
multiple columns. There are two types of cloning:
 A) materializing clone
 B) stream clone

(A) can be implemented already, (B) requires code to clone a stream without 
materializing it. Note that the streams I'm talking about are streams 
originating from the store.

Testing revealed the following:
 - the cost of the checks performed to figure out if cloning is required seems 
acceptable (negligible?)
 - in some cases (A) has better performance than (B) because the raw data only 
has to be decoded once
 - stream clones are preferred when the data value is above a certain size for 
several reasons:
    * avoids potential out-of-memory errors (and in case of a server 
environment, it lowers the memory pressure)
    * avoids decoding the whole value if the JDBC streaming APIs are used to 
access only parts of the value
    * avoids decoding overall in cases where the value isn't accessed by the 
client / user
       (this statement conflicts with the performance observation above)

We don't always know the size of a value, and since the fix code deals with all 
kinds of data types, it is slightly more costly to try to obtain the size.

What do people think about the following goal statement?
Goals:
----- Phase 1
 1) No crashes or wrong results due to stream reuse when executing duplicate 
column selections (minus goal 4)
 2) Minimal performance degradation for non-duplicate column selections
 3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR 
BIT DATA] column selections
----- Phase 2
 4) No out-of-memory exceptions during execution of duplicate column selections 
of BLOB/CLOB
 5) Optimize BLOB/CLOB cloning

I think phase 1 can proceed by reviewing and discussing the prototype patch. 
Phase 2 requires more discussion and work (see DERBY-3650).


A note about the bug behavior facts:
Since this issue is the underlying cause for several other reported issues, I 
have decided to be liberal when setting the bug behavior facts. Depending on 
where the duplicate column selection is used, it can cause both crashes, wrong 
results and data corruption.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails

Reply via email to