Hello Kathey.
I answer your questions.
My previous comment was regarding the places where new code uses reverse
indentation or no indentation, not reformatting existing code. We don't
have formatting guidelines but I think code blocks should be indented.
I see.
I will try to indent them partially.
However, I am anxious about readability of patch itself.
Please evaluate readability of next patch too.
- Can you add a comment to explain the calculation for prepScalarStream return
value, especially what the 6 and 4 are?
return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize -
extendedLengthByteCount
They are hard-coded number inherited from original code.
I read them that
6 is Length of Layer A header and
4 is Length of Layer B header.
Please see DRDA, Version 3, Volume 3: Distributed Data Management (DDM)
Architecture page 315
Could you verify that my analysis of this code is correct?
I think not correct.
"flushScalarStreamSegment" is called *only* when DSS segment is full (or
no more information remains in stream).
- Why do we need this ensureLength call for layerBstreaming in prepScalarStream?
ensureLength( layerBstreaming ?
DEFAULT_BUFFER_SIZE - offset :
length );
As written before, "flushScalarStreamSegment" is called *only* when DSS
segment is full (or no more information remains in stream).
Therefore it must be possible for buffer to contain whole DSS segment,
so that running out of buffer happens only after running out of DSS segment.
Then, flushScalarStreamSegment will be called correctly before running
out of buffer.
The call to ensureLength ensure size of buffer to contain whole DSS
segment and realize above.
// To say the truth, I read original flushScalarStreamSegment as that
buffer is always large enough to contain whole DSS segment.
// When reading mail from Bryan, I thought adding ensureLength is not
necessary but just improvement for appropriate size of buffer.
// However I found that implementation of DMMWriter would cause
insufficient size of buffer *just when* I answer you ....
// Well ... Thank you ....
// // To say more verbosely , I think insufficient size of buffer may be
escaped outside DDMWriter already.
// // Otherwise original flushScalarStreamSegment would not worked ever.
Sorry, one more question.
Could you explain the error handling in the new scheme? The old code would
call padScalarStreamForError and that is now gone, but I am not sure I
understand how the error handling is handled now.
Answer is that streaming is discontinued and exception is sent using
agent.markCommunicationFailure as same as other place.
I will answer your "Possible future improvements:" in next mail ...
Now, it is almost midnight in Japan and I need to sleep :)
Best regards.
Kathey Marsden (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-326?page=comments#action_12362861 ]
Kathey Marsden commented on DERBY-326:
--------------------------------------
I think this patch is a big improvement over reading the lobs into memory.
A few comments on future improvement and some questions on the code below:
Possible future improvements:
- for VARCHAR FOR BIT DATA (byte[] values) we still call writeScalarStream
with length and do not do layerBStreaming. Would it make sense to stream this
as well and get rid of all the code for handling extended length etc?
reencode() performance
This method is called every 1024 characters of a character stream and
-- creates a localBuffer char[1024] to read the character stream data
-- creates a new String(localBuffer)
-- creates a new byte[] with the getBytes() call
-- creates a new ByteArrayInputStream
A simple optimization using your current scheme might be to have global field
byte[BUFFERED_CHAR_LEN] instead of making a new one each time. Perhaps also a
single OutputStreamWriter field wrapped around a ByteArrayOutputStream which
is reset on each reencode call could eliminate the String creation.
Long term we should consider the fact that Derby stores the data in UTF8
format, getCharacterStream decodes it then network server reencodes it to UTF8.
DERBY-760 is meant to address this issue, plus hopefully eliminate the
additional copy into the buffer in DDMWriter. I will add more comments to
DERBY-760.
QUESTIONS on code:
DDMWriter - I always had a hard time understanding the stream code in this file
, but found it much easier with your single loop vs the triple loop of the old
code. But I have some questions.
- Why do we need this ensureLength call for layerBstreaming in prepScalarStream?
ensureLength( layerBstreaming ?
DEFAULT_BUFFER_SIZE - offset :
length );
- Can you add a comment to explain the calculation for prepScalarStream return
value, especially what the 6 and 4 are?
return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize -
extendedLengthByteCount
writeScalarStream
Could you verify that my analysis of this code is correct?
while( !isLastSegment ){
/*
* This loop writes and flushes the data. If there is more data
than will
* fit in the remaining buffer space or will fit in the current DSS, the
* data will be continued in the next DSS segment via flushScalarStreamSegment.
*
*/
int spareBufferLength = bytes.length - offset;
[snip more of writeScalarStream ]
// Adjust spareDssLength for 2 byte continuation flags
//written by flushScalarStreamSegment
if( ! isLastSegment )
spareDssLength = DssConstants.MAX_DSS_LENGTH - 2;
}
}
Improve streaming of large objects for network server and client
----------------------------------------------------------------
Key: DERBY-326
URL: http://issues.apache.org/jira/browse/DERBY-326
Project: Derby
Type: Improvement
Components: Network Client, Performance, Network Server
Reporter: Kathey Marsden
Assignee: Tomohito Nakayama
Attachments: DERBY-326.patch, DERBY-326_2.patch, DERBY-326_3.patch,
DERBY-326_4.patch, DERBY-326_5.patch
Currently the stream writing methods in network server and client require a
length parameter. This means that we have to get the length of the stream
before sending it. For example in network server in EXTDTAInputStream we have
to use getString and getbytes() instead of getCharacterStream and
getBinaryStream so that we can get the length.
SQLAM Level 7 provides for the enhanced LOB processing to allow streaming
without indicating the length, so, the writeScalarStream methods in
network server DDMWriter.java and network client Request.java can be changed to
not require a length.
Code inspection of these methods seems to indicate that while the length is
never written it is used heavily in generating the DSS. One strange thing is
that it appears on error, the stream is padded out to full length with zeros,
but an actual exception is never sent. Basically I think perhaps these methods
need to be rewritten from scratch based on the spec requirements for lobs.
After the writeScalarStream methods have been changed, then EXTDAInputStream
can be changed to properly stream LOBS. See TODO tags in this file for more
info. I am guessing similar optimizations available in the client as well, but
am not sure where that code is.
--
/*
Tomohito Nakayama
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
Naka
http://www5.ocn.ne.jp/~tomohito/TopPage.html
*/