Re: [jira] Commented: (DERBY-326) Improve streaming of large objects for network server and client

TomohitoNakayama Tue, 17 Jan 2006 06:00:37 -0800

Hello Kathey.

I answer your questions.

My previous comment was regarding the places where new code uses reverse
indentation or no indentation, not reformatting existing code. We don't
have formatting guidelines but I think code blocks should be indented.

I see.
I will try to indent them partially.
However, I am anxious about readability of patch itself.

Please evaluate readability of next patch too.

- Can you add a comment to explain the calculation for prepScalarStream return 
value, especially what the 6 and 4 are?
return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize  - 
extendedLengthByteCount

They are hard-coded number inherited from original code.
I read them that
6 is Length of Layer A header and
4 is Length of Layer B header.

Please see DRDA, Version 3, Volume 3: Distributed Data Management (DDM)Architecture page 315

Could you verify that my analysis of this code  is correct?

I think not correct.

"flushScalarStreamSegment" is called *only* when DSS segment is full (orno more information remains in stream).

- Why do we need this ensureLength call for layerBstreaming in prepScalarStream?ensureLength( layerBstreaming ?DEFAULT_BUFFER_SIZE - offset :length );

As written before, "flushScalarStreamSegment" is called *only* when DSSsegment is full (or no more information remains in stream).Therefore it must be possible for buffer to contain whole DSS segment,so that running out of buffer happens only after running out of DSS segment.

Then, flushScalarStreamSegment will be called correctly before runningout of buffer.

The call to ensureLength ensure size of buffer to contain whole DSSsegment and realize above.

// To say the truth, I read original flushScalarStreamSegment as thatbuffer is always large enough to contain whole DSS segment.// When reading mail from Bryan, I thought adding ensureLength is notnecessary but just improvement for appropriate size of buffer.// However I found that implementation of DMMWriter would causeinsufficient size of buffer *just when* I answer you ....

// Well ... Thank you ....

// // To say more verbosely , I think insufficient size of buffer may beescaped outside DDMWriter already.

// // Otherwise original flushScalarStreamSegment would not worked ever.

Sorry, one more question.
Could you explain the error handling in the new scheme?  The old code would 
call padScalarStreamForError and that is now gone, but I am not sure I 
understand how the error handling is handled now.

Answer is that streaming is discontinued and exception is sent usingagent.markCommunicationFailure as same as other place.



I will answer your "Possible future improvements:" in next mail ...
Now, it is almost midnight in Japan and I need to sleep :)


Best regards.



Kathey Marsden (JIRA) wrote:

[ http://issues.apache.org/jira/browse/DERBY-326?page=comments#action_12362861 ]

Kathey Marsden commented on DERBY-326:
--------------------------------------

I think this patch is a big improvement over reading the lobs into memory.
A few comments on future improvement and some questions on the code below:


Possible future improvements:

- for VARCHAR FOR BIT DATA (byte[] values)  we still call writeScalarStream 
with length and do not do layerBStreaming.  Would it make sense to stream this 
as well and get rid of all the code for handling extended length etc?

reencode() performanceThis method is called every 1024 characters of a character stream and-- creates a localBuffer char[1024] to read the character stream data-- creates a new String(localBuffer)-- creates a new byte[] with the getBytes() call

-- creates a new ByteArrayInputStream

A simple optimization using your  current scheme might be to have global field  
byte[BUFFERED_CHAR_LEN] instead of making a new one  each time.  Perhaps also a 
single  OutputStreamWriter field wrapped around a ByteArrayOutputStream  which 
is reset on each reencode call could eliminate the String creation.

Long term we should consider the fact that Derby stores the data in UTF8 
format, getCharacterStream decodes it then network server reencodes it to UTF8.
DERBY-760 is meant to address this issue, plus hopefully eliminate the 
additional copy into the buffer in  DDMWriter.  I will add more comments to 
DERBY-760.



QUESTIONS on code:

DDMWriter - I always had a hard time understanding the stream code in this file 
, but found it much easier with your single loop vs the triple loop of the old 
code.  But I have some questions.

- Why do we need this ensureLength call for layerBstreaming in prepScalarStream?ensureLength( layerBstreaming ?DEFAULT_BUFFER_SIZE - offset :length );


- Can you add a comment to explain the calculation for prepScalarStream return 
value, especially what the 6 and 4 are?
return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize  - 
extendedLengthByteCount


writeScalarStream

Could you verify that my analysis of this code  is correct?
        while( !isLastSegment ){
                /*
                 * This loop writes and flushes the data. If there is more data 
than will

* fit in the remaining buffer space or will fit in the current DSS, the* data will be continued in the next DSS segment via flushScalarStreamSegment.**/

                    int spareBufferLength = bytes.length - offset;
                [snip more of writeScalarStream ]

                        

                        // Adjust spareDssLength for 2 byte continuation flags
                        //written by flushScalarStreamSegment
                        if( ! isLastSegment )
                            spareDssLength = DssConstants.MAX_DSS_LENGTH - 2;

                        }

}

Improve streaming of large objects for network server and client
----------------------------------------------------------------

        Key: DERBY-326
        URL: http://issues.apache.org/jira/browse/DERBY-326
    Project: Derby
       Type: Improvement
 Components: Network Client, Performance, Network Server
   Reporter: Kathey Marsden
   Assignee: Tomohito Nakayama
Attachments: DERBY-326.patch, DERBY-326_2.patch, DERBY-326_3.patch, 
DERBY-326_4.patch, DERBY-326_5.patch

Currently the stream writing  methods in network server and client require a  
length parameter. This means that we have to get the length of the stream 
before sending it. For example in network server in EXTDTAInputStream we have 
to use getString and getbytes() instead of getCharacterStream and 
getBinaryStream so that we can get the  length.
SQLAM Level 7 provides for the enhanced LOB processing to allow streaming 
without indicating the length, so, the writeScalarStream methods in
network server DDMWriter.java and network client Request.java can be changed to 
not require a length.
Code inspection of these methods seems to indicate that while the length is 
never written it is used heavily in generating the DSS. One strange thing is 
that it appears on error, the stream is padded out to full length with zeros, 
but an actual exception is never sent.  Basically I think perhaps these methods 
need to be rewritten from scratch based on the spec requirements for lobs.
After the writeScalarStream methods have been changed, then EXTDAInputStream 
can be changed to properly stream LOBS. See TODO tags in this file for more 
info.  I am guessing similar optimizations available in the client as well, but 
am not sure where that code is.


--
/*

       Tomohito Nakayama
       [EMAIL PROTECTED]
       [EMAIL PROTECTED]
       [EMAIL PROTECTED]

       Naka
       http://www5.ocn.ne.jp/~tomohito/TopPage.html

*/

Re: [jira] Commented: (DERBY-326) Improve streaming of large objects for network server and client

Reply via email to