Thank you so much, Tiago.

On Apr 8, 2010, at 3:49 PM, Tiago Espinha <[email protected]> wrote:

Dear all,

Earlier today Kathey gave me a crash course on DRDA for my GSoC project. 
Jayaram asked Kathey for a transcript of our conversation and Kathey suggested 
that I'd send it to the list, so that other contributors could spot eventual 
mistakes. If you find any, please feel free to chip in with your knowledge.

Here goes:
---------------------8<----------------------
<kmarsden> DRDA - Distributed Relational Database Architecture.
<kmarsden> Basically it is a protocol that shuttles database requests from a 
client over the network to a server.  The call the client an Application 
Requester and the server an Application Server.
<kmarsden> The Application Server term predates what we think of as an 
Application Server and has nothing to do with it.
<etiago> ok
<kmarsden> So as we discussed, the derby client JDBC Driver (our Application 
Requester) converts JDBC calls into DRDA, sends the DRDA accross the network to 
Network Server (our Application Server) which converts them back into JDBC 
which it sends to the embedded driver.
<kmarsden> So the whole thing is a JDBC to DRDA to JDBC converter.  This way we 
meet the requirement of having multiple jvms on multiple machines accessing a 
single embedded database, because everything gets routed through the network 
server process.
<kmarsden> Make sense?
<etiago> it does
<kmarsden> DRDA is mostly associated historically with DB2 but there are 
actually a lot of licensed DRDA vendors, including Microsoft
<etiago> so let me try to establish a comparison here
<etiago> DRDA is sort of a platform-agnostic way of transferring database 
requests over the network, is this right?
<etiago> something like XML
<kmarsden> yes. That's right.   It can go over TCP/IP or something called SNA, 
but yes it is platform-agnostic. As you may guess I think it started on the 
mainframe with EBCDIC encoding.
<kmarsden> Having a standard protocol has allowed even for a single client to 
be shared amongst several databases.
<kmarsden> For example the IBM DB2 Universal JDBC Driver (JCC) used to be used 
with Derby and is still used with Informix as well as of course many flavors of 
DB2.
<kmarsden> DRDA iis also used with the DB2 C client. Not just JDBC
<etiago> I see
<kmarsden> In practice it is not as portable as you would like. It is generally 
a lot of work to get one product's DRDA client working with a different server. 
<etiago> ok, is DRDA just plaintext? I was looking at the trace and I saw that 
there's an hexadecimal, ASCII and EBCDIC representations of the data
<kmarsden> The encoding for data can be described in the protocol flow. Up 
until now the DDM commands and parameters (which we will discuss in a bit) were 
all in EBCDIC.
<etiago> ok
<kmarsden> UNICODEMGR allows them (except for the earliest) to be in UTF-8
<kmarsden> So there are three volumes to the manuals. Volume 1 DRDA describes 
the protocol flow.
<kmarsden> It shows the commands and what order they flow in from AR to AS and 
back.
<kmarsden> But to read it you need to look at Volume 2, which describes the 
commands and objects that are being flowed in detail
<kmarsden> The DDM manual shows a 2 byte "Codepoint" for each one of these 
commands or objects and that is how they are identified.
<etiago> ah, kind of counterintuitive then #:)
<kmarsden> You will see Codepoint.java class in both client and server that 
lists the codepoints in Derby
<etiago> I did look at that and all it reminded me of was pointers
<kmarsden> Vol 3 describes the format for the data types.
<etiago> so DRDA also has data types of its own?
<kmarsden> I can't say I have ever actually used Vol 3 or understood it.  That 
work was done pretty early before even my time.  I spend my time mostly in Vol 
2 and Vol 1
<etiago> oh alright
<kmarsden> Yes, it does and I think at two levels.  For example DRDA itself in 
Volume 1 will talk about Varchar and Char types but Vol 3 just talks about 
strings in general.  But we'll move on because this is not where you will be 
working.
<etiago> ok
* kasun ([email protected]) entrou em #derby
* kasun agora chama-se Guest50421
<kmarsden> Ok.  All this protocol flow and DDM objects are wrapped in something 
called a Data Stream Structure (DSS) which is just really a packet wrapper and 
has no real content except descxribing the length and chaniing one DSS to 
another.  You will see these in the traces starting with a D0 and identifes the 
packet as a request a reply or an object.
<kmarsden> I only mention that really because you will see DSS referred to all 
over the place. I don't think you will have to change that code either.
<etiago> yes, I've seen DSS stuff around
<kmarsden> Now lets look at the actual protocol.  When a connection is made by 
the AR the first thing that is sent is an EXCSAT
<kmarsden> Exchange Server Attributes Request
<etiago> yes, I see that on the trace
<kmarsden> This even with UNICODEMGR will be sent EBCDIC and is really 
important because it is where the protocol levels and now encoding are 
negotiated.
<etiago> as I understand, we keep this as EBCDIC to allow clients that don't 
support UTF-8 to ask for normal EBCDIC rather than UTF-8
<kmarsden> Right. If you look at the DDM manual which is alphabetical at 
EXCSAT, you will find out about it.,
<kmarsden>  
<etiago> ok
<kmarsden> Generally with EXCSAT you send an external name identifying the 
client, a version and a manager level list.
<kmarsden> It is the manager level list or MGRLVLLS where the protocol 
negotiation and now encoding takes place.
<kmarsden> oh no, I switched computers and don't have my spec to look at so I 
wlll go from memory.
<kmarsden> Most of the Manager levels like SQLAM for instance are just a number 
1- whatever to show the protocol level, but UNICODEMGR the one being introduced 
for ACR7007 is a bit different.
<kmarsden> For that one the client will send 0 if it wants to continue with 
EBCDIC or 1208 if it wants UTF-8.  I am not sure exactly why it was set up this 
way, maybe to allow different encodings in the future? I just don't know
<etiago> ok yeah, that sort of makes sense
<kmarsden> The next things that flow from the AR to the AS are ACCEC and SECCHK 
to negotiate security. These are chained before receiving the EXCSATRM so these 
are in EBCDIC too.
<kmarsden> SECCHK (or maybe ACCSEC) has an optional RDBNAM parameter which 
Derby client used to send.
<etiago> that's the database name if I recall correctly
<kmarsden> Some time ago I took that out in preparation of DERBY-728. We now do 
not send the RDBNAM until it is required in ACCRDB
<kmarsden> right.
<etiago> oh by removing it from the SECCHK (which is EBCDIC-only) we can 
actually support database names with chinese and japanese characters
<kmarsden> Anyway after the AR flows EXCSAT, ACCSEC and SECCHK then the server 
sends back an EXCSATRM with it's manager level list and then the AR and AS 
settle on the level .
<kmarsden> right.
<etiago> ok what exactly is this level that they settle on?
<kmarsden> For each level, for instance UNICODEMGR, if the AR sent 1208 and the 
AS sent back 0, they would settle on 0 all EBCDIC.
<kmarsden> I suppose it is actually the server that takes what the client has 
provided and returns the actual level that it will be.
<etiago> oh ok
<kmarsden> If the server could handle 1208 but the client said 0, it would be 0.
<kmarsden> At least I think that's how it works. You better check
<etiago> that makes sense but I will check
<kmarsden> Also if the client knows nothing about this new manager level the 
server will dumb things down,
<kmarsden> For example if a 10.3 client tries to connect to a 10.6 server after 
DERBY-728 has been implemented, EBCDIC it will be, UNICODEMGR level 0
<kmarsden> make sense?
<etiago> yep we have to maintain compatibility
<etiago> by using what's common to everyone
<kmarsden> right
<kmarsden> So let's say you have implemented DERBY-728 and are using new client 
and server.   AR sent 1208, AS responded 1208 for UNICODEMGR.   It is with the 
next request ACCRDB that the encoding changes.
<kmarsden> It is with this request that we now finally send the RDBNAM in UTF-8 
so we can send Chinese. Hooray!
<etiago> yay :-)
<kmarsden> But there is gotcha of course.  The DRDA character strings have 
limits and not just character limits but byte limits.
<etiago> meaning that the arguments of ACCRDB, etc have certain byte length 
limitations, right?
<kmarsden> right. It is really very sad because from a Derby perspective, these 
are just arbitrary limits and since the database name is a often a full file 
system path and now characters can take up to four bytes I am thinking folks 
are going to run out.
<etiago> yeah, I suppose these limits are defined somewhere in those DRDA 
documents?
<kmarsden> but I understand extending or eliminating 1) would require a new 
opengroup ACXR which seems to take years and 2) will be a total no go in the C 
world.
<kmarsden> right.  I think RDBNAM is 255 bytes
<etiago> actually, about that, is this ACR actually approved?
<kmarsden> It is not.  It is undergoing the "fast track" approval process at 
opengroup with some other ACR's.  Work on it within IBM completed over a year 
ago, but they were waiting for a bunch of others to be ready to submit
<etiago> gotta love bureaucracy :-)
<kmarsden> But I think it is pretty solid.  The work you do should be for 10.7, 
not 10.6
<kmarsden> so we'll be safe just in case. 
<etiago> alright
<kmarsden> I should mention that this ACR was not originally for the purpose we 
are using it for.  It was originally meant  some sort of performance 
improvement to avoid converting to EBCDIC all the time.
<kmarsden> At least 6 times that I can remember someone suggested moving the 
switch to UTF-8 later after ACCRDB, so I have scouts at opengroup that promise 
they will send up a big red flag if somebody tries i again.
<etiago> hahaha
<kmarsden> anyway moving along.
<etiago> that's good
<kmarsden> So what I have covered is the part of the protocol that you will be 
dealing with mostly. I will move on with some other protocol stuff but mostly 
for general knowledge. Then some other time we can have a talk about the Derby 
code in relation to the protocol
<etiago> ok sounds good
<kmarsden> So when Network server receives the ACCRDB it makes an embedded 
connection to the database which it will use for all the requests on that 
client connection.  The simplest thing that might come through is an EXCSQLIMM 
which is just a simple statement execution with no result set and not a 
prepared statement, maybe just an update or delete.
<etiago> yep I got that in my trace: [derby]        SEND BUFFER: EXCSQLIMM      
        (ASCII)           (EBCDIC)
<kmarsden> Most statements need to be prepared. Even if they are not a JDBC 
prepared statement, if they return a result set, they have to be prepared with 
a PRPSQLSTT
<kmarsden> The PRPSQLSTT command takes a package name, consistency token and 
section number.  The package name and section number are used to identify the 
statement later when it is executed.  
<kmarsden> In DB2 these are actual things and there are packages related to 
holdability and other things I can't remember and the section is the statement 
within that category.
<etiago> hmm ok
<kmarsden> In Derby, we just sort of pretend, because we don't actually have 
packages.  But these are how the statements are identified and Network Server 
keeps a hash table of the prepared statements keyed on package and section 
number for retrieval.
<kmarsden> An interesting way to look at this is to start network server and 
then go into ij and prepare a statement or two  then  run runtimeinfo to see 
the statements, their packages and section numbers.
<etiago> ok
<kmarsden> After PRPSQLSTT is the SQLSTT which is the actual SQL statement 
text. This I believe can already be encoded UTF-8 even without ACR7007, so you 
can have international characters in SQL even now.
<kmarsden> Once the statement is prepared, you execute it with ECSQLSTT.  Of 
course when you do this, you send again the package name, consitency token and 
section number so the server can look it up and match it to the prepared 
statement
<etiago> hmm but is it actually done right now in Derby?
<kmarsden> SQLDTA contains the parameter data and may just have place holders 
for large objects which are sent in an EXTDTA
<etiago> ok
<kmarsden> If the statement is a query, the AR sends an OPNQRY, and gets an 
OPNQRYRM and then iterates with CNTQRY (sort of like next but for many rows) 
and gets QRYDTA and possibly EXTDTA objects back with the data.
<kmarsden> En the end the server sends a ENDQRYRM and a SQLCARD indicating the 
end of data.
<kmarsden> The SQLCARD is an interesting thing. Generally it is used for 
SQLExceptions, but there is a special one for end of data.
<etiago> so a simple SELECT statement actually represents several DRDA 
commands, right?
<kmarsden> right, but not nearly as many as in that book.  I still don't 
understand why it's so big. The ones I mentioned here are pretty much the pages 
you will go to over and over again
<etiago> hehe fair enough :)
<kmarsden> When the client connection is all done, there is no final end 
connection command that it sends, which really irritating. The socket is closed 
on the client and the server has to detect that and clean up. This has lead to 
many leak issues over the years. Maybe someday there will be an actual command 
which will make things easier.
<etiago> maybe another ACR is due
<kmarsden> So I guess that's the end of the protocol story for today, unless 
you have questions.
<kmarsden> I wonder too if dag is here and  if so how many lies he can spot. 
<etiago> haha I think that covers it for now, I will have questions over time 
I'm sure
<kmarsden> Yes, maybe you can write the ACR's for extending the character 
string lengths and a connection termination command.  It will look nice on your 
resume and nice for opengroup to have input from a more diverse group.
<etiago> maybe #:)
<etiago> maybe I still have a too simplistic insight of DRDA on Derby, but I 
think this will all boil down to detect the level of the encoding and 
instantiate the right encoder class based on that
<kmarsden> I am a bit torn sometimes about whether we should just branch out 
since we have our own ciient.
<kmarsden> Yes, that sounds right.
------------------------------------8<----------------------

Regards,
Tiago Espinha












      

Reply via email to