Thank you so much, Tiago. On Apr 8, 2010, at 3:49 PM, Tiago Espinha <[email protected]> wrote:
Dear all, Earlier today Kathey gave me a crash course on DRDA for my GSoC project. Jayaram asked Kathey for a transcript of our conversation and Kathey suggested that I'd send it to the list, so that other contributors could spot eventual mistakes. If you find any, please feel free to chip in with your knowledge. Here goes: ---------------------8<---------------------- <kmarsden> DRDA - Distributed Relational Database Architecture. <kmarsden> Basically it is a protocol that shuttles database requests from a client over the network to a server. The call the client an Application Requester and the server an Application Server. <kmarsden> The Application Server term predates what we think of as an Application Server and has nothing to do with it. <etiago> ok <kmarsden> So as we discussed, the derby client JDBC Driver (our Application Requester) converts JDBC calls into DRDA, sends the DRDA accross the network to Network Server (our Application Server) which converts them back into JDBC which it sends to the embedded driver. <kmarsden> So the whole thing is a JDBC to DRDA to JDBC converter. This way we meet the requirement of having multiple jvms on multiple machines accessing a single embedded database, because everything gets routed through the network server process. <kmarsden> Make sense? <etiago> it does <kmarsden> DRDA is mostly associated historically with DB2 but there are actually a lot of licensed DRDA vendors, including Microsoft <etiago> so let me try to establish a comparison here <etiago> DRDA is sort of a platform-agnostic way of transferring database requests over the network, is this right? <etiago> something like XML <kmarsden> yes. That's right. It can go over TCP/IP or something called SNA, but yes it is platform-agnostic. As you may guess I think it started on the mainframe with EBCDIC encoding. <kmarsden> Having a standard protocol has allowed even for a single client to be shared amongst several databases. <kmarsden> For example the IBM DB2 Universal JDBC Driver (JCC) used to be used with Derby and is still used with Informix as well as of course many flavors of DB2. <kmarsden> DRDA iis also used with the DB2 C client. Not just JDBC <etiago> I see <kmarsden> In practice it is not as portable as you would like. It is generally a lot of work to get one product's DRDA client working with a different server. <etiago> ok, is DRDA just plaintext? I was looking at the trace and I saw that there's an hexadecimal, ASCII and EBCDIC representations of the data <kmarsden> The encoding for data can be described in the protocol flow. Up until now the DDM commands and parameters (which we will discuss in a bit) were all in EBCDIC. <etiago> ok <kmarsden> UNICODEMGR allows them (except for the earliest) to be in UTF-8 <kmarsden> So there are three volumes to the manuals. Volume 1 DRDA describes the protocol flow. <kmarsden> It shows the commands and what order they flow in from AR to AS and back. <kmarsden> But to read it you need to look at Volume 2, which describes the commands and objects that are being flowed in detail <kmarsden> The DDM manual shows a 2 byte "Codepoint" for each one of these commands or objects and that is how they are identified. <etiago> ah, kind of counterintuitive then #:) <kmarsden> You will see Codepoint.java class in both client and server that lists the codepoints in Derby <etiago> I did look at that and all it reminded me of was pointers <kmarsden> Vol 3 describes the format for the data types. <etiago> so DRDA also has data types of its own? <kmarsden> I can't say I have ever actually used Vol 3 or understood it. That work was done pretty early before even my time. I spend my time mostly in Vol 2 and Vol 1 <etiago> oh alright <kmarsden> Yes, it does and I think at two levels. For example DRDA itself in Volume 1 will talk about Varchar and Char types but Vol 3 just talks about strings in general. But we'll move on because this is not where you will be working. <etiago> ok * kasun ([email protected]) entrou em #derby * kasun agora chama-se Guest50421 <kmarsden> Ok. All this protocol flow and DDM objects are wrapped in something called a Data Stream Structure (DSS) which is just really a packet wrapper and has no real content except descxribing the length and chaniing one DSS to another. You will see these in the traces starting with a D0 and identifes the packet as a request a reply or an object. <kmarsden> I only mention that really because you will see DSS referred to all over the place. I don't think you will have to change that code either. <etiago> yes, I've seen DSS stuff around <kmarsden> Now lets look at the actual protocol. When a connection is made by the AR the first thing that is sent is an EXCSAT <kmarsden> Exchange Server Attributes Request <etiago> yes, I see that on the trace <kmarsden> This even with UNICODEMGR will be sent EBCDIC and is really important because it is where the protocol levels and now encoding are negotiated. <etiago> as I understand, we keep this as EBCDIC to allow clients that don't support UTF-8 to ask for normal EBCDIC rather than UTF-8 <kmarsden> Right. If you look at the DDM manual which is alphabetical at EXCSAT, you will find out about it., <kmarsden> <etiago> ok <kmarsden> Generally with EXCSAT you send an external name identifying the client, a version and a manager level list. <kmarsden> It is the manager level list or MGRLVLLS where the protocol negotiation and now encoding takes place. <kmarsden> oh no, I switched computers and don't have my spec to look at so I wlll go from memory. <kmarsden> Most of the Manager levels like SQLAM for instance are just a number 1- whatever to show the protocol level, but UNICODEMGR the one being introduced for ACR7007 is a bit different. <kmarsden> For that one the client will send 0 if it wants to continue with EBCDIC or 1208 if it wants UTF-8. I am not sure exactly why it was set up this way, maybe to allow different encodings in the future? I just don't know <etiago> ok yeah, that sort of makes sense <kmarsden> The next things that flow from the AR to the AS are ACCEC and SECCHK to negotiate security. These are chained before receiving the EXCSATRM so these are in EBCDIC too. <kmarsden> SECCHK (or maybe ACCSEC) has an optional RDBNAM parameter which Derby client used to send. <etiago> that's the database name if I recall correctly <kmarsden> Some time ago I took that out in preparation of DERBY-728. We now do not send the RDBNAM until it is required in ACCRDB <kmarsden> right. <etiago> oh by removing it from the SECCHK (which is EBCDIC-only) we can actually support database names with chinese and japanese characters <kmarsden> Anyway after the AR flows EXCSAT, ACCSEC and SECCHK then the server sends back an EXCSATRM with it's manager level list and then the AR and AS settle on the level . <kmarsden> right. <etiago> ok what exactly is this level that they settle on? <kmarsden> For each level, for instance UNICODEMGR, if the AR sent 1208 and the AS sent back 0, they would settle on 0 all EBCDIC. <kmarsden> I suppose it is actually the server that takes what the client has provided and returns the actual level that it will be. <etiago> oh ok <kmarsden> If the server could handle 1208 but the client said 0, it would be 0. <kmarsden> At least I think that's how it works. You better check <etiago> that makes sense but I will check <kmarsden> Also if the client knows nothing about this new manager level the server will dumb things down, <kmarsden> For example if a 10.3 client tries to connect to a 10.6 server after DERBY-728 has been implemented, EBCDIC it will be, UNICODEMGR level 0 <kmarsden> make sense? <etiago> yep we have to maintain compatibility <etiago> by using what's common to everyone <kmarsden> right <kmarsden> So let's say you have implemented DERBY-728 and are using new client and server. AR sent 1208, AS responded 1208 for UNICODEMGR. It is with the next request ACCRDB that the encoding changes. <kmarsden> It is with this request that we now finally send the RDBNAM in UTF-8 so we can send Chinese. Hooray! <etiago> yay :-) <kmarsden> But there is gotcha of course. The DRDA character strings have limits and not just character limits but byte limits. <etiago> meaning that the arguments of ACCRDB, etc have certain byte length limitations, right? <kmarsden> right. It is really very sad because from a Derby perspective, these are just arbitrary limits and since the database name is a often a full file system path and now characters can take up to four bytes I am thinking folks are going to run out. <etiago> yeah, I suppose these limits are defined somewhere in those DRDA documents? <kmarsden> but I understand extending or eliminating 1) would require a new opengroup ACXR which seems to take years and 2) will be a total no go in the C world. <kmarsden> right. I think RDBNAM is 255 bytes <etiago> actually, about that, is this ACR actually approved? <kmarsden> It is not. It is undergoing the "fast track" approval process at opengroup with some other ACR's. Work on it within IBM completed over a year ago, but they were waiting for a bunch of others to be ready to submit <etiago> gotta love bureaucracy :-) <kmarsden> But I think it is pretty solid. The work you do should be for 10.7, not 10.6 <kmarsden> so we'll be safe just in case. <etiago> alright <kmarsden> I should mention that this ACR was not originally for the purpose we are using it for. It was originally meant some sort of performance improvement to avoid converting to EBCDIC all the time. <kmarsden> At least 6 times that I can remember someone suggested moving the switch to UTF-8 later after ACCRDB, so I have scouts at opengroup that promise they will send up a big red flag if somebody tries i again. <etiago> hahaha <kmarsden> anyway moving along. <etiago> that's good <kmarsden> So what I have covered is the part of the protocol that you will be dealing with mostly. I will move on with some other protocol stuff but mostly for general knowledge. Then some other time we can have a talk about the Derby code in relation to the protocol <etiago> ok sounds good <kmarsden> So when Network server receives the ACCRDB it makes an embedded connection to the database which it will use for all the requests on that client connection. The simplest thing that might come through is an EXCSQLIMM which is just a simple statement execution with no result set and not a prepared statement, maybe just an update or delete. <etiago> yep I got that in my trace: [derby] SEND BUFFER: EXCSQLIMM (ASCII) (EBCDIC) <kmarsden> Most statements need to be prepared. Even if they are not a JDBC prepared statement, if they return a result set, they have to be prepared with a PRPSQLSTT <kmarsden> The PRPSQLSTT command takes a package name, consistency token and section number. The package name and section number are used to identify the statement later when it is executed. <kmarsden> In DB2 these are actual things and there are packages related to holdability and other things I can't remember and the section is the statement within that category. <etiago> hmm ok <kmarsden> In Derby, we just sort of pretend, because we don't actually have packages. But these are how the statements are identified and Network Server keeps a hash table of the prepared statements keyed on package and section number for retrieval. <kmarsden> An interesting way to look at this is to start network server and then go into ij and prepare a statement or two then run runtimeinfo to see the statements, their packages and section numbers. <etiago> ok <kmarsden> After PRPSQLSTT is the SQLSTT which is the actual SQL statement text. This I believe can already be encoded UTF-8 even without ACR7007, so you can have international characters in SQL even now. <kmarsden> Once the statement is prepared, you execute it with ECSQLSTT. Of course when you do this, you send again the package name, consitency token and section number so the server can look it up and match it to the prepared statement <etiago> hmm but is it actually done right now in Derby? <kmarsden> SQLDTA contains the parameter data and may just have place holders for large objects which are sent in an EXTDTA <etiago> ok <kmarsden> If the statement is a query, the AR sends an OPNQRY, and gets an OPNQRYRM and then iterates with CNTQRY (sort of like next but for many rows) and gets QRYDTA and possibly EXTDTA objects back with the data. <kmarsden> En the end the server sends a ENDQRYRM and a SQLCARD indicating the end of data. <kmarsden> The SQLCARD is an interesting thing. Generally it is used for SQLExceptions, but there is a special one for end of data. <etiago> so a simple SELECT statement actually represents several DRDA commands, right? <kmarsden> right, but not nearly as many as in that book. I still don't understand why it's so big. The ones I mentioned here are pretty much the pages you will go to over and over again <etiago> hehe fair enough :) <kmarsden> When the client connection is all done, there is no final end connection command that it sends, which really irritating. The socket is closed on the client and the server has to detect that and clean up. This has lead to many leak issues over the years. Maybe someday there will be an actual command which will make things easier. <etiago> maybe another ACR is due <kmarsden> So I guess that's the end of the protocol story for today, unless you have questions. <kmarsden> I wonder too if dag is here and if so how many lies he can spot. <etiago> haha I think that covers it for now, I will have questions over time I'm sure <kmarsden> Yes, maybe you can write the ACR's for extending the character string lengths and a connection termination command. It will look nice on your resume and nice for opengroup to have input from a more diverse group. <etiago> maybe #:) <etiago> maybe I still have a too simplistic insight of DRDA on Derby, but I think this will all boil down to detect the level of the encoding and instantiate the right encoder class based on that <kmarsden> I am a bit torn sometimes about whether we should just branch out since we have our own ciient. <kmarsden> Yes, that sounds right. ------------------------------------8<---------------------- Regards, Tiago Espinha
