Added: incubator/directory/asn1/branches/rewrite/ber/xdocs/eureka.xml URL: http://svn.apache.org/viewcvs/incubator/directory/asn1/branches/rewrite/ber/xdocs/eureka.xml?view=auto&rev=154725 ============================================================================== --- incubator/directory/asn1/branches/rewrite/ber/xdocs/eureka.xml (added) +++ incubator/directory/asn1/branches/rewrite/ber/xdocs/eureka.xml Mon Feb 21 13:44:46 2005 @@ -0,0 +1,929 @@ +<?xml version="1.0" encoding="UTF-8"?> +<document> + <properties> + <author email="[EMAIL PROTECTED]">Alex Karasulu</author> + <title>Eureka!</title> + </properties> + + <body> + + <section name="And we saw the light ... "> + <p> + One day Wes and Alex started talking about going to town on a new + ASN.1 BER Library and here's what happened ... + </p> + <subsection name="The Conversation"> + <source> +[SNIP] + +Wes says: +I've been thinking about the decoding process a bit over the weekend. + +Alex Karasulu says: +k I'm listening + +Wes says: +and encoding. + +Wes says: +I'm not sure at the initial stage there will be *one* decoder. + +Wes says: +We will need some place to hold our TLV tree. + +Wes says: +and also, I was thinking about really long messages. + +Alex Karasulu says: +you need multiple codecs (coder decoders) + +Alex Karasulu says: +right + +[SNIP] + +Wes says: +We got one part that builds the tree + +Wes says: +part two should be the translation. + +[SNIP] + +Wes says: +I think the only issue we have is how to handle chunking, and blocking versus +non-blocking code. + +Wes says: +And also, dealing with really huge messages. + +Wes says: +It obviously won't make sense to build a TLV tree in its entirety for a huge +search result. + +Alex Karasulu says: +right I agree + +Alex Karasulu says: +for encoding there is a mechanism for breaking down large TLVs of simple types +down + +Wes says: +encoding is a non issue as far as chunking goes. 
+ +Alex Karasulu says: +basically in the book they talk about 3 ways of specifying length + +Alex Karasulu says: +the L part + +Alex Karasulu says: +right but it effects decoding + +Alex Karasulu says: +but if another provider is doing encoding I see what you mean + +Alex Karasulu says: +basically we can break stuff down by injecting the 3rd indeterminate length form + +Alex Karasulu says: +follow me out + +Wes says: +You give the encoder an output interface, and every time it fills up the byte +bufffer, it spits it out. + +Alex Karasulu says: +Strictly talking about decoding and chunk sizes for now. + +Wes says: +K. decoding then. + +Alex Karasulu says: +just for background - you read the section on the 3 different modes for +specifying length right: short, long and indeterminant? + +[SNIP] + +Alex Karasulu says: +Your reading and encounter a really big simple type using the long encoding for +L. So you know what you have to read is a hugh blob of data in one big hunk. +Basically there is some threashold u use to judge whether or not the blob is too +big and needs to be chopped up. + +Wes says: +I did read the section the length. + +Alex Karasulu says: +cool + +Wes says: +I actually printed the whole appendix out and read it. + +Wes says: +on BER. + +Alex Karasulu says: +cool that's what I was reffering to + +Wes says: +An encoder can choose any one he wants. + +Alex Karasulu says: +Now your decoder can break down the long format into the indeterminate format +nesting smaller TLVs inside the TLV. Hence converting the simple TLV into a +constructed one. + +Alex Karasulu says: +The key here is not to keep all the tlvs in memory or the entire encoded buffer +in memory + +Wes says: +For decoding, there are messages where keeping the intermediate form in memory +is not an issue, and with others, there are. + +Wes says: +issues. 
+ +Alex Karasulu says: +Right depends on the message size + +Wes says: +The client will want to process most of the messages as a complete object. + +Wes says: +By definition, it will be in memory. + +Alex Karasulu says: +Yeah I know what you are saying. We need to make the library not do this +though. Then there would be more than one copy in memory. Leave it upto the +user to determine how the data is dealt with. Eventually we can take messures +to stream data if we want instead of having it all in memory. + +Wes says: +Back up just a second. + +Alex Karasulu says: +There are funky tactics we can employ way down the road - but for the time +being lets make it so our codecs dont need massive footprints + +Alex Karasulu says: +sure talk to me + +Wes says: +I used this technique in a Btrieve interface I wrote for U. S. South... + +Wes says: +which I stole from OpenTDS. + +Alex Karasulu says: +Btrieve? + +Wes says: +Yea, an ISAM database. + +Alex Karasulu says: +oh ok + +Wes says: +It used byte buffers to send and retrieve records. + +Wes says: +I wrote a java class that basically treated the byte array as primitives. + +Alex Karasulu says: +cool so you're already of the mindset to keeping the decoding and encoding +memory footprints small + +Wes says: +That might not work with us though. + +Wes says: +It might. + +Wes says: +All we need to know + +Wes says: +is that this field goes with this TLV. + +Wes says: +and convert it on the fly. + +Wes says: +Also, we an simply dump the TLVs when we are done. + +Alex Karasulu says: +yeah that's part of some tables we may need to maintain with a mappiung + +Alex Karasulu says: +right I think we're on the same page + +Alex Karasulu says: +I have a small idea though + +Alex Karasulu says: +Basically wrt the codec's interfaces + +Alex Karasulu says: +To me you give an array of bytes in a byte[] or a ByteBuffer (this is the +delivered partial chunk) and you get back a set of TLVs for that chunk. 
+ +Alex Karasulu says: +or take it in the opposite direction for a encoder + +Alex Karasulu says: +this is your stage 1 (BER bytes ->TLVs) + +Alex Karasulu says: +now we need to find a way to represent TLVs in a linear fashion and still +maintain the tree structure. However we don't want direct back references +to where the list of TLVs plug into the entire tree because this would mean +we have to have the whole tree in memory. + +Alex Karasulu says: +does that make sense I know its a lil nebulous + +Wes says: +Keep it simple + +Alex Karasulu says: +ok in decoding bytes go in and TLVs come out + +Wes says: +Right. + +Alex Karasulu says: +state is maintained between times u pump in bytes + +Alex Karasulu says: +wit me? + +Wes says: +Yup. + +Alex Karasulu says: +now the TLVs comming out are a peice of the TLV tree + +Wes says: +You got to be able to handle partial Ts, Ls, and Vs. + +Alex Karasulu says: +right that's part of the state stuff + +Alex Karasulu says: +if you're stuck in the middle of a simple tlv then you don't pump it out until +the chunks to complete it have arrived + +Alex Karasulu says: +wit me? + +Wes says: +right. + +Alex Karasulu says: +So the key here is to have the right TLV represntation or data structure. We +have some requirements on this. + +Alex Karasulu says: +the TLVs that come out of the decoder cannot directly, with java references, +refer to other TLVs that came out before. Because these references would +require the entire TLV tree in memory. + +Alex Karasulu says: +This is one of those requirements you agree? + +Wes says: +I don't see that being an issue. + +Wes says: +The parent needs to know about the children, but not vis a versa. + +Alex Karasulu says: +right + +Wes says: +and I don't see how you are going to be able to assemble an ASN.1 message in a +state driven fashion without making it very complicated. + +Alex Karasulu says: +that's our primary issue here + +Wes says: +and have two decoders hooked together as well. 
+ +Alex Karasulu says: +its a big problem to overcome + +Alex Karasulu says: +and do it elegantly + +Alex Karasulu says: +If we do this then our BER ASN.1 codec will be hot working in a non-blocking +fashion and being very efficient. It's like the way SAX is used for reading +XML for our ASN.1 messages instead of using DOM. + +Alex Karasulu says: +the ideas are similar + +Alex Karasulu says: +you didn't think this was gonna be a cake walk did ya + +Wes says: +Hmmmm. + +Alex Karasulu says: +you do understand where I was coming from wit the sax and dom stuff right? + +Wes says: +yea. + +Wes says: +That I understand. + +Alex Karasulu says: +do you think its possible? + +Wes says: +So you have an event driven ASN.1 parser. + +Wes says: +I think that's still easy. + +Wes says: +However, assembling them into the messages is still complicated. + +Wes says: +every ASN.1 message type would have to be derived from our parser. + +Wes says: +Then a factory could create the message type based on the application type. + +Alex Karasulu says: +hmmm + +Alex Karasulu says: +what do you mean by: "every ASN.1 message type would have to be derived from +our parser. + +Wes says: +You want the ASN.1 messages to be able to assemble themselves? or no. + +Alex Karasulu says: +Now you're talking about using the ASN.1 specification like a DTD to drive the +decoding + +Alex Karasulu says: +? + +Alex Karasulu says: +Yep I see yes + +Alex Karasulu says: +u use the ASN.1 spec or a set of classes generated by an ASN.1 spec compiler + +Alex Karasulu says: +question is do we need a compiler now? + +Wes says: +Right. + +Wes says: +Factory returns the ASN.1 message on the application tag. + +Alex Karasulu says: +right I see where your going with the design + +Wes says: +the parser then passes everything to the ASN.reader interface, + +Wes says: +SAX like. + +Alex Karasulu says: +Hmm sounds like it should be very possible + +Wes says: +of the application object. 
+ +Wes says: +who knows how to assemble himself. + +Alex Karasulu says: +right + +Alex Karasulu says: +This is huge + +Alex Karasulu says: +I wonder if other ASN.1 tools have this sax like mechanism already in place. + +Wes says: +But how do we handle ASN.1 messages which need to be streamed. + +Wes says: +like a huge search result. + +Alex Karasulu says: +that's not so much the issue + +Alex Karasulu says: +a large result set takes n+2 messages + +Alex Karasulu says: +sorry n+1 + +Wes says: +You have a search result tight. + +Wes says: +Tag = Applicationz +Length = 00 +Value = Search Results + +Wes says: +Now V is made up of thousands of result messages. + +Alex Karasulu says: +In the LDAP protocol a search result is returned as n+1 messages. + +Alex Karasulu says: +each result is an SearchEntryResponse for the 'n' and one SearchDoneResponse +PDU to end the resultset + +Alex Karasulu says: +n+1 messages + +Wes says: +Ah. + +Wes says: +But are they wrapped in an application TLV? + +Alex Karasulu says: +but think of a large blob of data + +Wes says: +or is it just one stream of TLVs. + +Alex Karasulu says: +like say some binary chunk + +Alex Karasulu says: +the application TLV for each response type is in the LDAP message envelope. +There is a top level LDAP message type which is a TLV then the different +response types have you know some enumeration values to determine which +response type the top level envelope or application TLV represents + +Wes says: +Right. + +Alex Karasulu says: +but your question is valid for say a single SearchEntryResponse where one of +the attributes is a huge binary chunk + +Wes says: +So the event firing for the top level envelope will be different than the TLVs +which are part of the envelope. 
+ +Alex Karasulu says: +the top level LDAPMessage envelope defined for the LDAP asn.1 will be a +constructred TLV + +Alex Karasulu says: +event might fire for it + +Alex Karasulu says: +same one every time + +Wes says: +Right, but not after the entire TLV is read into memry. + +Wes says: +that would defeat our SAX based parser. + +Alex Karasulu says: +but its constitution will change depending on the type of message it is + +Alex Karasulu says: +right + +Alex Karasulu says: +exactly + +Wes says: +I'm with you. + +Alex Karasulu says: +you would get a start_ldap_message event + +Wes says: +Actually, + +Wes says: +for the envelope, you would need to hit the factory. + +Wes says: +to get the appropriate LDAP message. + +Alex Karasulu says: +then perhaps the message_type_event will fire to note the contained TLV that +specifies the LDAP application's message type. + +Alex Karasulu says: +et. cetera. see where i'm going with it - you don't need the entire message to +fire its arrival. Like sax where you say start tag for this element then the +contained elemenets then close tags etc. + +Wes says: +Got ya. + +Wes says: +I think that's pretty cool. + +Alex Karasulu says: +I think we're getting somewhere cool here I'm very excited. I need to take +another look at a sax implementation again out there. It will give me some +insight into some possible general architecture for us. + +Alex Karasulu says: +Now going back to the massive chunk of binary. So we have a +SearchEntryResponse with an entry of the result set containing an attribute +that is a huge binary chunk. How do we stream it out right? Then we can +talk about how we stream it in. + +Alex Karasulu says: +Streaming it out is easy. Let's for a moment presume that we can actually +stream out of the jdbm stuff. You basically convert the long known length BER +encoding to the indeterminant encoding. Then send out individual chunks of +this binary attribute in separate TLVs. 
So you're turning big assed primitive +TLVs into constructed TLVs chunking out the content hence not needing the +entire V in memor + +Alex Karasulu says: +y. + +Wes says: +That's fine for us. We have control over the encoding. + +Wes says: +We won't be so lucky on the inbound side. + +Alex Karasulu says: +Right + +Alex Karasulu says: +Now let's think about that beast. + +Alex Karasulu says: +We have a binary -> tlv encoder spitting out tlvs with each bit of input + +Alex Karasulu says: +meant decoder above sorry + +Alex Karasulu says: +now if the indeterminate length is used by the client when encoding and +sending to the server the server is ok the data is already chopped up and +its all good. If not and the long length encoding is used then the data +comes into the server's decoder in chunks but the decoder sees a hugh +long length. + +Alex Karasulu says: +Based on some threshold the decoder translates the incoming long length and +values for the simple type (primitive TLV) into a constructed TLV breaking +up the large know length TLV into the indeterminant form which can be spit +out with a few nested TLVs at a time (with each input chunk going into the +decoder). + +Alex Karasulu says: +You follow? Decoder automatically breaks up large primitive long length encoded +TLVs into the indeterminate form and spits those out in peices rather than the +one large primitive TLV. + +Wes says: +What does that buy us? + +Alex Karasulu says: +streaming + +Wes says: +Is not the ASN message gonna re-assemble it anyways. + +Wes says: +Do you still end up with 200K picture in the ASN.1 message. + +Alex Karasulu says: +yeah that's application specific - remember we're talking just the BER->TLV +codec + +Alex Karasulu says: +the other codec is Type to TLV + +Wes says: +If we are using a SAX based parser, then the Type will be assembling itself as +the TLVs are decoded and fired. 
+ +Alex Karasulu says: +keeping it streaming means you don't have 2X the data or 400K in use just to +get the 200K picture + +Alex Karasulu says: +right + +Wes says: +At some point, you are going to have to put your faith in the garbage collector. + +Alex Karasulu says: +right but that's not in the codec BER to TLV code + +Alex Karasulu says: +keep that lean and mean - why you ask + +Wes says: +Also, if you want a truly small memory footprint, then you could put stuff like +that in a small embedded database. + +Alex Karasulu says: +well the TLV to Type code can be made lean and mean too + +Wes says: +I just don't think at this stage that we need to be all that worried about huge +blocks of binary data. + +Alex Karasulu says: +right we use referrals to data on disk to manage large peices of data that +needsto be streamed but this we can do later. + +Wes says: +Exactly. + +Alex Karasulu says: +yes but we want the options to be open - right now we can just design the +interfaces so all this can be added later. + +Alex Karasulu says: +Interfaces and contracts should be designed to allow these very low memory +footprints. Thinking through the process and what it takes to get there +makes us understand better what the design and interfaces should look like. + +Alex Karasulu says: +I don't care if the first implementation is a hog + +Wes says: +The BER stuff today doesn't deal with this. + +Wes says: +It doesn't care. + +Alex Karasulu says: +for large peices of data + +Wes says: +It's an application issue. + +Alex Karasulu says: +right + +Alex Karasulu says: +what the app does with it is upto the app but lets keep the ber codecs low in +memory image regardless of the fact that some app will be a pig and stream the +data into memory anyway. This is all that I'm trying to say. + +Alex Karasulu says: +wit me? + +Wes says: +K. 
+ +Alex Karasulu says: +cool we're tight on this but I think it will take more research on both our +parts - anyway apache is back up again after a power failure. Here's the new +stuff I created for ya: +http://cvs.apache.org/viewcvs.cgi/incubator/directory/snickers/?root=Apache-SVN + +Alex Karasulu says: +that's the top level of the snickers (snacc replacement) subproject + +Alex Karasulu says: +that's all you and Jeff with the C based version of this thang + +Wes says: +Right. + +Wes says: +You won't find much other ASN.1 stuff out there. + +Wes says: +I'm comfortable that no one is doing it this way, either. + +Wes says: +It will make it unqiuely, Apache. + +Alex Karasulu says: +Ok. Let's touch base in a day or two to regroup + +Wes says: +Do you think ASN.1 is going to die? + +Alex Karasulu says: +this is all good stuff and I'll try to get it out there. + +Alex Karasulu says: +no way + +Alex Karasulu says: +ASN.1 is awesome stuff + +Wes says: +We'll see. + +Alex Karasulu says: +SNMP is based on it and so is Kerberose + +Alex Karasulu says: +what's the alternative? + +Wes says: +XML is what everyone is using now. + +Alex Karasulu says: +well there is XER for ASN.1 + +Alex Karasulu says: +XML Encoding Rules + +Alex Karasulu says: +ASN.1 can go to BER, PER, XER, and DER + +Wes says: +Yes. + +Alex Karasulu says: +the encoding does not effect the ASN.1 specification and that is what makes +ASN.1 a winner always. + +Wes says: +Slapping XML on ASN.1 ain't the same. + +Alex Karasulu says: +the XML format is just for the encoding of the data types + +Wes says: +I agree that ASN.1 is a good protocol. + +Alex Karasulu says: +protocol specification syntax + +Alex Karasulu says: +it kicks ass I think and is here to stay. + +Wes says: +If we do this, we are going to go backwards right? + +Wes says: +Do the compiler last. + +Alex Karasulu says: +go backwards? + +Alex Karasulu says: +yeah that might be the case or we can work it together. 
+ +Wes says: +You need to let me work this. + +Alex Karasulu says: +I can do the compiler with you and you can handle the runtime + +Wes says: +You got other things to do. + +Alex Karasulu says: +ok its all you then + +Alex Karasulu says: +I'm just a follower + +Wes says: +I won't mind help with the compiler. + +Wes says: +Just don't get going on it any time soon + +Alex Karasulu says: +sure I have extensive javacc and antlr experience + +Wes says: +Deal. + +Alex Karasulu says: +hehe no worries with that my plate as you know is overflowing. + +Alex Karasulu says: +my bladder too + +Alex Karasulu says: +I'll catch ya later I need to hit the head + +Wes says: +Talk about the decoder's stream. + +Wes says: +K + +Alex Karasulu says: +ttyl + +Wes says: +Talk later then. + +Alex Karasulu says: +ok gimme 45 seconds + +Alex Karasulu says: +I'm back + +Alex Karasulu says: +what about the decoder's stream. + +Wes says: +So, how do we feed the decoder then. + +Alex Karasulu says: +Its all about how we design our interfaces. You know I've been looking at +commons-codec and see some potential but changes will be needed. + +Alex Karasulu says: +Follow me for a sec. + +Alex Karasulu says: +Now the codec interfaces are designed to convert stuff in one shot. +bytes in bytes out sort of thang. Very blocking dependent stuff and not very +cool for us with a SEDA and NIO based server. + +Alex Karasulu says: +wit me? + +Wes says: +right. + +Alex Karasulu says: +As you might have guessed this is not good for servers that need to keep +memory footprints low while servicing possible serveral hundred requests +per second. + +Alex Karasulu says: +So what do we do? We design new non-blocking and NIO based interfaces for the +codec API and submit them. + +Alex Karasulu says: +its down again damn + +Wes says: +I got my update + +Alex Karasulu says: +cool + +Wes says: +Must of brought it down. 
+ +Alex Karasulu says: +yeah maybe it will be up soon + +Alex Karasulu says: +anyway + +Alex Karasulu says: +We redesign these codec interfaces to manage an encoding session and a decoding +session so chunks can be process in a stateful manner to be conducive to +non-blocking use. + +Alex Karasulu says: +Or we use events like you said + +Alex Karasulu says: +Basically we contribute this to the commons stuff and make sure the community +understands why and what we're doing. That way they can double check us. + +Alex Karasulu says: +Then we use those interfaces to implement the ASN.1 stuff. + +Wes says: +Right. + +Alex Karasulu says: +We do this in the snickers area but put back as much into the commons codec as +we can. You game with this strategy? + +Wes says: +Yea, that's fine. + +Wes says: +I'll check out commons code as soon as it comes up. + +</source> + </subsection> + </section> + </body> +</document>
Added: incubator/directory/asn1/branches/rewrite/ber/xdocs/index.xml URL: http://svn.apache.org/viewcvs/incubator/directory/asn1/branches/rewrite/ber/xdocs/index.xml?view=auto&rev=154725 ============================================================================== --- incubator/directory/asn1/branches/rewrite/ber/xdocs/index.xml (added) +++ incubator/directory/asn1/branches/rewrite/ber/xdocs/index.xml Mon Feb 21 13:44:46 2005 @@ -0,0 +1,67 @@ +<?xml version="1.0" encoding="UTF-8"?> +<document> + <properties> + <author email="[EMAIL PROTECTED]">Alex Karasulu</author> + <title>BER Runtime</title> + </properties> + <body> + <section name="Introduction"> + <subsection name="What is it?"> + <p> + The BER Runtime is an API for encoding and decoding ASN.1 + data structures using Basic Encoding Rules (BER). It implements + extensions to the <a href="http://jakarta.apache.org/commons/codec"> + commons-codec</a> API for building stateful, chunking encoder/decoder + pairs that maintain state between processing calls. + </p> + </subsection> + + <subsection name="Stateful Codecs"> + <p> + More information on these new codec interfaces is available on the + <a href="../asn1-codec/index.html">stateful codec</a> home page. + You might want to read this before you continue since these extensions + are the basis for all ASN.1 encoders and decoders. + </p> + </subsection> + + <subsection name="What is encoded/decoded?"> + <p> + The BER runtime is protocol or ASN.1 module independent. The unit of + substrate is a BER TLV (Tag, Length, Value) so any BER based protocol + can be decoded and encoded by the BER codec to and from TLV tuples. 
+ </p> + </subsection> + </section> + + <section name="BER Codec User Guides and Design Documents"> + <table> + <tr> + <th>Subject</th> + <th>Description</th> + </tr> + + <tr> + <td><a href="./asn1berinfo.html">ASN.1 and BER Information</a></td> + <td>Links to various books and specifications on ASN.1 and BER</td> + </tr> + + <tr> + <td><a href="./BERDecoderDesign.html">BER Decoder Design</a></td> + <td>Explains how and why the BERDecoder was designed</td> + </tr> + + <tr> + <td><a href="./BERDigesterDesign.html">BER Digester Design</a></td> + <td>Explains how and why the BERDigester was designed</td> + </tr> + + <tr> + <td><a href="./BEREncoderDesign.html">BER Encoder Design</a></td> + <td>Explains how and why the BEREncoder was designed</td> + </tr> + + </table> + </section> + </body> +</document> Added: incubator/directory/asn1/branches/rewrite/codec/xdocs/index.xml URL: http://svn.apache.org/viewcvs/incubator/directory/asn1/branches/rewrite/codec/xdocs/index.xml?view=auto&rev=154725 ============================================================================== --- incubator/directory/asn1/branches/rewrite/codec/xdocs/index.xml (added) +++ incubator/directory/asn1/branches/rewrite/codec/xdocs/index.xml Mon Feb 21 13:44:46 2005 @@ -0,0 +1,326 @@ +<?xml version="1.0" encoding="UTF-8"?> +<document> + <properties> + <author email="[EMAIL PROTECTED]">Alex Karasulu</author> + <title>Stateful Codecs</title> + </properties> + <body> + <section> + <subsection name="Introduction"> + <p> + Codecs are bidirectional data transformations. The data transformed, + often referred to as the substrate, may be [en]coded or decoded hence + the word codec. The word codec also refers to the actual software + used to encode and decode data. 
We use the term stateful codec for + lack of a better description for encoder/decoder pairs possessing + certain abilities and exhibiting the following behaviors: + </p> + + <ul> + <li>the ability to interrupt and resume operation without losing + state</li> + <li>the ability to process a substrate in one or more steps operating + on small chunks rather than all of it in one large operation</li> + <li>the ability to free up resources while not actively processing, + perhaps until more of the substrate is available, or just to + multiplex limited resources</li> + <li>the ability to use a small fixed size chunk buffer rather than a + variable sized buffer equal to the entire size of the substrate, + whatever that may be</li> + </ul> + </subsection> + + <subsection name="Advantages"> + <p> + The abilities or behaviors listed above make stateful codecs ideal for + use in resource critical situations. Servers based on codecs, for + example, may have to perform several thousand concurrent encode/decode + operations. The resources required for such operations, namely threads + and memory buffers, will be limited. Most of the time these operations + will be waiting for IO to complete so they can free up resources to + allow other operations to proceed. Stateful codecs make this possible + and complement servers designed using non-blocking IO constructs. + </p> + + <p> + Servers cannot afford to allocate variable sized buffers for arriving + data. Allowing variable sized buffers based on incoming data + sizes opens the door for DoS attacks where malicious clients can + cripple or crash servers by pumping in massive or never ending + data streams. Stateful codecs enable fixed size processing overheads + regardless of the size of the data unit transmitted to the server. + Smaller codec footprints lead to smaller server process memory + footprints. 
+ </p> + + <p> + These advantages also make stateful codecs ideal for use in resource + limited environments like embedded systems, PDAs or cellular phones + which use ASN.1 and one of its encoding schemes to control data + transmission. These systems all run on limited resources where the + codec's operational footprint will have dramatic effects on the + performance of the device. + </p> + </subsection> + + <subsection name="How is a stateful codec defined?"> + <p> + There are several ways to skin this cat. To this day discussions are + underway at the ASF to determine the best approach. Until a consensus + is reached we have decided to use an event driven approach where the + events are modeled as callbacks. To better explain the approach we + need to discuss it within the context of encoding/decoding. + </p> + + <p> + Depending on the operation being performed, available chunks of the + substrate are processed using either the <code>encode()</code> or + the <code>decode()</code> method. These methods hence are presumed + to process small chunks of the substrate. The specific codec + implementation should know how to maintain state based on the encoding + between these calls to process a unit of substrate which likewise is + determined by the encoding. So the encoding (a.k.a. codec) defines + what a unit of substrate is as well as any state information required + while processing the substrate piecemeal. Several calls to these two + methods may be required to process a unit of the substrate. When the + entire unit has been processed an event is fired. Again the specific + codec detects the complete processing of a unit of substrate so it + knows when to fire this event. + </p> + + <p> + Going back to our approach for defining a stateful codec, we modeled + the event as a callback to a specific interface. 
For decoders this + would be a <code>DecoderCallback.decodeOccurred()</code> and for + encoders it would be an <code>EncoderCallback.encodeOccurred()</code> + method call. These interface methods are called when an entire unit + of substrate is respectively decoded or encoded. + </p> + + <p> + This approach also allows for codec chaining in a pipeline where + codecs may be stacked on top of one another. The callback interfaces + are used to bridge together codecs by feeding the output of one codec + operation into the input of another. Specific classes have been + included in the API to accommodate this usage pattern. + </p> + + <center> + <img src="../images/all-uml.gif"/> + </center> + + </subsection> + + <subsection name="StatefulDecoder Usage"> + <p> + StatefulDecoders use callbacks to signal the successful decode of a + unit of encoded substrate. Other than this, the definition of what a + 'unit of encoded substrate' is depends on the codec's decoder + implementation. The definition may be size constrained or be a + function of context. + </p> + + <p> + Basically you give a decoder some of the substrate every so often + as more of the substrate is made available. Then, when a unit of + encoded substrate has been decoded, the decoder notifies those + concerned by invoking the callback. + </p> + + <p> + The following fragment demonstrates how a StatefulDecoder works: + </p> + + <source> +StatefulDecoder decoder = new SomeConcreteDecoder( 512 ) ; +DecoderCallback cb = new DecoderCallback() { + public void decodeOccurred( StatefulDecoder decoder, Object decoded ) { + // do something with the decoded object + } +}; +decoder.setCallback( cb ) ; + </source> + + <p> + The StatefulDecoder uses a callback to deliver decoded objects which + are the decoded 'unit of encoded substrate'. StatefulDecoders are ideal + for use in high performance servers based on non-blocking IO. Often + StatefulDecoders will be used with a Selector in a loop to detect input + as it is made available. 
As the substrate arrives, it is fed to + the decoder intermittently. Finally the callback delivers the decoded + units of encoded substrate. Below is a trivialized example of + how a StatefulDecoder can be used to decode the substrate as it + arrives fragmented by the TCP/IP stack: + </p> + + <source> +while ( true ) { + ... + SelectionKey key = ( SelectionKey ) list.next() ; + if ( key.isReadable() ) { + SocketChannel channel = ( SocketChannel ) key.channel() ; + channel.read( buf ) ; + buf.flip() ; + decoder.decode( buf ) ; + buf.clear() ; // assumes decode() consumed the whole chunk + } + ... +} + </source> + + <p> + As you can see from the code fragment, decode() returns nothing: + it has a void return type. Because the callback is used to + deliver the finished product when it is ready, the decode operation + can occur asynchronously in another thread or stage of a server if + desired. + </p> + </subsection> + + <subsection name="Strengths and Weaknesses"> + <p> + As can be seen from the section above and some of the characteristics + of StatefulDecoders, they are ideal for building network servers. These + decoders waste very little memory per request, cannot be overloaded by + massive requests which may be used for DoS attacks, and they process the + substrate as it arrives in chunks instead of in one prolonged CPU and + memory intensive step. + </p> + + <p> + Servers with a high degree of concurrency need to keep overheads low. + StatefulDecoders certainly help achieve that end by keeping the + active processing footprint low with a constant size regardless of the + size of the substrate. + </p> + + <p> + The cost of creating a decoder for every new connection is usually + minimal; however, we cannot foresee every possible implementation. + Regardless of the cost associated with dedicating a StatefulDecoder + to each new connection, stateful protocol servers will often benefit + most, as opposed to a stateless server. 
The reasoning is as follows: + the longer the life of the connection, the more worthwhile it + is to create a StatefulDecoder, since its creation cost is amortized over the + life of the connection. + </p> + + <p> + The primary drawback is that StatefulDecoders are much more complex to + implement. They are basically state driven automata which change + their state with the arrival of data. Furthermore, it is very difficult + for StatefulDecoders to gracefully recover from corrupt or lost input. + </p> + </subsection> + + <subsection name="StatefulDecoder Chaining/Stacking"> + <p> + StatefulDecoders can easily be chained or stacked to operate on a + substrate stream. This is achieved by having the callback of one + decoder feed the <code>decode(Object)</code> method of another. Hence + the decoded byproduct of one decoder is the encoded substrate of + another. + </p> + + <p> + Because chaining may be common and several folks have + already expressed their interest in it, we have devised a special + StatefulDecoder implementation called a DecoderStack. It is itself + a decoder; however, other decoders can be pushed onto it. When empty, + without any decoders in the stack, it operates in pass-through mode: the + decode operation is basically the identity transformation. When + StatefulDecoders are pushed, decode operations invoke a chain of + decoders starting with the bottommost in the stack going up to the + top. The final callback invoked is the callback registered with the + DecoderStack. + </p> + + <p> + Below is an example of how this DecoderStack is used.
The example is + taken from one of the JUnit test cases for DecoderStack: + </p> + + <source> +public void testDecode() { + DecoderStack stack = new DecoderStack() ; + CallbackHistory history = new CallbackHistory() ; + stack.setCallback( history ) ; + stack.push( decoder ) ; + stack.decode( new Integer(0) ) ; + assertEquals( new Integer(0), history.getMostRecent() ) ; + + stack.push( new IncrementingDecoder() ) ; + stack.decode( new Integer(0) ) ; + assertEquals( new Integer(1), history.getMostRecent() ) ; + + stack.push( new IncrementingDecoder() ) ; + stack.decode( new Integer(0) ) ; + assertEquals( new Integer(2), history.getMostRecent() ) ; +} +... + +class IncrementingDecoder extends AbstractStatefulDecoder +{ + public void decode( Object encoded ) throws DecoderException + { + Integer value = ( Integer ) encoded ; + value = new Integer( value.intValue() + 1 ) ; + super.decodeOccurred( value ) ; + } +} + </source> + </subsection> + + <subsection name="Recommendations to Implementors"> + <p> + Keep it simple and rely on chaining to divide and conquer complex + decoders by splitting them into several trivial decoders. Besides simple chaining, + some situations will warrant the use of a choice driven decoder. Such a + decoder chooses which subordinate decoder to use based on its + current state. For example, in the simple BER byte stream to TLV + decoder in Snickers, there is a TagDecoder, a LengthDecoder and + several Value decoders that are swapped in and out when the top + BERDecoder switches state or detects a new primitive datatype.
+ </p> + + <p> + When reading encoded data from buffers, keep in mind that there are + five possible configurations for the contents of arriving data + with respect to the unit of encoded substrate: + </p> + + <!-- + todo add illustrations using images here - it's not that hard + might want to turn this into a table instead of a ul if we decide + to do that + --> + + <ul> + <li> + it contains a single complete discrete unit of encoded substrate + </li> + <li> + it contains many discrete and complete units of encoded substrate + </li> + <li> + it contains a partial fragment of a unit of encoded substrate + </li> + <li> + it contains two partial fragments of different units of encoded substrate: + the end of one and the start of another + </li> + <li> + it contains one or more fragments together with one or more complete units of encoded + substrate + </li> + </ul> + + <p> + When fragments arrive they are either head or tail fragments. Head + fragments are those that start a unit; they are found at the end + of the buffer. Tail fragments end a unit of encoded substrate and are + found at the front of the buffer. + </p> + </subsection> + </section> + </body> +</document> Added: incubator/directory/asn1/branches/rewrite/stub-compiler/xdocs/index.xml URL: http://svn.apache.org/viewcvs/incubator/directory/asn1/branches/rewrite/stub-compiler/xdocs/index.xml?view=auto&rev=154725 ============================================================================== --- incubator/directory/asn1/branches/rewrite/stub-compiler/xdocs/index.xml (added) +++ incubator/directory/asn1/branches/rewrite/stub-compiler/xdocs/index.xml Mon Feb 21 13:44:46 2005 @@ -0,0 +1,14 @@ +<?xml version="1.0" encoding="UTF-8"?> +<document> + <properties> + <author email="[EMAIL PROTECTED]">Alex Karasulu</author> + <title>Snickers ASN.1 Java Stub Compiler</title> + </properties> + <body> + <section name="Coming soon ..."> + <p> + Wonderful things are coming soon ... + </p> + </section> + </body> +</document>
