Author: akarasulu
Date: Sat Jan 15 13:00:52 2005
New Revision: 125307

URL: http://svn.apache.org/viewcvs?view=rev&rev=125307
Log:
updates to site with docs on how we are going to refactor asn1 to make it better and faster

Added:
   incubator/directory/asn1/trunk/xdocs/refactor.xml
Modified:
   incubator/directory/asn1/trunk/xdocs/index.xml
Modified: incubator/directory/asn1/trunk/xdocs/index.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/index.xml?view=diff&rev=125307&p1=incubator/directory/asn1/trunk/xdocs/index.xml&r1=125306&p2=incubator/directory/asn1/trunk/xdocs/index.xml&r2=125307
==============================================================================
--- incubator/directory/asn1/trunk/xdocs/index.xml  (original)
+++ incubator/directory/asn1/trunk/xdocs/index.xml  Sat Jan 15 13:00:52 2005
@@ -45,6 +45,14 @@
         </td>
       </tr>
     </table>
+
+      <subsection name="Refactoring in 0.3 Branch">
+        <p>
+          For the next dev cycle we're radically refactoring the structure of
+          these modules. For more information you can take a look
+          <a href="./refactor.html">here</a>.
+        </p>
+      </subsection>
   </section>

   <section name="Motivation">

Added: incubator/directory/asn1/trunk/xdocs/refactor.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/refactor.xml?view=auto&rev=125307
==============================================================================
--- (empty file)
+++ incubator/directory/asn1/trunk/xdocs/refactor.xml  Sat Jan 15 13:00:52 2005
@@ -0,0 +1,591 @@
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <properties>
    <author email="[EMAIL PROTECTED]">Alex Karasulu</author>
    <title>Refactoring the ASN.1 Runtime</title>
  </properties>

  <body>
    <section name="Refactoring the ASN.1 Runtime">
      <p>
        The use of Snacc4J as the runtime ASN.1 BER codec for LDAP posed an
        IP issue for the new Directory Project under incubation. This led to
        the creation of our own implementation: the Apache ASN.1 Runtime
        library.
      </p>

      <p>
        Before continuing any further it might be a good idea to read about
        the existing architecture to understand the changes being proposed.
      </p>

      <subsection name="High Level Goals and Changes">
        <p>
          The internal 0.2 release was the first successful attempt to produce
          a replacement for Snacc4J. As of release 0.8 of ApacheDS it provides
          BER encoders and decoders for LDAP requests and responses. The
          library was designed with performance in mind, and some very good
          ideas were introduced and really put to the test. However, the
          library still has performance problems: the designs meant to make it
          a high performance library were not followed through completely.
          Furthermore, the code base is very difficult to maintain and needs
          some reorganization. We hope to refactor the library so it is more
          efficient and easier to maintain while reducing the number of
          dependencies it has. In the process we would like to introduce some
          new features and improvements, which are listed below:
        </p>

        <ul>
          <li>
            Better ByteBuffer utilization by splicing buffers instead of
            copying them.
          </li>

          <li>
            Replace the current Tuple class with well defined Tuple
            interfaces: specifically we need to remove TLV field processing
            from a Tuple as well as tag cooking functionality. Tag cooking
            refers to the application of transformations that turn tag bytes
            into a 4 byte Java primitive integer. These functions need to be
            localized within utility classes (a sketch of one such utility
            follows this list).
          </li>

          <li>
            Some BER based protocols only use a subset of the encoding rules.
            For example LDAP only uses definite length encodings for
            constructed tuples. A reduced set of rules is much easier to code
            and maintain, and will often perform significantly better than a
            codec designed for the entire rule set. The key, however, is to
            make sure that the core of the codec can be replaced transparently
            without imposing code changes.
          </li>

          <li>
            The Tuples of primitives like binary values store the Tag, Length
            and Value of the primitive TLV Tuple in memory. Sometimes
            primitive values can be dangerously large for a server to encode
            or decode; primitive tuples could be blobs of large binaries such
            as images. If tuple values are larger than some application
            defined limit they ought to be streamed to disk rather than kept
            in main memory. Streaming to disk makes the server more efficient
            overall since it can maintain a constant sized decoding footprint.
            However, switching to disk based storage will rightfully slow down
            the current operation involving the large primitive. This is a
            tradeoff that should be configurable by API users and ultimately
            by ApacheDS administrators.
          </li>

          <li>
            Better logging and error handling for codecs, with perhaps some
            management interfaces to control the properties of codecs.
          </li>

          <li>
            A single deployable artifact where the ber and codec jars are
            fused.
          </li>

          <li>
            Make the code easier to maintain while improving its structure.
          </li>
        </ul>
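        <p>
          As a rough illustration of the tag cooking idea mentioned above, a
          small utility along the following lines could own the byte-to-int
          transformation once it is pulled out of Tuple. The class and method
          names here are hypothetical, not existing code.
        </p>

<source>
public final class TagUtils
{
    private TagUtils()
    {
    }

    /**
     * "Cooks" up to four raw tag octets into a single primitive int by
     * packing them most significant byte first.
     *
     * @param octets the raw tag bytes as read off the wire
     * @param length the number of tag octets actually used
     * @return the tag packed into a 4 byte Java int
     */
    public static int getIntEncodedTag( byte[] octets, int length )
    {
        int tag = 0;

        for ( int ii = 0; ii < length && ii < 4; ii++ )
        {
            tag = ( tag << 8 ) | ( octets[ii] & 0xFF );
        }

        return tag;
    }
}
</source>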
      </subsection>
    </section>

    <section name="Tuple Interface/Class Hierarchies">
      <p>
        Presently Tuples contain the functionality to decode and encode
        fields. Tuples can even encode themselves to a buffer as BER or
        DER. A Tuple should be a simple bean and nothing more, so one of
        our goals is to factor this additional functionality out.
      </p>

      <p>
        Today a Tuple is a single class that acts more like a union of
        different types rather than using inheritance to differentiate them.
        There are distinct kinds of tuples, constructed versus primitive for
        example. Instead of using complex logic to work out what kind of
        Tuple an instance is, it is much better to differentiate Tuples into
        subclasses. Hence we propose a new interface and implementation
        hierarchy for Tuples.
      </p>

      <p>
        Let's start by proposing a minimal Tuple interface.
      </p>

<source>
interface Tuple
{
    /**
     * Gets the zero based index into a PDU where the first byte of this
     * Tuple's tag resides.
     *
     * @return zero based index of the Tag's first byte in the PDU
     */
    int getTagStartIndex();

    /**
     * Gets this TLV Tuple's Tag (T) as a type safe enumeration.
     *
     * @return type safe enumeration for the Tag
     */
    TagEnum getTag();

    /**
     * Gets whether or not this Tuple is constructed.
     *
     * @return true if the Tag is constructed, false if it is primitive
     */
    boolean isConstructed();
}
</source>

      <p>
        This interface gives the minimum information needed for a Tuple that
        is not specific to some specialized type of Tuple; all Tuples share
        these methods. We can also go a step further and implement an
        AbstractTuple where protected members are used to implement these
        methods. Note that isConstructed() will probably be left abstract so
        subclasses can simply return true or false. The other classes in the
        sections below extend from AbstractTuple.
      </p>
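      <p>
        A minimal sketch of what such a base class might look like is shown
        below; the member names are only illustrative and not settled code.
      </p>

<source>
public abstract class AbstractTuple implements Tuple
{
    /** zero based index of the first byte of this Tuple's tag in the PDU */
    protected int tagStartIndex = -1;
    /** this TLV Tuple's tag as a type safe enumeration */
    protected TagEnum tag;

    public int getTagStartIndex()
    {
        return tagStartIndex;
    }

    public TagEnum getTag()
    {
        return tag;
    }

    // left abstract so each leaf class simply returns true or false
    public abstract boolean isConstructed();
}
</source>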
      <subsection name="Primitive Vs. Constructed Tuples">
        <p>
          We need to go a step further and start differentiating between
          Tuples that are primitive and those that are constructed. In this
          step we introduce two new abstract classes, PrimitiveTuple and
          ConstructedTuple.
        </p>

        <p>
          These two classes are described below, but one might ask why both
          are still abstract. This is because we need to differentiate further
          between buffered and streamed Tuples in the case of primitive
          Tuples, and between definite length and indefinite length Tuples in
          the case of constructed Tuples. With our approach, only the leaf
          nodes of the inheritance hierarchy are concrete. Below is the
          definition of PrimitiveTuple.
        </p>

<source>
public abstract class PrimitiveTuple extends AbstractTuple
{
    /** the number of bytes used to compose the Tuple's length field */
    protected int lengthFieldSz = 0;
    /** the number of bytes used to compose the Tuple's value field */
    protected int valueFieldSz = 0;

    ...

    public final boolean isConstructed()
    {
        return false;
    }

    /**
     * Gets whether or not this Tuple's value is buffered in memory or
     * streamed to disk.
     *
     * @return true if the value is buffered in memory, false if it is
     * streamed to disk
     */
    public abstract boolean isBuffered();

    /**
     * Gets the number of bytes in the length (L) field of this TLV Tuple.
     *
     * @return number of bytes for the length
     */
    public final int getLengthFieldSize()
    {
        return lengthFieldSz;
    }

    /**
     * Gets the number of bytes in the value (V) field of this TLV Tuple.
     *
     * @return number of bytes for the value
     */
    public final int getValueFieldSize()
    {
        return valueFieldSz;
    }

    ...
}
</source>

        <p>
          This abstract class adds two new concrete methods for tracking the
          size of the length and value fields. Constructed Tuples do not
          necessarily have a length associated with them, since they may be
          of the indefinite form. Furthermore, the value of a constructed
          Tuple is the set of nested child Tuples subordinate to it, so there
          is no need to track field sizes here for anything other than
          primitive Tuples.
        </p>

        <p>
          Note that the isConstructed() method is implemented as final and
          always returns false for this lineage of Tuples. A final modifier on
          the method makes sense and sometimes helps the compiler inline it,
          so we don't always pay a price for using it in addition to
          subclassing. A new abstract method isBuffered() is introduced, which
          is discussed in detail within the Buffered Vs. Streamed section.
        </p>

        <p>
          Now let's take a look at the ConstructedTuple abstract class.
        </p>

<source>
public abstract class ConstructedTuple extends AbstractTuple
{
    public final boolean isConstructed()
    {
        return true;
    }

    /**
     * Gets whether the length of this constructed Tuple is of the definite
     * form or of the indefinite length form.
     *
     * @return true if the length is definite, false if the length is of the
     * indefinite form
     */
    public abstract boolean isLengthDefinite();
}
</source>

        <p>
          ConstructedTuple implements the <code>isConstructed()</code> method
          as final since it will always return true for this lineage of
          Tuples. Also a new abstract method isLengthDefinite() is introduced
          to tell whether or not the Tuple uses the indefinite length form.
        </p>
      </subsection>

      <subsection name="Definite Vs. Indefinite Length">
        <p>
          The ConstructedTuple can be further differentiated into two
          subclasses to represent definite and indefinite length constructed
          TLV Tuples. The indefinite form does not have a length value
          associated with it whereas the definite length form does.
        </p>
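        <p>
          As a purely illustrative aside (not part of the proposed hierarchy),
          the distinction shows up in the first octet of the length (L) field,
          so a decoder can tell the two forms apart along these lines:
        </p>

<source>
public final class LengthUtils
{
    private LengthUtils()
    {
    }

    /**
     * A first length octet of 0x80 signals the indefinite form: no length is
     * recorded and the value is terminated by the two end of contents octets
     * 0x00 0x00.
     */
    public static boolean isIndefinite( byte firstLengthOctet )
    {
        return ( firstLengthOctet & 0xFF ) == 0x80;
    }

    /**
     * For the definite form, gets the total number of octets in the length
     * field: 1 for the short form (lengths under 128), or 1 + n for the long
     * form where the first octet is 0x80 | n.
     */
    public static int getLengthFieldSize( byte firstLengthOctet )
    {
        int octet = firstLengthOctet & 0xFF;

        if ( octet < 0x80 )
        {
            return 1;                    // short form: the octet is the length itself
        }

        return 1 + ( octet & 0x7F );     // long form: n subsequent length octets follow
    }
}
</source>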
        <p>
          With that in mind, let's explore the concrete IndefiniteLengthTuple
          definition.
        </p>

<source>
public class IndefiniteLengthTuple extends ConstructedTuple
{
    public final boolean isLengthDefinite()
    {
        return false;
    }
}
</source>

        <p>
          Yep, this is pretty simple. There is very little to track for this
          Tuple since most of the tracking is handled by its descendant
          Tuples. The class is also concrete. What about the
          DefiniteLengthTuple implementation?
        </p>

<source>
public class DefiniteLengthTuple extends ConstructedTuple
{
    /** the number of bytes used to compose the Tuple's length field */
    protected int lengthFieldSz = 0;
    /** the number of bytes used to compose the Tuple's value field */
    protected int valueFieldSz = 0;

    ...

    public final boolean isLengthDefinite()
    {
        return true;
    }

    /**
     * Gets the number of bytes in the length (L) field of this TLV Tuple.
     *
     * @return number of bytes for the length
     */
    public final int getLengthFieldSize()
    {
        return lengthFieldSz;
    }

    /**
     * Gets the number of bytes in the value (V) field of this TLV Tuple.
     *
     * @return number of bytes for the value
     */
    public final int getValueFieldSize()
    {
        return valueFieldSz;
    }
}
</source>

        <p>
          This introduces two new concrete methods for getting the size of
          the length field and the size of the value field. A definite length
          TLV has a valid value within the Length (L) field, and the value of
          the length field is the length of the value field. Hence the reason
          why we include both of these concrete methods.
        </p>
      </subsection>

      <subsection name="Buffered Vs. Streamed PrimitiveTuples">
        <p>
          As we mentioned before, there are two kinds of primitive Tuples:
          those that keep their value in a buffer within the TLV Tuple object,
          in which case the value is buffered in memory, and those that stream
          the value to disk and store a reference to the value on disk. These
          two beasts are so different it makes sense to differentiate between
          them using subclasses. Let's take a look at a BufferedTuple, which
          is the simpler of the two.
        </p>

<source>
public class BufferedTuple extends PrimitiveTuple
{
    /** contains ByteBuffers which contain parts of the value */
    private final ArrayList value = new ArrayList();
    /** pre-fab final unmodifiable wrapper around our modifiable list */
    private final List unmodifiable = Collections.unmodifiableList( value );

    public final boolean isBuffered()
    {
        return true;
    }

    /**
     * Gets the value of this Tuple as a List of ByteBuffers.
     *
     * @return a list of ByteBuffers containing parts of the value
     */
    public final List getValue()
    {
        return unmodifiable;
    }
}
</source>

        <p>
          The implementation introduces a final <code>getValue()</code> method
          which returns an unmodifiable wrapper around a modifiable list of
          ByteBuffers. The <code>isBuffered()</code> method is made final and
          implemented to return true all the time. This is easy, so let's now
          take a look at the StreamedTuple implementation.
        </p>

<source>
public abstract class StreamedTuple extends PrimitiveTuple
{
    public final boolean isBuffered()
    {
        return false;
    }

    // might experiment with a getURL to represent the source of
    // the data stream - we need to discuss this on the list

    /**
     * Depending on the backing store used for accessing streamed data there
     * may need to be multiple subclasses that implement this method.
     *
     * @return an InputStream that can be used to read this Tuple's streamed
     * value data
     */
    public abstract InputStream getValueStream();

    // another question is whether or not to offer a readable Channel instead
    // of an InputStream? This is another topic for discussion.
}
</source>

        <p>
          At this point we know that there could be multiple ways to implement
          this kind of StreamedTuple. Notice though that the value is accessed
          through a stream provided by the Tuple. This way the large value
          stored on disk need not all be kept in memory at one time during the
          decode or encode process.
        </p>
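        <p>
          As one possible concrete subclass (the class and member names here
          are hypothetical), a disk backed Tuple could simply hand back a
          stream over the temporary file its value was spooled to during
          decoding:
        </p>

<source>
public class FileStreamedTuple extends StreamedTuple
{
    /** the temporary file the oversized value was streamed to */
    private final File valueFile;

    public FileStreamedTuple( File valueFile )
    {
        this.valueFile = valueFile;
    }

    public InputStream getValueStream()
    {
        try
        {
            return new FileInputStream( valueFile );
        }
        catch ( FileNotFoundException e )
        {
            throw new IllegalStateException( "streamed value no longer on disk: "
                + valueFile );
        }
    }
}
</source>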
      </subsection>

      <p>
        Some code will be removed from today's Tuple class during the
        refactoring and kept in a TupleUtils class. Functionality like the
        encoding and decoding of Tuple fields and tag cooking can be offloaded
        to this class.
      </p>
    </section>

    <section name="Notes">
      <p>
        By far the largest part of the refactoring effort is in introducing
        this new hierarchy and introducing some patterns, like the State
        pattern, that improve the maintainability of the code. Other minor
        details for this dev cycle are discussed below.
      </p>

      <subsection name="Termination Tuples">
        <p>
          A lot of effort is made to track the position of a Tuple within a
          PDU. This is why we have methods like getTagStartIndex(): we want
          to know where the first byte of a Tuple's tag is within a PDU. This
          positional accounting enables better error reporting when problems
          arise. It also allows us to detect when we start and stop processing
          a PDU.
        </p>

        <p>
          The minimum amount of information needed to track the position of a
          Tuple within a PDU, or the start and stop points of a PDU, is the
          Tuple's tag start index and the lengths of the fields within the
          Tuple.
        </p>

        <p>
          In a decoder, for example, we know that we've processed the last
          topmost Tuple of a PDU when we get a Tuple whose <code>
          getTagStartIndex()</code> returns 0. <b>WARNING</b>: AbstractTuple
          should default the tag start index to -1 so it cannot be
          misinterpreted as a terminator.
        </p>
      </subsection>

      <subsection name="New Coherent Replacement for Stateful Codec API">
        <p>
          There have been many complaints about the codec API being too
          generic or the callback mechanism being somewhat unintuitive.
          Perhaps we can work on more specific interfaces which incorporate
          the concepts of producer and consumer. Let's also see if we can make
          these interfaces specific enough that we don't have ugly code and
          casts all over the place.
        </p>

        <p>
          In the end we also want to do away with this codec API, which was
          originally intended to be fused back into commons. I've abandoned
          that idea because it is too difficult to make all parties happy. The
          best thing to do is to create our own interfaces that fit well and
          enable them to be wrapped for other APIs. Hence moving towards
          custom codec APIs is not an issue. The old codec stuff can be pushed
          into the protocol framework API.
        </p>

        <p>
          Furthermore, at the end of the day we want there to be a single
          runtime jar without any dependencies for the ASN.1 stuff. That means
          no more codec API jar as it exists today within the ASN.1 project.
        </p>

        <p>
          Some new producer consumer interface ideas are listed below:
        </p>

        <ul>
          <li>
            BufferConsumer: consumes ByteBuffers. Something like <code>void
            consume(ByteBuffer bb)</code> comes to mind, perhaps even with
            overloads to take a list or array of ByteBuffers.
          </li>

          <li>
            TupleProducer: generates Tuples (often is also a BufferConsumer).
            Something like <code>void setConsumer(TupleConsumer consumer)</code>
            comes to mind.
          </li>

          <li>
            TupleConsumer: consumes Tuples generated by a TupleProducer.
            Something like <code>void consume(Tuple tlv)</code> comes to mind.
          </li>

          <li>
            MessageProducer: produces populated message stubs.
          </li>
        </ul>
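        <p>
          Sketched as Java interfaces, these ideas might look roughly like the
          following; the names and signatures are only a starting point for
          discussion, not settled API, and each interface would of course live
          in its own source file.
        </p>

<source>
public interface BufferConsumer
{
    /** hands a buffer of raw substrate to this consumer */
    void consume( ByteBuffer buffer );
}

public interface TupleConsumer
{
    /** hands a decoded TLV Tuple to this consumer */
    void consume( Tuple tlv );
}

public interface TupleProducer
{
    /** registers the consumer that receives Tuples as they are produced */
    void setConsumer( TupleConsumer consumer );
}

// a BER decoder would typically be both a BufferConsumer and a TupleProducer,
// pulling Tuples out of the buffers it is fed and pushing them to its consumer;
// a MessageProducer building populated message stubs could follow the same pattern
</source>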
      </subsection>

      <subsection name="Possibly Merging TupleNode and Tuple">
        <p>
          Right now, to build Tuple trees we use yet another class to wrap
          Tuples, called TupleNode. This kept the contents of the Tuple class
          less congested. The old Tuple class will no longer exist, so the
          congestion issue is no longer valid. The question now is: is it
          worth keeping the parent/child methods in TupleNode when creating
          trees while paying for extra object creation?
        </p>

        <p>
          Note that the TupleNode methods are not required on Tuple to process
          a byte stream of encoded TLV data in a SAX-like fashion. These
          methods are only required for higher level operations like
          visitations from visitors during the encoding process. The question
          really is whether we will make Tuple impure to save a little time,
          so we don't have to create TupleNode objects to wrap Tuples and
          model the hierarchy. This is something that needs to be discussed.
        </p>

        <p>
          Contrary to the purist approach of keeping Tuple and TupleNode
          separate, one can merge the two. A codec need not honor these
          methods by building the tree, meaning these tree node (TupleNode)
          methods may simply return null. If these methods are honored then it
          is the intent of the codec to build a tree. If the tree is built the
          processing is more like DOM, and if not then it is more like SAX. We
          should not tax the DOM-like processing use case by forcing the need
          to create extra wrappers just to accommodate the purist view.
        </p>
      </subsection>

      <subsection name="Removing the Digester Concept">
        <p>
          I don't know what I was thinking when I devised this rule based
          approach similar to the Digester in commons. This was a big mistake
          and IMO one of the reasons why we have performance issues. This data
          structure can be removed entirely from the upper layers that depend
          on it.
        </p>

        <p>
          Granted, this means we are going to have to weave our own classes
          for handling LDAP specific PDUs once again, however I think this
          will be easy to do. I will essentially rewrite the LDAP provider
          based on our runtime to hardcode the switching rather than using
          this rule based triggering approach. The new approach is also going
          to simplify the code significantly, making it more maintainable.
          Hopefully these changes will also speed up the code since fewer
          objects will need to be created every time a decoder is
          instantiated.
        </p>
      </subsection>

      <subsection name="It's Time For DER and CER">
        <p>
          We need to find a way to make the rules used while decoding and
          encoding Tuples pluggable. This way we can change the rules to
          encode as generic BER or as reduced BER (for increased performance
          where a specific protocol's needs allow it). DER likewise is a
          reduced form of BER with restrictions on the encoding and on the
          range of values that can be interpreted from primitive values. If
          the pluggability is there, the runtime is a flexible TLV Tuple codec
          that can change the rules used to handle the substrate.
        </p>
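        <p>
          One possible shape for this pluggability, purely as an illustration,
          is a small rules interface that the core TLV codec consults while it
          works; the interface and method names here are hypothetical.
        </p>

<source>
public interface EncodingRules
{
    /** LDAP, for example, only allows the definite form for constructed TLVs */
    boolean isIndefiniteLengthAllowed();

    /** DER additionally requires the shortest possible length encoding */
    boolean isMinimalLengthRequired();
}

public class TupleDecoder
{
    /** the rule set this decoder enforces while processing the substrate */
    private final EncodingRules rules;

    public TupleDecoder( EncodingRules rules )
    {
        this.rules = rules;
    }

    // decode( ByteBuffer buffer ) would consult rules as it walks the TLVs
}
</source>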
        <p>
          We could easily have a BerDecoder, a CerDecoder, and even protocol
          specific decoders such as an LdapBerDecoder for the BER decoding
          rules that only apply to LDAP.
        </p>
      </subsection>

    </section>
  </body>
</document>
