Date: 2005-02-16T03:24:24
   Editor: EmmanuelLecharny
   Wiki: Apache Directory Project Wiki
   Page: TLVPageInfo
   URL: http://wiki.apache.org/directory/TLVPageInfo

   no comment

Change Log:

------------------------------------------------------------------------------
@@ -8,14 +8,37 @@
  * The '''Length''' part may not give the '''Value''' length : it is called an 
indefinite '''Length'''. Whatever, in this - not so frequent - case, the 
'''Value''' must end with a specific terminator.
 
 === A quick sample ===
-Let's begin with a simple example, without too many explanations :
+Let's begin with a simple example, without too many explanations. This is the 
'''PDU''' ('''P'''rotocol '''D'''ata '''U'''nit) of a '''BindRequest''' :
 
 attachment:TLVs.png
 
-We can see in this picture that you have what I called a first level TLV. It 
encapsulates other TLVs. 
+We can see in this picture that you have what I called a first level TLV. It 
encapsulates other TLVs. It's basically a stream of bytes.
 
 ==== Tag ====
-The '''Tag''' element value is 0x30, we will see later its meaning. 
+Each '''Tag''' contains information about the '''Value''' part of the 
'''TLV'''. It tells whether the '''Value''' is primitive or constructed, 
which type of '''primitive''' it is, and gives some contextual information. 
A '''Tag''' can be coded on more than one byte. The first 3 bits give some 
contextual information about the tag, and the 5 following bits are either a 
label or the beginning of a multi-byte label.
+
+Labels are numbers used to identify elements in a '''SET''' (see Asn.1 
grammar), for instance. Generally, we don't have to deal with labels above 30 
(labels up to 30 can be encoded in 5 bits, so this kind of '''Tag''' will be 1 
byte long), and never above 1024. In the LDAP ASN.1 grammar, no label exceeds 
24 (in ''LdapMessage'', the ''ExtendedResponse'' label is 24), so we can focus 
on 1 byte tags. Still, it could be useful to accept longer labels to be able 
to support any LDAP evolution (or other protocols, as this '''Tag''' decoder is 
not specifically written for LDAP).
+
+Decoding a Tag has to follow the finite state automaton shown in this picture 
:
+
+attachment:TagStateAutomaton.png
+
+(Thanks to Poseidon [http://www.gentleware.com/] or Argo UML 
[http://argouml.tigris.org/])
+
+In this diagram, ''bb'' stands for ''ByteBuffer''. It contains the stream of 
bytes to be decoded.
+
+Other interesting information that we need to grab from a '''Tag''' is stored 
in the first two bits (bits 7 and 6) and in the third bit (bit 5). The first 
two describe the class, the third tells if the '''TLV''' is a ''primitive'' (b5 
= 0) or a ''constructed'' '''TLV''' (b5 = 1).
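
As a rough illustration of the bit layout described above, here is a minimal Java sketch that splits the first byte of a '''Tag''' into its three parts. The class name ''TagBits'' and its method names are hypothetical and are not taken from the project code.

```java
/** Hypothetical helper, for illustration only: splits the first Tag byte. */
class TagBits {
    /** Class of the Tag, from bits 7 and 6
     *  (0 = universal, 1 = application, 2 = context-specific, 3 = private). */
    static int tagClass(byte first) {
        return (first & 0xC0) >>> 6;
    }

    /** Bit 5: 0 means primitive, 1 means constructed. */
    static boolean isConstructed(byte first) {
        return (first & 0x20) != 0;
    }

    /** Bits 4 to 0: the label, or 31 when a multi-byte label follows. */
    static int label(byte first) {
        return first & 0x1F;
    }

    public static void main(String[] args) {
        // Tag values quoted on this page: 0x30, 0x61, 0x04 and 0x0A.
        for (int t : new int[] {0x30, 0x61, 0x04, 0x0A}) {
            byte b = (byte) t;
            System.out.println(Integer.toHexString(t)
                + " class=" + tagClass(b)
                + " constructed=" + isConstructed(b)
                + " label=" + label(b));
        }
    }
}
```

With this reading, 0x30 and 0x61 both have bit 5 set, so both are constructed, and they differ only in class and label.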
+
+As we can see, we have to deal with the special case where the stream does not 
contain enough bytes to decode a multi-byte '''Tag'''. In this case, the 
automaton will exit with a state ''TAG_PENDING''. So the state automaton has 
two different start states : ''TAG_START'' and ''TAG_PENDING''. Until the 
''TAG_DONE'' state is reached, we have to keep '''Tag''' data somewhere. There 
are many ways to fulfill this requirement.
+ 1. the '''Tag''' decoder can be instantiated each time a new '''Tag''' is to 
be decoded, and it will store the current state
+ 2. a session can be stored within the decoder, and will be returned to 
the caller if a ''TAG_PENDING'' state is reached. The caller will have to 
give this session back to the decoder in order to finish the decoding.
+ 3. the caller may have to create a container and pass it as a parameter to 
the decoder. The decoder will store the current state in this container.
+
+The second option is of no help in this simple case. It's too complicated, and 
will be much slower than either of the two other options. We have to keep in 
mind that 99% of the '''Tags''' will be contained in one byte, and the 
probability that the stream stops just in the middle of a '''Tag''', even if 
not equal to zero, is very low. So we have to keep the decoding process simple 
(KISS : http://digital-web.com/articles/keep_it_simple_stupid). 
+
+I don't like the idea of instantiating new decoders when a new '''Tag''' 
arrives. We have to separate action and data. 
+
+So it leads to the third solution : calling the unique decoder with a 
container. It's quite easy to implement.
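
To make the third solution more concrete, here is a minimal Java sketch of a single stateless decoder working on a caller-supplied container. It is only one possible reading of the automaton: the class names ''TagContainer'' and ''TagDecoder'', their fields, and the numeric state values are assumptions, not the actual implementation. Multi-byte labels are read 7 bits per byte, with bit 7 set on every byte except the last, as specified by X.690.

```java
import java.nio.ByteBuffer;

/** Hypothetical container: keeps the state and the partially decoded Tag. */
class TagContainer {
    static final int TAG_START = 0;
    static final int TAG_PENDING = 1;
    static final int TAG_DONE = 2;

    int state = TAG_START;
    int tagClass;          // bits 7 and 6 of the first byte
    boolean constructed;   // bit 5 of the first byte
    int label;             // label accumulated so far
}

/** Hypothetical stateless decoder: all data lives in the container. */
class TagDecoder {
    /** Consumes bytes from bb and returns the state that was reached. */
    static int decode(ByteBuffer bb, TagContainer c) {
        if (c.state == TagContainer.TAG_START) {
            if (!bb.hasRemaining()) {
                return c.state;               // nothing to decode yet
            }
            int first = bb.get() & 0xFF;
            c.tagClass = first >>> 6;
            c.constructed = (first & 0x20) != 0;
            c.label = first & 0x1F;
            if (c.label != 0x1F) {            // usual case: 1 byte Tag
                c.state = TagContainer.TAG_DONE;
                return c.state;
            }
            c.label = 0;                      // multi-byte label follows
            c.state = TagContainer.TAG_PENDING;
        }
        // TAG_PENDING: keep reading label bytes while the stream has some.
        // No overflow check here: a real decoder should cap the label size.
        while (c.state == TagContainer.TAG_PENDING && bb.hasRemaining()) {
            int b = bb.get() & 0xFF;
            c.label = (c.label << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {            // bit 7 clear: last label byte
                c.state = TagContainer.TAG_DONE;
            }
        }
        return c.state;
    }
}
```

The caller creates one ''TagContainer'' per '''Tag''', and simply calls ''decode'' again with the same container when more bytes arrive after a ''TAG_PENDING'' result.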
 
 ==== Length ====
 The '''Length''' value is 0x0C, which is 12 in base 10. If you count the bytes 
after this '''Length''', you can easily see that there are 12 bytes. Ok, so 
'''Length''' means the number of bytes that the '''Value''' of the '''TLV''' 
contains. You can check that the other '''Lengths''' match too. Good !
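
The same check can be sketched in Java. This is a minimal illustration and not the project's decoder: the class name ''LengthCheck'' is hypothetical, it assumes 1 byte '''Tags''' and a '''Length''' coded on a single byte (below 0x80), and it ignores the indefinite form. The sample bytes in the usage note are also an assumption: they are consistent with the values quoted on this page but are not copied from the picture.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: checks that every Length matches its Value. */
class LengthCheck {
    /** Returns true when all TLVs in bb, nested ones included, fit exactly. */
    static boolean wellFormed(ByteBuffer bb) {
        while (bb.hasRemaining()) {
            int tag = bb.get() & 0xFF;
            if (!bb.hasRemaining()) {
                return false;                 // Tag without a Length
            }
            int length = bb.get() & 0xFF;
            if (length >= 0x80 || length > bb.remaining()) {
                return false;                 // unsupported form or too short
            }
            if ((tag & 0x20) != 0) {
                // Constructed: the Value is itself a series of TLVs.
                ByteBuffer value = bb.slice();
                value.limit(length);
                if (!wellFormed(value)) {
                    return false;
                }
            }
            bb.position(bb.position() + length);  // skip the Value
        }
        return true;
    }
}
```

For instance, wrapping the 14 bytes `30 0C 02 01 01 61 07 0A 01 00 04 00 04 00` in a ''ByteBuffer'' and calling ''wellFormed'' walks the outer '''TLV''' (12 bytes of '''Value'''), then the inner one tagged 61 (7 bytes of '''Value''').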
@@ -26,7 +49,7 @@
 
 ==== Value ====
 
-What about the '''Values'''? '''Length''' was easy, it was totally 
context-free. Which kind of '''Value''' can ve have? How do we know the type of 
each '''Value'''?
+What about the '''Values'''? '''Length''' was easy, it was totally 
context-free. Which kind of '''Value''' can we have? How do we know the type of 
each '''Value'''?
 
 First, we have seen that some '''Values''' are composed of '''TLVs'''. But 
we must have some kind of primitive '''Values''', like ''integer'' or 
''string''?
 
@@ -34,16 +57,16 @@
 
 So, let's see other '''Tags''' : 04 codes for an ''Octet String''. Here, we 
have two empty strings : ''Octet String'' (04) zero '''Length''' (00) in the 
two last '''TLVs'''
 
-0A (forth '''TLV''') means ''Enumerated''. This is a way to code a constrained 
value (i.e something in a set of values). Here, it's a 0 : ''An enumerated (0A) 
value which is 1 byte long (01) and which value is 0 (00)''. It does not give 
you a lot of information, as you can see: which kind of value is it suppose to 
be? 
+0A (fourth '''TLV''') means ''Enumerated''. This is a way to code a 
constrained value (i.e. something in a set of values). Here, it's a 0 : ''An 
enumerated (0A) value which is 1 byte long (01) and whose value is 0 (00)''. It 
does not give you a lot of information, as you can see: which kind of value is 
it supposed to be? 
 
-So far, so good, we have a kind of way to decode simple '''TLV'''. Let's call 
them '''Primitive'''. What about '''TLVs''' taht contains other '''TLVs'''? We 
will call them '''Constructed'''
+So far, so good, we have a way to decode simple '''TLVs'''. Let's call 
them '''Primitive'''. What about '''TLVs''' that contain other '''TLVs'''? We 
will call them '''Constructed'''
 
 The first '''TLV''' has a '''Tag''' value of 30. This is a ''SEQUENCE'' of 
'''TLVs'''. A ''SEQUENCE'' is constructed by ordered '''TLVs'''. We can't 
exchange two '''TLVs''' in a ''SEQUENCE'', there is another '''Tag''' for that 
: a ''SET''. 
 
-The last '''TLV''' has a '''Tag''' value of 61. This is specific of a 
'''CHOICE''', where you have to choose between different cases, and here it's 
the first value that has been choosen (we can read 61 has a '''SEQUENCE''' 
number 1 of the alternative. Accept the explanation, it's quite complicated to 
give the reason why 61 is a '''SEQUENCE''' while 30 is also a '''SEQUENCE''').
+The last '''TLV''' has a '''Tag''' value of 61. This is specific to a 
'''CHOICE''', where you have to choose between different cases, and here it's 
the first value that has been chosen (we can read 61 as a '''SEQUENCE''' 
number 1 of the alternative. Accept the explanation, it's quite complicated to 
give the reason why 61 is a '''SEQUENCE''' while 30 is also a '''SEQUENCE''').
 
 
-For any further information, one should read 
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf] which 
explain this encoding, but be aware that you also need to read 
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf]. They are 
availaible for free, which is quite cheap compared to sleeping pills !
+For any further information, one should read 
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf] which 
explains this encoding, but be aware that you also need to read 
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf]. They are 
available for free, which is quite cheap compared to sleeping pills !
 
 === Decoding TLVs ===
 
