Date: 2005-02-16T03:24:24
Editor: EmmanuelLecharny
Wiki: Apache Directory Project Wiki
Page: TLVPageInfo
URL: http://wiki.apache.org/directory/TLVPageInfo
no comment
Change Log:
------------------------------------------------------------------------------
@@ -8,14 +8,37 @@
* The '''Length''' part may not give the '''Value''' length : it is called an
indefinite '''Length'''. Whatever, in this - not so frequent - case, the
'''Value''' must end with a specific terminator.
=== A quick sample ===
-Let's begin with a simple example, without too many explanations :
+Let's begin with a simple example, without too many explanations. This is the
'''PDU''' ('''P'''rotocol '''D'''ata '''U'''nit) of a '''BindRequest''' :
attachment:TLVs.png
-We can see in this picture that you have what I called a first level TLV. It
encapsulates other TLVs.
+We can see in this picture that you have what I called a first level TLV. It
encapsulates other TLVs. It's basically a stream of bytes.
==== Tag ====
-The '''Tag''' element value is 0x30, we will see later its meaning.
+Each '''Tag''' contains information about the '''Value''' part of the
'''TLV'''. It tells whether the '''Value''' is primitive or constructed, which
type of '''primitive''' the value is, and gives some contextual information. A
'''Tag''' can be coded on more than one byte. The first 3 bits give some
contextual information about the tag, and the following 5 bits are either a
label or the beginning of a multi-byte label.
+
+Labels are numbers used to identify elements, for instance in a '''SET''' (see
the ASN.1 grammar). Generally, we don't have to deal with labels above 30, which
can be encoded in 5 bits (so this kind of '''Tag''' will be 1 byte long), and
never above 1024. In the LDAP ASN.1 grammar (RFC 2251), no label exceeds 24 (in
''LdapMessage'', the ''ExtendedResponse'' label is 24), so we can focus on 1
byte tags. Nevertheless, it could be useful to accept longer labels in order to
support any LDAP evolution (or other protocols, as this '''Tag''' decoder is
not specifically written for LDAP).
+
+Decoding a Tag has to follow the finite state automaton shown in this picture
:
+
+attachment:TagStateAutomaton.png
+
+(Thanks to Poseidon [http://www.gentleware.com/] or Argo UML
[http://argouml.tigris.org/])
+
+In this diagram, ''bb'' stands for ''ByteBuffer''. It contains the stream of
bytes to be decoded.
+
+Other interesting information that we need to grab from a '''Tag''' is stored
in the first two bits (bits 7 and 6) and in the third bit (bit 5). The first
two describe the class, the third tells if the '''TLV''' is a ''primitive'' (b5
= 0) or a ''constructed'' '''TLV''' (b5 = 1).
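
As an illustration of this bit layout, here is a minimal sketch that splits a
single-byte '''Tag''' into its three fields. The class name ''TagFields'' and
its members are hypothetical and chosen only for this example. The class
numbering (0 universal, 1 application, 2 context-specific, 3 private) comes
from X.690, not from this page.

```java
// Illustrative sketch only: TagFields and its members are hypothetical names,
// not part of any existing API.
final class TagFields {
    final int tagClass;        // bits 7-6: 0 universal, 1 application, 2 context-specific, 3 private
    final boolean constructed; // bit 5: false = primitive, true = constructed
    final int label;           // bits 4-0: the label, or 31 when the label continues on later bytes

    TagFields(int tagByte) {
        this.tagClass = (tagByte >> 6) & 0x03;
        this.constructed = (tagByte & 0x20) != 0;
        this.label = tagByte & 0x1F;
    }

    // All five label bits set means the label is coded on the following bytes.
    boolean isMultiByte() {
        return label == 0x1F;
    }
}
```

With this sketch, 0x30 reads as a universal, constructed tag with label 16,
and 0x61 reads as an application, constructed tag with label 1.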
+
+As we can see, we have to deal with the special case where the stream does not
contain enough bytes to decode a multi-byte '''Tag'''. In this case, the
automaton will exit with a state ''TAG_PENDING''. So the state automaton has
two different start states : ''TAG_START'' and ''TAG_PENDING''. Until the
''TAG_DONE'' state is reached, we have to keep the '''Tag''' data somewhere.
There are many ways to fulfill this requirement.
+ 1. the '''Tag''' decoder can be instantiated each time a new '''Tag''' is to
be decoded, and it will store the current state
+ 2. a session can be stored within the decoder, and will be returned to
the caller if a ''TAG_PENDING'' state is reached. The caller will have to
give this session back to the decoder in order to finish the decoding.
+ 3. the caller may have to create a container and pass it as a parameter to
the decoder. The decoder will store the current state in this container.
+
+The second option is of no help in this simple case. It's too complicated, and
will be much slower than either of the other two options. We have to keep in
mind that 99% of the '''Tags''' will be contained in one byte, and the
probability that the stream stops just in the middle of a '''Tag''', even if
not equal to zero, is very low. So we have to keep the decoding process simple
(KISS : http://digital-web.com/articles/keep_it_simple_stupid).
+
+I don't like the idea of instantiating new decoders when a new '''Tag'''
arrives. We have to separate action and data.
+
+So it leads to the third solution : calling the unique decoder with a
container. It's quite easy to implement.
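
A minimal sketch of this third option could look like the following. All names
(''TagState'', ''TagContainer'', ''TagDecoderSketch'') are hypothetical, and
this is not the project's actual decoder. The continuation rule for multi-byte
labels (7 bits per byte, high bit set on every byte except the last) is taken
from X.690. The sketch does not guard against labels too large for an ''int''.

```java
import java.nio.ByteBuffer;

// Hypothetical names throughout: a sketch of the "container" option only.
enum TagState { TAG_START, TAG_PENDING, TAG_DONE }

// Holds the data; the decoder below holds only the action.
final class TagContainer {
    TagState state = TagState.TAG_START;
    int firstByte; // class bits and primitive/constructed bit live here
    int label;     // label accumulated so far
}

final class TagDecoderSketch {
    // Stateless: all progress is kept in the container passed by the caller.
    static TagState decode(ByteBuffer bb, TagContainer c) {
        if (c.state == TagState.TAG_DONE) {
            return c.state; // nothing left to do for this Tag
        }
        if (c.state == TagState.TAG_START) {
            if (!bb.hasRemaining()) {
                return c.state; // no byte yet, stay in TAG_START
            }
            c.firstByte = bb.get() & 0xFF;
            c.label = c.firstByte & 0x1F;
            if (c.label != 0x1F) {
                c.state = TagState.TAG_DONE; // the common one-byte case
                return c.state;
            }
            c.label = 0; // multi-byte label: start accumulating
            c.state = TagState.TAG_PENDING;
        }
        while (bb.hasRemaining()) {
            int b = bb.get() & 0xFF;
            c.label = (c.label << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) { // high bit clear marks the last label byte
                c.state = TagState.TAG_DONE;
                return c.state;
            }
        }
        return c.state; // stream ended in the middle of the Tag: TAG_PENDING
    }
}
```

The caller keeps the same container and calls ''decode'' again when more bytes
arrive. For example, feeding 0x1F 0x81 leaves the container in ''TAG_PENDING'',
and a later buffer holding 0x00 completes the '''Tag''' with label 128.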
==== Length ====
The '''Length''' value is 0x0C, which is 12 in base 10. If you count the bytes
after this '''Length''', you can easily see that there are 12 bytes. Ok, so
'''Length''' means the number of bytes that the '''Value''' contains. You can
check that the other '''Lengths''' match too. Good !
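
To make the '''Length''' rule concrete, here is a hedged sketch of a
'''Length''' reader. The one-byte case (such as 0x0C) is the one described
above. The multi-byte form and the 0x80 marker for an indefinite '''Length'''
follow X.690 rather than this page. ''LengthSketch'' and ''INDEFINITE'' are
hypothetical names, and the sketch assumes the whole '''Length''' is already
available in the buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: a sketch, not the project's Length decoder.
final class LengthSketch {
    static final int INDEFINITE = -1; // marker chosen for this example only

    // Assumes every Length byte is present in bb.
    static int decode(ByteBuffer bb) {
        int first = bb.get() & 0xFF;
        if ((first & 0x80) == 0) {
            return first; // short form: the byte is the Length itself (0..127)
        }
        int count = first & 0x7F; // long form: number of Length bytes that follow
        if (count == 0) {
            return INDEFINITE; // 0x80: the Value ends with a terminator
        }
        if (count > 4) {
            throw new IllegalArgumentException("Length too large for this sketch");
        }
        int length = 0;
        for (int i = 0; i < count; i++) {
            length = (length << 8) | (bb.get() & 0xFF);
        }
        if (length < 0) {
            throw new IllegalArgumentException("Length does not fit in an int");
        }
        return length;
    }
}
```

Reading the single byte 0x0C gives 12, as in the sample, while the bytes
0x81 0xC8 give 200.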
@@ -26,7 +49,7 @@
==== Value ====
-What about the '''Values'''? '''Length''' was easy, it was totally
context-free. Which kind of '''Value''' can ve have? How do we know the type of
each '''Value'''?
+What about the '''Values'''? '''Length''' was easy, it was totally
context-free. Which kind of '''Value''' can we have? How do we know the type of
each '''Value'''?
First, we have seen that some '''Values''' are composed with '''TLVs'''. But
we must have some kind of primitive '''Values''', like ''integer'' or
''string''?
@@ -34,16 +57,16 @@
So, let's see other '''Tags''' : 04 codes for an ''Octet String''. Here, we
have two empty strings : ''Octet String'' (04) zero '''Length''' (00) in the
two last '''TLVs'''.
-0A (forth '''TLV''') means ''Enumerated''. This is a way to code a constrained
value (i.e something in a set of values). Here, it's a 0 : ''An enumerated (0A)
value which is 1 byte long (01) and which value is 0 (00)''. It does not give
you a lot of information, as you can see: which kind of value is it suppose to
be?
+0A (fourth '''TLV''') means ''Enumerated''. This is a way to code a
constrained value (i.e. something in a set of values). Here, it's a 0 : ''An
enumerated (0A) value which is 1 byte long (01) and whose value is 0 (00)''. It
does not give you a lot of information, as you can see: which kind of value is
it supposed to be?
-So far, so good, we have a kind of way to decode simple '''TLV'''. Let's call
them '''Primitive'''. What about '''TLVs''' taht contains other '''TLVs'''? We
will call them '''Constructed'''
+So far, so good, we have a way to decode simple '''TLVs'''. Let's call
them '''Primitive'''. What about '''TLVs''' that contain other '''TLVs'''? We
will call them '''Constructed'''.
The first '''TLV''' has a '''Tag''' value of 30. This is a ''SEQUENCE'' of
'''TLVs'''. A ''SEQUENCE'' is constructed by ordered '''TLVs'''. We can't
exchange two '''TLVs''' in a ''SEQUENCE'', there is another '''Tag''' for that
: a ''SET''.
-The last '''TLV''' has a '''Tag''' value of 61. This is specific of a
'''CHOICE''', where you have to choose between different cases, and here it's
the first value that has been choosen (we can read 61 has a '''SEQUENCE'''
number 1 of the alternative. Accept the explanation, it's quite complicated to
give the reason why 61 is a '''SEQUENCE''' while 30 is also a '''SEQUENCE''').
+The last '''TLV''' has a '''Tag''' value of 61. This is specific to a
'''CHOICE''', where you have to choose between different cases, and here it's
the first value that has been chosen (we can read 61 as a '''SEQUENCE'''
number 1 of the alternative. Accept the explanation, it's quite complicated to
give the reason why 61 is a '''SEQUENCE''' while 30 is also a '''SEQUENCE''').
-For any further information, one should read
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf] which
explain this encoding, but be aware that you also need to read
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf]. They are
availaible for free, which is quite cheap compared to sleeping pills !
+For any further information, one should read
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf] which
explains this encoding, but be aware that you also need to read
[http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf]. They are
available for free, which is quite cheap compared to sleeping pills !
=== Decoding TLVs ===