ons-huis.net!ALbert
Wed, 31 Mar 2004 11:50:26 -0800
Hello Rainer, Anton, WG
Over the last weeks I have reviewed some of the current draft documents. I had
planned to spend more time on it in the coming month, but plans have changed.
It will be very busy, which is good .. but it also means less time for syslog.
I have one important "issue": I think it would be a great improvement if
some parts of Rainer's document would move to Anton's. I think the
protocol document should NOT specify how to break messages in parts. It
should assume the transport layer can "transport" 'long messages'.
When a long message can't be sent at once, the transport layer should
split it into several parts.
I think it should be possible to write another 'transport' document,
which describes how 'long, new, better' messages can be transported by
rfc-3164 syslog messages.
Note: I'm NOT saying we should make an RFC for this. Only that, if we
could write it, the separation between transport and "upper" layers is
better.
Note 2: That document should describe how to 'shrink' the longer
headers into the old headers, etc. Not by putting the complete message
into the payload.
--------------
Aside from the point above, I will include a rather long list of
(personal) notes 'as is'.
Rainer, please use those notes when relevant, and skip the others. I'm
sorry, I don't have more time to rewrite my notes into a good review.
Either I throw them away, or I give someone else that option. :-)
Hope it helps a bit.
===========================================================
--
Groetjes,
--ALbert Mietus
Private mail to: albert at ons-huis dot net
Business mail to: albert dot mietus at PTS dot nl
Spam: Just don't do it! Trust me, I will not order!
Hello WG,
This is my comment on Rainer's syslog-protocol draft 04. I have been very passive in
following this mailing list (too busy :-), but on receiving a request to review I decided
to do so. I apologise if I raise issues that have been discussed already; I didn't read
those discussions. This week I started, after a long time, to study the current draft
document(s).
Before continuing, let me say I like the idea of splitting syslog into several layers. I
say so in the hope that the individual sub-specifications will become short.
My comment is split into 3 parts, of which you are reading part 1 now. Part 1 is about
the "idea" (the design). In a separate mail I will comment on the text of the
document, in the hope it will become clearer, cleaner and shorter. The 3rd mail will
contain some details and bits that didn't fit in the others.
Possibly I will send more comments in future mails; it will depend on the amount of
time I have now and then.
-------------------------------------------------------
This review is becoming (very) long; I really hope all comments are worth reading. I
have tried to express my thoughts as well as possible, given the time I have...
General Comment on the idea/design of syslog-protocol
=====================================================
* This idea is GOOD!
H2 Architecture
================
Traditionally, syslog knows Devices, Collectors and Relays.
I would like to add two 'things'. One I would like to call a *Generator*, the other
a *Runner*.
*Generator*
As we can see in e.g. most Unix implementations, it is the application that knows WHAT
to log; it transmits that to the system (the log device, the syslog daemon) that knows
HOW to log it. This can also be seen on embedded systems. Historically, the combination
of (a part of) the application and the "system" forms the log-device.
I would like to split this, now that this opportunity exists. The part that is built into
the application (in C: the lines syslog(..."hai there");) can be called the
(LOG-)Generator.
The communication between Generator and Device is system/platform dependent. On Unix
systems usually the log-device is used, on embedded systems a function call, and on
Windows log-events can be used.
By introducing this Generator, we (can) make clear that this private/dedicated
communication exists, and is allowed. We also make clear the Generator is
syslog-protocol INDEPENDENT.
The function of the Device now becomes clear: it gets log-messages (-events) and
transports them. It also does some bookkeeping, like timestamping, adding crypto (for
-sign), etc.
*Runner*
A LOG-runner is another kind of syslog-thing, which is frequently used but has no
proper place in the architecture. Whereas a relay (should) forward syslog-messages
without knowledge of the semantics of the message, the Runner does know them. The
simplest life-form of a runner is a filter. It "relays" messages, but only when they are
important. On Unix there are several of these in perl, grep-scripts etc. Formally, they
are not relays (I think).
A more complex runner is a "program" that receives log-messages, CHANGES them, and
sends (or stores) the result. Examples: statistical analysis, intrusion detection, etc.
Both kinds of Runners are useful and frequently used, but they are not part of the
architecture. And as we try to make syslog "better", we had better add them and make sure
our standards can "deal" with them. Otherwise, non-RFC-compliant log-programs will be
the standard.
H4 Syslog format
================
412 enterpriseID
-----------------
I don't see any reason to include an enterpriseID; not in the header. (When needed,
it can be used in the structured MSG part.)
Currently, it is just a number. It will be unused, misused or will lead to a lot of
(operational) management. I'm afraid of the latter, as H4.1.3 suggests that
the semantics of the Facility can be enterpriseID dependent.
Also, it is required to use the "IANA assigned vendor" number. This implies
open-source/free/non-commercial projects are 'ruled out', as they often will not do so.
Last, should the number of the Generator-vendor, the system-vendor or the
device-vendor be used? (See above about generator/device.) This is not clear to me. And
whichever one is chosen, it will be hard to implement. Not when using the de facto (Unix)
logging APIs!
413 Facility
-------------
Although at first sight I liked the idea of "a terrible lot" of facilities, the
current <use a number> idea is wrong, as I see it. Aside from the problems mentioned
above, more than a million facilities will mean relays can't be managed! The set of
facilities that will be seen in a (major) network, expressed as numbers, will be
more or less random. This implies very long, complex and unmanageable config files
(or MIBs) for each LOG-router!
As we have learned from routing IPv4, hierarchical structuring is needed.
I think extending the set of facilities is good. But I can't imagine more than, say,
1000 are ever needed.
So, my counter-proposition is:
* Make facilities (as a number) structured
* Limit the number of facilities to a manageable number
* Keep the format such that extending the allowed numbers is possible
A Facility then is still a number, at least 3 (or 4) digits long. A longer number
means that it is an extended facility. Those have to be assigned by IANA.
Facilities of length 3 (4) MUST have the format '(K)KLM', where 'K' (or 'KK')
indicates the kind of facility, 'L' gives a sub-indication and 'M' is *SITE*
configurable (so, by the local sysadmin, see example below).
The 'K/KK' is based on the RFC3164 facilities, cleaned up and extended. Those numbers
can be IANA assigned. 'L' can be chosen by the (generator) vendor, and 'M' by the admin.
'M' defaults to 0 (zero), and applications/vendors MAY give the possibility to set
that digit.
Example: For mail, there will be a K specified, let's say 1.
Then all mail-logs will have the format '1XY', which is easily routable. It will do for
small sites.
Some vendors, like "sendmail" (only 1 process) will probably use only one value for L
e.g. '0'; others, like "postfix" (several processes) can use multiple values, like
'1', '2' and '3'.
When supported by both sendmail and postfix, the local sysadmin can add (change) the
M-digit, such that mail-systems on the border and internal ones use different facilities.
In all cases, the local sysadmin can either use simple routing-rules, like 1** (for
all mail), or 10* for sendmail and 1[1-2]*, 13* for postfix, or even more complex.
Now, the sysadmin has a choice, and can keep it manageable.
Note: the 0 for sendmail and 1-3 for postfix are "by example".
However, we can add a "rule" that '0' shall be used when only 1 L-value is used, and
when several values are used, zero should be skipped.
Also, I would like to prescribe/reserve "9" for local additions (on all digits).
The (K)KLM idea is meant as a suggestion for improvement. IF this idea is accepted, THEN
we can discuss variations like 3 or 4 digits, where to save mappings (IANA or this RFC),
etc.
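
The routing rules from the mail example above can be sketched in a few lines of Python.
This is only an illustration, assuming facilities are handled as digit strings of the
form 'KLM' and that shell-style wildcards are acceptable for the rules; the rule table,
the destination names and the function route() are hypothetical and not part of any
draft.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: (pattern, destination). The first match wins.
# K=1 is "mail" (by example), L=0 is sendmail, L=1..3 is postfix.
RULES = [
    ("10*", "sendmail.log"),
    ("1[1-3]*", "postfix.log"),
    ("1*", "mail-other.log"),
]


def route(facility: str, rules=RULES, default: str = "unrouted.log") -> str:
    """Return the destination for a facility written as the digits 'KLM'."""
    for pattern, destination in rules:
        if fnmatchcase(facility, pattern):
            return destination
    return default


if __name__ == "__main__":
    for facility in ("100", "121", "190", "200"):
        print(facility, "->", route(facility))
```

Because the structure is in the digits, three short rules cover every mail facility; with
unstructured numbers each facility would need its own line.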
5 Structured data
==================
In short: I think having the option of structured data is nice. But let's keep
it simple.
The current one is too complex; it has to be, as we need 4 pages to describe it.
Also, I find those pages hard to study (given the time I had :-). It can also lead to
it not being implemented. Programmers, and especially their bosses, don't have a lot of
time!
More positive: The main reason why structured data is complex (currently) comes down
to 2 problems:
1) It isn't part of the main design (the ABNF on page 9)
2) The "structure" can start anywhere in MSG
Both are easy to solve:
Ad 2) Specify that the structured part ALWAYS STARTS directly after the header.
Ad 1) We need to introduce it where it belongs: in an optional field on page 9.
Let me give it a try. (I also use the "improvement" on the ABNF from my other mail; it
saves typing.) (Also, I "forget" the SP parts for now. Just the idea.)
SYSLOG-MSG = HEADER DATA
HEADER = VERSIONING PRIO ID // See other mail
DATA = [ *STR-DATA ] MSG
STR-DATA = see below
MSG = free format
Given this ABNF, the structured data ALWAYS comes (in RFC3164 notation) at the start
of the MSG-part (or, in the new ABNF: before the free-format MSG).
This implies receivers always have (as a last resort) the option to treat everything
after the header as free-format, and just store/forward it.
It implies the start of STR-DATA is simple to find: it starts directly after the
header, or directly after another STR-DATA.
This also implies we have the option to start STR-DATA with '<', which is more usable
and XML-like. The complex, long "[EMAIL PROTECTED]" cookie isn't needed anymore.
However, we are free to use it. My personal vote is for the (XML) < > style.
See also my other posting about details of structured-data.
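
To illustrate how simple a receiver becomes when STR-DATA always sits directly after the
header, here is a minimal Python sketch. It assumes the '<' ... '>' style, assumes values
are double-quoted (so a '>' inside a value does not end the element), and assumes spaces
between elements may be skipped; the name split_data() is hypothetical. A free-format MSG
that itself starts with '<...>' would be misread by this sketch, which a real
specification would have to address.

```python
import re

# One '<...>' element; quoted values may contain '>' without ending the element.
ELEMENT = re.compile(r'<(?:[^<>"]|"[^"]*")*>')


def split_data(data: str):
    """Split DATA into (list of STR-DATA elements, free-format MSG)."""
    elements = []
    pos = 0
    while True:
        match = ELEMENT.match(data, pos)
        if match is None:
            break
        elements.append(match.group(0))
        pos = match.end()
        # Skip spaces after an element (an assumption, see the lead-in).
        while pos < len(data) and data[pos] == " ":
            pos += 1
    # Whatever is left is treated as free format: the "last resort" path.
    return elements, data[pos:]


if __name__ == "__main__":
    print(split_data('<a x="1"><b y="2>3"> hello world'))
    print(split_data("plain text only"))
```

A receiver that does not implement structured data can skip the function entirely and
store or forward everything after the header unchanged.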
6 Multi-Part Messages
=====================
There are some mistakes in this part, but I like the general idea. However, I feel
splitting/reassembly is done in other protocols too. Maybe we can use/reference a (de
facto) standard? I don't know an RFC which we can use, but I'm sure there must be one!
Second, it is too complex, and too long (to read). I have studied it, but I'm not sure I
understand it. Some details which I find hard/wrong/dislike:
MP-timestamp
------------
I do not like having several messages having the same TIMESTAMP
62 SD-ID receiving an optional STR-DATA
=======================================
This must be a mistake!
The 3rd paragraph states that a receiver sometimes MUST NOT parse a STR-DATA of a
log-message that is received. However, when the Multi-part option is not implemented,
it doesn't know this!
Hello WG,
This is part 2 (of 3) of my comment on the 3rd draft-syslog-protocol. See the
introduction in my posting [EMAIL PROTECTED]
This one contains comments on the text of the document, to help clarify and shorten
the document. It does NOT contain comments on the "idea" (design); see posting [EMAIL
PROTECTED]
-------------------------------------------------------
Implementation hints
====================
The current RFC contains a lot of valuable hints for programmers, like the one about
time-secfrac (Yes, I introduced it, at least the bug :-)!
Currently they are scattered around the document, making the document long and more
complex to read for non-programmers.
I would suggest moving all of them to a new chapter at the end of the document
(after the current chapter 9).
ABNF (Chapter 4)
================
I think we can clarify the syntax by "unflattening" it. RFC-3164 syslog uses some fields
and subfields, which are nice when introducing syslog to others. They "understand" names
such as priority.
So, keep it structured. I give it a try (please correct the syntax of the ABNF, as I am
not writing it daily anymore):
SYSLOG-MSG = HEADER DATA
HEADER = VERSIONING PRIO ID
DATA = [ STR-DATA ] MSG
VERSIONING = "V" VERSION
VERSION = 1*3 DIGIT
PRIO = '<' FACILITY '.' SEVERITY '>' // See [EMAIL PROTECTED] for notation change
ID = TIMESTAMP SP HOSTNAME SP TAG
STR-DATA = see elsewhere
MSG = free format
(etc)
Now we have meaningful fields that can be used too. E.g. the ID field (see other mail)
makes each message unique.
5.1 Format (typo?)
==================
On page 20, the paragraph starting with "The structured data element MUST ..." is
confusing. The second-to-last line says _no space_ is allowed, the last one says one or
more spaces are.
Is this a typo? Or is it unclear (at least to me)?
Structured data, ID-length
==========================
I don't see why we should limit the several fields to 64 positions. I agree this will
normally suffice. But so will 32, or 16, or any other number. 64 is "too big" to use as
a fixed-size ("reserved") space for programmers, database fields, etc. (too big ==
wasting too many bytes on huge logs). So dynamic fields need to be used anyhow. Then
there is no need for a trivial maximum.
Note, there is a maximum anyhow, given by the size of a single log-message. That will
do for "short term allocation".
Removing this limit makes the RFC cleaner and smaller.
Structured data, spaces
=======================
I would like to have all lines about SP (spaces) in chapter 5 removed. The point about
0, 1 or more spaces is not relevant.
In general, syslog uses SP to separate fields (when needed), and allows them in the
MSG-part. The syntax and semantics of STR-DATA do not depend on the amount of
space. Nor is it harder to read. Implementing receivers even becomes easier when spaces
(in STR-DATA) can be skipped (while { if 'sp' then skip } ) instead of checking for the
correct number, and doing something if it is wrong!
Proposal: Allow spaces anywhere, except inside SD-ID (see note) and SD-PARAM. SP in
SD-VALUE is allowed (already), but is not a separator. Prescribe (at least 1) SP between
each param-value pair.
*Note: an SP between '[#@' (or '<') and the SD-ID itself is probably not a good idea,
but not a problem. We can fix it by moving the fixed string in the ABNF:
STR-DATA = STR-START ... STR-END
STR-START = "[#@" SD-ID ; or '<'
STR-END = "]" ; or '>'
... = as before, SP are allowed.
Note2: Doing so allows formatting lines for human reading, which is handy.
Example <x-gam-example doYoe="like this" or = "This one?" >
<z-gam-more Yes = "I do" find = "it readable!" >
This example is simple to parse, both for humans and computers. This change will make
chapter 5 shorter, I think!
Last, I think any whitespace should be allowed instead of only SP (e.g. TAB).
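
As a rough check that the relaxed whitespace rule is indeed easy on receivers, here is a
minimal Python sketch of a parser for one element in the '<' ... '>' style. It assumes no
whitespace between '<' and the SD-ID, at least one whitespace character before each
parameter, optional whitespace around '=', and double-quoted values without escapes; the
name parse_element() is hypothetical.

```python
import re

NAME = r'[^\s<>="]+'
# "<" SD-ID, then zero or more: whitespace, name, "=", quoted value; then ">".
ELEMENT = re.compile(
    r'<(' + NAME + r')((?:\s+' + NAME + r'\s*=\s*"[^"]*")*)\s*>'
)
PARAM = re.compile(r'(' + NAME + r')\s*=\s*"([^"]*)"')


def parse_element(text: str):
    """Return (SD-ID, {param: value}) for one '<...>' element."""
    match = ELEMENT.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a well-formed element: %r" % text)
    sd_id, params_text = match.groups()
    return sd_id, dict(PARAM.findall(params_text))


if __name__ == "__main__":
    print(parse_element('<z-gam-more Yes = "I do" find = "it readable!" >'))
```

The receiver never counts spaces; it only skips them, which is the point made above.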
Chapter-5, MSG
==============
I suggest, but only as a detail, that the text of chapter 5 should be part of 4.2.
Hello WG,
This is part 3 (of 3) of my comment on the 3rd draft-syslog-protocol. It only contains
short remarks on details, and bits that didn't fit in part 1 (idea/design) or part 2
(the document itself). See posting [EMAIL PROTECTED] for more introduction.
-------------------------------------------------------
413/314 FACILITY/SEVERITY
==========================
In this draft, both facility and severity are numbers. Even with my suggestion, they
are 'just numbers'. And numbers are hard to read for humans, especially when there are a
lot of them. Most people will forget which column contains which number.
Therefore, I think it is better to use the (verbose) notation used by most syslog
implementations: "'<' facility '.' severity '>'"
Both facility and severity are numbers (at least on the wire). Collectors (viewers)
can translate those numbers into their names, but still use this format. And even
without names, the first (new) line is easier to read than the second (current) one:
V1 0 <888.4> 2003-10-11T22:14:13.003Z new.Formated. ...
V1 0 888 4 2003-10-11T22:14:13.003Z old.Formated. ...
Note: I agree, we should not use the tricky "8 times F plus S" notation. I'm not
suggesting that! Just insert a dot and the angles.
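
A minimal Python sketch of the proposed '<' facility '.' severity '>' field, to show that
it costs little on either side. It assumes the severity is a single digit 0-7 as in
RFC3164 and that the facility is any decimal number; the function names are hypothetical,
and the short severity names are the usual Unix syslog ones, used here only for display.

```python
import re

PRIO = re.compile(r"<(\d+)\.([0-7])>")

# Usual Unix short names for severities 0..7 (for viewers only, not on the wire).
SEVERITY_NAMES = ("emerg", "alert", "crit", "err",
                  "warning", "notice", "info", "debug")


def parse_prio(field: str):
    """Parse '<facility.severity>' into a (facility, severity) tuple of ints."""
    match = PRIO.fullmatch(field)
    if match is None:
        raise ValueError("bad PRIO field: %r" % field)
    return int(match.group(1)), int(match.group(2))


def format_prio(facility: int, severity: int) -> str:
    """Wire format: both parts stay numbers."""
    return "<%d.%d>" % (facility, severity)


def display_prio(facility: int, severity: int) -> str:
    """Viewer format: same layout, severity shown by name."""
    return "<%d.%s>" % (facility, SEVERITY_NAMES[severity])


if __name__ == "__main__":
    facility, severity = parse_prio("<888.4>")
    print(format_prio(facility, severity), display_prio(facility, severity))
```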
4151 timestamp, without time
============================
There are 'devices', as meant in this section, which have no idea of TIME. So this
section is a good idea.
Often, those devices can store ("know") a few bits of information. Therefore, I would
like to change this fixed TIMESTAMP to the same one, but with a sequence number
attached; the fractional-seconds field can be used for it.
Then a timestamp becomes 2001-01-01T00:00:60.<seq>Z.
As in the current draft, this time doesn't exist. But at least collectors can (more
or less) sort a set of log messages from 1 device.
Note: the latter is needed for e.g. syslog-sign
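
A small Python sketch of such a clock-less timestamp. It assumes the sequence number is
zero-padded to a fixed width (six digits here, an arbitrary choice) so that plain string
sorting on the collector gives the sending order; the function name is hypothetical, and
what should happen when the counter overflows is left open.

```python
def no_clock_timestamp(seq: int, digits: int = 6) -> str:
    """Build the fixed 'no clock' TIMESTAMP with a sequence number as fraction."""
    if not 0 <= seq < 10 ** digits:
        raise ValueError("sequence number does not fit in %d digits" % digits)
    # Zero padding keeps lexicographic order equal to numeric order.
    return "2001-01-01T00:00:60.%0*dZ" % (digits, seq)


if __name__ == "__main__":
    received = [no_clock_timestamp(n) for n in (12, 3, 7)]
    for stamp in sorted(received):
        print(stamp)
```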
417 TAG
=======
I think we need to make the TAG structured! All current syslog receivers (collectors,
relays) use __PARTS of__ the RFC3164 TAG to route messages.
In RFC3164 the TAG is simple and short, so it is quite simple to use it for routing.
Note: not the complete TAG is used, only the 'program name', never the PID part.
With the new TAG, with a static ID and a dynamic part, similar routing should be
possible. At least, the RFC should be clear on it: by demanding that the static part is
"fixed", and making sure that the static part can be found.
Given the current practice, where routing is based (mainly) on the program name, it would
be wise to (at least) suggest how (where) that part can be found.
Proposal: Forget native support for VMS/Windows/DOS and even Unix pathnames, and
introduce a URI (URL)-like scheme, where only '/' (not the one from Unix, but the one
from URLs) is used for "path separation" and ':' and '//' are major separators.
In this case, the ABNF is simple; only the semantics become a little more complex.
Also, it becomes simple for web-applications to log. They have a URL already. All "old
fashioned :-)" applications have a URL too: ''file://path/to/appl''. This is valid
on any system.
Note: the dynamic part has to be added
Note2: for web-applications which include a 'hostpart' in the URL: that hostname is
NOT the same as the HOSTNAME in the header. Frequently (in a web-farm) several systems
share the same URL, but not the hostname. Then the sysadmin can decide which one to
use for routing.
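
To show that a URL-like TAG keeps program-name routing simple, here is a minimal Python
sketch using the standard urllib.parse module. It assumes the 'program name' is the last
path segment of the static part; how the dynamic part would be attached is not specified
above, so it is not modelled. The function name and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def routing_key(tag: str) -> str:
    """Return the last path segment of a URL-like TAG (the 'program name')."""
    parts = urlsplit(tag)
    segments = [segment for segment in parts.path.split("/") if segment]
    # Fall back to the part after '//' when there is no path at all.
    return segments[-1] if segments else parts.netloc


if __name__ == "__main__":
    print(routing_key("file://path/to/appl"))
    print(routing_key("http://www.example.org/shop/cart"))
```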
Message ID
==========
Given the architecture of syslog-networks, messages can be duplicated. But sometimes
messages are related (e.g. with the signatures of syslog-sign). Both RFC3164 and
the current -protocol draft have no way to uniquely identify a message.
In practice, messages are unique by their hostname, TAG and timestamp. But we can't
rely on this, as it isn't required.
I would like to introduce this requirement. It is simple to add to the RFC, and simple
to implement. Given the HOSTNAME and TAG, all the implementers have to do is never
send a message with the same timestamp twice. Given the microsecond resolution, this is
doable.
(It does imply some systems have to fake the last digits; I don't see a problem with
this. Otherwise we can add a "." SequenceNo to the timestamp.)
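
How little this asks of an implementation can be sketched in Python. The sketch assumes
one generator object per (HOSTNAME, TAG) pair, microsecond resolution, and that "faking
the last digits" means bumping a repeated timestamp by one microsecond; the class name
is hypothetical.

```python
from datetime import datetime, timedelta, timezone


class UniqueTimestamps:
    """Hand out timestamps that never repeat for one (HOSTNAME, TAG) pair."""

    def __init__(self):
        self._last = None

    def next(self, now=None) -> str:
        if now is None:
            now = datetime.now(timezone.utc)
        # If the clock did not advance, fake the last digits: add 1 microsecond.
        if self._last is not None and now <= self._last:
            now = self._last + timedelta(microseconds=1)
        self._last = now
        return now.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


if __name__ == "__main__":
    stamps = UniqueTimestamps()
    same_instant = datetime(2004, 3, 31, 11, 50, 26, tzinfo=timezone.utc)
    print(stamps.next(same_instant))
    print(stamps.next(same_instant))
```

Together with HOSTNAME and TAG, each timestamp produced this way identifies exactly one
message.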
Structured data, tokens
=======================
Given the current draft, or my counter-proposal to it, for structured data, there are
(only) 2 kinds of tokens: the IANA controlled ones and the experimental ones, the
latter starting with "x-".
I think we can safely add a third: "X-" for private/local/vendor specific tokens. As
we can see in e.g. mail, this kind of field will be used a lot. Now we have an option
to allow them, without giving them a status of "testing".
STR-DATA, can we use it for syslog-sign (or similar)?
=====================================================
Just an idea: in a protocol such as syslog-sign (here just as an example), where messages
refer to other messages (now implicitly), could we use the STR-DATA to do so?
I verified the syntax/semantics of this, and YES, we could do so.
This means I like STR-DATA a lot more :-) It is great. Even when -sign doesn't use
it, I (we) can use the same format to present it to the user!
64 MultiPart examples
=====================
All examples use RFC3164 headers; shouldn't -protocol headers be used?