On 2/26/2012 11:33 AM, Martin Baldan wrote:
Guys, I find these off_topic comments (as in not strictly about my
idst compilation problem) really interesting. Maybe I should start a
new thread? Something like «how can a newbie start playing with this
technology?». Thanks!
well, ok, hopefully everyone can tolerate my OT-ness here...
(and hopefully, my forays deep into the land of trivia...).
well, ok, I am not personally associated with VPRI though, mostly just
lurking and seeing if any interesting topics come up (but, otherwise, am
working independently on my own technology, which includes some VM stuff
and a 3D engine).
( currently, no code is available online, but parts can be given on
request via email or similar if anyone is interested, likewise goes for
specs, ... )
recently, I had worked some on adding networking support for my 3D
engine, but the protocol is more generic (little about it is
particularly specific to 3D gaming, and so could probably have other uses).
internally, the messaging is based on lists / S-Expressions (it isn't
really clear which term is better, as "lists" is too generic, and
S-Expressions more refers to the syntax, rather than their in-program
representation... actually it is a similar terminology problem with XML,
where the term may ambiguously either be used for the textual
representation, or for alternative non-text representations of the
payload, IOW: "Binary XML", and similar).
but, either way, messages are passed point-to-point as lists, typically
using a structure sort of like:
(wdelta ... (delta 315 (org 140 63 400) (ang 0 0 120) ...) ...)
the messages are free-form (there is no "schema", as the system will try
to handle whatever messages are thrown at it, but with the typical
default behavior for handlers of ignoring anything which isn't
recognized, and the protocol/codec is agnostic to the types or format of
the messages it is passing along, provided as long as they are built
from lists or similar...).
as-is, these expressions are not "eval'ed" per-se, although the typical
message handling could be itself regarded as a crude evaluator (early
versions of my original Scheme interpreter were not actually all that
much different). theoretically, things like ASTs or Scheme code or
whatever could be easily passed over the connection as well.
in-program, the lists are dynamically typed, and composed primarily of
chains of "cons cells", with "symbols", "fixnums", "flonums", "strings",
... comprising most of the structure (these are widely used in my
projects, but aren't particularly unique to my project, though seemingly
less well-known to most more mainstream programmers).
as-is, currently a small subset of the larger typesystem is handled, and
I am currently ignoring the matter of list cycles or object-identity
(data is assumed acyclic, and currently everything is passed as a copy).
at the high-level, the process currently mostly looks like:
process A listens on a port, and accepts connections, and then handles
any messages which arrive over these connections, and may transmit
messages in response.
process B connects to A, and may likewise send and receive messages.
currently, each end basically takes whatever messages are received, and
passes them off to message-processing code (walks the message
expressions and does whatever). currently, queues are commonly used for
both incoming and outgoing messages, and most messages are asynchronous.
neither end currently needs to worry about the "over-the-wire" format of
these lists.
a system resembling XMPP could probably also be built easily enough (and
may end up being done anyways).
lists were chosen over XML mostly for sake of them being more convenient
to work with.
actually, I did something similar to all this long ago, but this effort
fell into oblivion and similar had not been done again until fairly
recently (partly involving me reviving some old forgotten code of mine...).
now, on to the protocol itself:
it is currently built over raw TCP sockets (currently with "nodelay" set);
messages are encoded into "lumps", which are basically tags followed by
message data (lumps are also used for stream-control purposes, and may
relay other types of messages as well).
currently, a system of tags resembling the one in JPEG is used, except
that the tags are 4 bytes (with 3 bytes of "magic" and 1 byte to
indicate the tag type, a longer magic was used to reduce the number of
times it would need to be escaped in a bitstream). currently, no length
is used (instead, one knows a complete message lump has been received
because the end-tag is visible). this currently means an 8 byte overhead
per-message lump due to tags (there are also Deflate lumps, but these
have the added overhead of a decoded-length and a checksum, needed for
technical reasons, leading to 16 bytes of overhead).
message lumps are themselves a bitstream, and are currently built out of
a collection of "minilumps", currently each indicated via a 4 bit tag
(there are no lengths or end markers here). minilumps currently do
things like indicate the Huffman tables, and also give the
individually-coded messages (there may be multiple physical messages per
a given message-lump).
the Huffman tables resemble the format used in Deflate, only using Rice
codes to encode the table of symbol lengths (seems to work well enough,
Deflate used Huffman coding on the Huffman table), and with a few minor
differences in the RLE scheme.
values are encoded using a mix of "command tags" and an MRU+MTF scheme
(recently coded values may be reused from a table). strings (strings,
symbols, keywords, ...) and data-members (byte arrays, ...) use an LZ77
variant (itself very similar to how data is represented in Deflate).
note that, for data members, the MRU serves a similar role to that of
the "sliding window" in LZ77 (I may consider dropping MTF due to various
reasons though, and maybe add a data-member analogue of an LZ-run).
currently, 3 Huffman tables are used, one for command-tags, another for
literal bytes (in strings and data members), and the 3rd for distances
and integers (fixnums, flonums, ...). most integer-like values are coded
using a similar prefix+extra-bits scheme to that used in Deflate.
floating-point values are encoded as a pair of integers
(mantissa+exponent, although the definition is "m*2^e", with the
mantissa as an integer rather than a normalized value, so "1,8" will
encode "256.0").
all of this leads to lower encoded messages sizes than what I was
getting by serializing the lists into textual S-Expression form, and
then applying Deflate to the result (and is semantically equivalent).
most of the added compression is most likely due to the ability of the
scheme to make use of additional knowledge about the data being
compressed, since data members can be encoded directly, rather than the
compressor having to deal with them as strings of ASCII characters
encoding the data members.
not that Deflate was doing all that poorly either though.
theoretically also possible though (and technically simpler) is to
flatten the S-Expressions into a byte-based binary format and then feed
this through Deflate (with likely intermediate results).
the main reason for trying to compress the data is mostly so that it has
a much lower chance of bogging down the user's internet connection or
similar (increasing the risk for network stalls and poor performance).
granted, yes, this is probably overkill.
but, it works...
time to implement support for this (and get networking for my 3D engine
to work, in general) was a little over 1 week.
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc