On 2/26/2012 11:33 AM, Martin Baldan wrote:

Guys, I find these off_topic comments (as in not strictly about my idst compilation problem) really interesting. Maybe I should start a new thread? Something like «how can a newbie start playing with this technology?». Thanks!


well, ok, hopefully everyone can tolerate my OT-ness here...
(and hopefully, my forays deep into the land of trivia...).


well, ok, I am not personally associated with VPRI though, mostly just lurking and seeing if any interesting topics come up (but, otherwise, am working independently on my own technology, which includes some VM stuff and a 3D engine).

( currently, no code is available online, but parts can be given on request via email or similar if anyone is interested, likewise goes for specs, ... )


recently, I had worked some on adding networking support for my 3D engine, but the protocol is more generic (little about it is particularly specific to 3D gaming, and so could probably have other uses).

internally, the messaging is based on lists / S-Expressions (it isn't really clear which term is better, as "lists" is too generic, and S-Expressions more refers to the syntax, rather than their in-program representation... actually it is a similar terminology problem with XML, where the term may ambiguously either be used for the textual representation, or for alternative non-text representations of the payload, IOW: "Binary XML", and similar).

but, either way, messages are passed point-to-point as lists, typically using a structure sort of like:
(wdelta ... (delta 315 (org 140 63 400) (ang 0 0 120) ...) ...)

the messages are free-form (there is no "schema", as the system will try to handle whatever messages are thrown at it, but with the typical default behavior for handlers of ignoring anything which isn't recognized, and the protocol/codec is agnostic to the types or format of the messages it is passing along, provided as long as they are built from lists or similar...).

as-is, these expressions are not "eval'ed" per-se, although the typical message handling could be itself regarded as a crude evaluator (early versions of my original Scheme interpreter were not actually all that much different). theoretically, things like ASTs or Scheme code or whatever could be easily passed over the connection as well.

in-program, the lists are dynamically typed, and composed primarily of chains of "cons cells", with "symbols", "fixnums", "flonums", "strings", ... comprising most of the structure (these are widely used in my projects, but aren't particularly unique to my project, though seemingly less well-known to most more mainstream programmers).

as-is, currently a small subset of the larger typesystem is handled, and I am currently ignoring the matter of list cycles or object-identity (data is assumed acyclic, and currently everything is passed as a copy).


at the high-level, the process currently mostly looks like:
process A listens on a port, and accepts connections, and then handles any messages which arrive over these connections, and may transmit messages in response.
process B connects to A, and may likewise send and receive messages.

currently, each end basically takes whatever messages are received, and passes them off to message-processing code (walks the message expressions and does whatever). currently, queues are commonly used for both incoming and outgoing messages, and most messages are asynchronous.

neither end currently needs to worry about the "over-the-wire" format of these lists. a system resembling XMPP could probably also be built easily enough (and may end up being done anyways).

lists were chosen over XML mostly for sake of them being more convenient to work with.


actually, I did something similar to all this long ago, but this effort fell into oblivion and similar had not been done again until fairly recently (partly involving me reviving some old forgotten code of mine...).



now, on to the protocol itself:
it is currently built over raw TCP sockets (currently with "nodelay" set);
messages are encoded into "lumps", which are basically tags followed by message data (lumps are also used for stream-control purposes, and may relay other types of messages as well).

currently, a system of tags resembling the one in JPEG is used, except that the tags are 4 bytes (with 3 bytes of "magic" and 1 byte to indicate the tag type, a longer magic was used to reduce the number of times it would need to be escaped in a bitstream). currently, no length is used (instead, one knows a complete message lump has been received because the end-tag is visible). this currently means an 8 byte overhead per-message lump due to tags (there are also Deflate lumps, but these have the added overhead of a decoded-length and a checksum, needed for technical reasons, leading to 16 bytes of overhead).

message lumps are themselves a bitstream, and are currently built out of a collection of "minilumps", currently each indicated via a 4 bit tag (there are no lengths or end markers here). minilumps currently do things like indicate the Huffman tables, and also give the individually-coded messages (there may be multiple physical messages per a given message-lump).

the Huffman tables resemble the format used in Deflate, only using Rice codes to encode the table of symbol lengths (seems to work well enough, Deflate used Huffman coding on the Huffman table), and with a few minor differences in the RLE scheme.

values are encoded using a mix of "command tags" and an MRU+MTF scheme (recently coded values may be reused from a table). strings (strings, symbols, keywords, ...) and data-members (byte arrays, ...) use an LZ77 variant (itself very similar to how data is represented in Deflate).

note that, for data members, the MRU serves a similar role to that of the "sliding window" in LZ77 (I may consider dropping MTF due to various reasons though, and maybe add a data-member analogue of an LZ-run).

currently, 3 Huffman tables are used, one for command-tags, another for literal bytes (in strings and data members), and the 3rd for distances and integers (fixnums, flonums, ...). most integer-like values are coded using a similar prefix+extra-bits scheme to that used in Deflate. floating-point values are encoded as a pair of integers (mantissa+exponent, although the definition is "m*2^e", with the mantissa as an integer rather than a normalized value, so "1,8" will encode "256.0").


all of this leads to lower encoded messages sizes than what I was getting by serializing the lists into textual S-Expression form, and then applying Deflate to the result (and is semantically equivalent).

most of the added compression is most likely due to the ability of the scheme to make use of additional knowledge about the data being compressed, since data members can be encoded directly, rather than the compressor having to deal with them as strings of ASCII characters encoding the data members.

not that Deflate was doing all that poorly either though.

theoretically also possible though (and technically simpler) is to flatten the S-Expressions into a byte-based binary format and then feed this through Deflate (with likely intermediate results).

the main reason for trying to compress the data is mostly so that it has a much lower chance of bogging down the user's internet connection or similar (increasing the risk for network stalls and poor performance).

granted, yes, this is probably overkill.

but, it works...

time to implement support for this (and get networking for my 3D engine to work, in general) was a little over 1 week.


or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc

Reply via email to