[fonc] OT? S-Exps and network (Re: Error trying to compile COLA)

BGB Sun, 26 Feb 2012 12:57:10 -0800

On 2/26/2012 11:33 AM, Martin Baldan wrote:

Guys, I find these off-topic comments (as in not strictly about my idst compilation problem) really interesting. Maybe I should start a new thread? Something like «how can a newbie start playing with this technology?». Thanks!


well, ok, hopefully everyone can tolerate my OT-ness here...
(and hopefully, my forays deep into the land of trivia...).

well, ok, I am not personally associated with VPRI though, mostly just lurking and seeing if any interesting topics come up (but, otherwise, am working independently on my own technology, which includes some VM stuff and a 3D engine).

(currently, no code is available online, but parts can be given on request via email or similar if anyone is interested; likewise for specs, ...)

recently, I had worked some on adding networking support for my 3D engine, but the protocol is more generic (little about it is particularly specific to 3D gaming, and so it could probably have other uses).

internally, the messaging is based on lists / S-Expressions (it isn't really clear which term is better, as "lists" is too generic, and "S-Expressions" refers more to the syntax than to their in-program representation... actually, it is a similar terminology problem as with XML, where the term may ambiguously be used either for the textual representation or for alternative non-text representations of the payload, IOW: "Binary XML" and similar).

but, either way, messages are passed point-to-point as lists, typically using a structure sort of like:

(wdelta ... (delta 315 (org 140 63 400) (ang 0 0 120) ...) ...)

the messages are free-form (there is no "schema", as the system will try to handle whatever messages are thrown at it, with the typical default behavior for handlers being to ignore anything which isn't recognized; the protocol/codec is agnostic to the types or format of the messages it is passing along, as long as they are built from lists or similar...).

as-is, these expressions are not "eval'ed" per se, although the typical message handling could itself be regarded as a crude evaluator (early versions of my original Scheme interpreter were not actually all that much different). theoretically, things like ASTs or Scheme code or whatever could easily be passed over the connection as well.
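to make the "crude evaluator" idea a bit more concrete, here is a minimal Python sketch of handler dispatch. it assumes messages arrive as nested Python lists with string heads (standing in for real cons cells and symbols), and the registry and handler names (`HANDLERS`, `handler`, `dispatch`, `handle_wdelta`, `handle_delta`) are all hypothetical, not taken from the actual engine.

```python
from typing import Any, Callable, Dict, List, Optional

# hypothetical in-memory form: a message is a nested Python list whose head is a
# symbol name (a str); this stands in for the cons-cell representation
Message = List[Any]

HANDLERS: Dict[str, Callable[[Message], Any]] = {}


def handler(name: str) -> Callable:
    """decorator: register a handler for messages whose head symbol is `name`"""
    def register(fn: Callable[[Message], Any]) -> Callable[[Message], Any]:
        HANDLERS[name] = fn
        return fn
    return register


def dispatch(msg: Any) -> Optional[Any]:
    """look up a handler by head symbol; anything unrecognized is silently ignored"""
    if not isinstance(msg, list) or not msg or not isinstance(msg[0], str):
        return None
    fn = HANDLERS.get(msg[0])
    return fn(msg) if fn is not None else None


@handler("wdelta")
def handle_wdelta(msg: Message) -> List[Any]:
    # walk the sub-messages and dispatch each one (the "crude evaluator" part)
    return [r for r in map(dispatch, msg[1:]) if r is not None]


@handler("delta")
def handle_delta(msg: Message) -> Dict[str, Any]:
    # (delta id (org ...) (ang ...) ...): keep the known fields, skip the rest
    ent: Dict[str, Any] = {"id": msg[1]}
    for field in msg[2:]:
        if isinstance(field, list) and field and field[0] in ("org", "ang"):
            ent[field[0]] = tuple(field[1:])
    return ent
```

for example, `dispatch(["wdelta", ["delta", 315, ["org", 140, 63, 400], ["ang", 0, 0, 120]], ["unknown", 2]])` returns `[{"id": 315, "org": (140, 63, 400), "ang": (0, 0, 120)}]`; the `(unknown 2)` message is simply dropped, matching the ignore-what-isn't-recognized default.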

in-program, the lists are dynamically typed, and composed primarily of chains of "cons cells", with "symbols", "fixnums", "flonums", "strings", ... comprising most of the structure (these are widely used in my projects, but aren't particularly unique to my project, though they seem less well-known to most mainstream programmers).
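for readers who haven't run into these, a minimal Python sketch of that kind of representation: interned symbols, cons cells whose `cdr` links to the next cell, and a printer back to the textual form. the names (`Symbol`, `Cons`, `NIL`, `from_py`, `to_sexp`) are illustrative only, and fixnums/flonums are just Python `int` and `float` here.

```python
from dataclasses import dataclass
from typing import Any, Dict


class Symbol:
    """interned symbol: the same name always yields the same object"""
    _table: Dict[str, "Symbol"] = {}

    def __new__(cls, name: str) -> "Symbol":
        sym = cls._table.get(name)
        if sym is None:
            sym = super().__new__(cls)
            sym.name = name
            cls._table[name] = sym
        return sym

    def __repr__(self) -> str:
        return self.name


@dataclass
class Cons:
    car: Any
    cdr: Any  # next Cons in the chain, or NIL at the end of a proper list


NIL = None  # the empty list


def from_py(x: Any) -> Any:
    """convert nested Python lists into chains of cons cells"""
    if isinstance(x, list):
        head = NIL
        for item in reversed(x):
            head = Cons(from_py(item), head)
        return head
    return x


def to_sexp(x: Any) -> str:
    """print a value in textual S-Expression form"""
    if x is NIL:
        return "()"
    if isinstance(x, Cons):
        parts = []
        while isinstance(x, Cons):
            parts.append(to_sexp(x.car))
            x = x.cdr
        if x is not NIL:  # improper list: (a b . c)
            parts += [".", to_sexp(x)]
        return "(" + " ".join(parts) + ")"
    if isinstance(x, str):
        return '"' + x.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(x)  # symbols, fixnums (int), flonums (float)
```

so `to_sexp(from_py([Symbol("delta"), 315, [Symbol("org"), 140, 63, 400]]))` gives `(delta 315 (org 140 63 400))`.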

as-is, currently only a small subset of the larger typesystem is handled, and I am currently ignoring the matter of list cycles or object identity (data is assumed acyclic, and currently everything is passed as a copy).



at the high-level, the process currently mostly looks like:

process A listens on a port, accepts connections, handles any messages which arrive over these connections, and may transmit messages in response.

process B connects to A, and may likewise send and receive messages.

currently, each end basically takes whatever messages are received and passes them off to message-processing code (which walks the message expressions and does whatever). currently, queues are commonly used for both incoming and outgoing messages, and most messages are asynchronous.
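a rough sketch of this A/B arrangement using Python's standard `socket`, `threading`, and `queue` modules: each side has a reader thread feeding an incoming queue, and A also has a writer thread draining an outgoing queue. for the demo, both "processes" are threads in one interpreter, and newline-delimited JSON stands in for the real wire codec (that substitution is an assumption of the sketch, there only to show that the handler code never touches the wire format). the `(ping n)` / `(pong n)` messages are made up.

```python
import json
import queue
import socket
import threading

# stand-in wire format: one JSON-encoded list per line (NOT the lump codec);
# the message-handling loop below never depends on it


def send_msg(sock: socket.socket, msg: list) -> None:
    sock.sendall(json.dumps(msg).encode() + b"\n")


def reader(sock: socket.socket, inbox: queue.Queue) -> None:
    """receive thread: decode each incoming line and queue it for the handlers"""
    with sock.makefile("rb") as f:
        for line in f:
            inbox.put(json.loads(line))
    inbox.put(None)  # sentinel: the peer closed its side


def writer(sock: socket.socket, outbox: queue.Queue) -> None:
    """send thread: drain the outgoing queue until a None sentinel arrives"""
    while (msg := outbox.get()) is not None:
        send_msg(sock, msg)
    sock.shutdown(socket.SHUT_WR)


def serve_one(listener: socket.socket) -> None:
    """process A: accept one connection, answer each (ping n) with (pong n)"""
    conn, _ = listener.accept()
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    threading.Thread(target=reader, args=(conn, inbox), daemon=True).start()
    w = threading.Thread(target=writer, args=(conn, outbox), daemon=True)
    w.start()
    while (msg := inbox.get()) is not None:
        if msg and msg[0] == "ping":
            outbox.put(["pong", *msg[1:]])
        # anything unrecognized is ignored
    outbox.put(None)
    w.join()
    conn.close()


def run_demo() -> list:
    """process B (here just another thread): connect, send, collect the replies"""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_one, args=(listener,), daemon=True)
    server.start()
    replies = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        inbox: queue.Queue = queue.Queue()
        threading.Thread(target=reader, args=(sock, inbox), daemon=True).start()
        for i in range(3):
            send_msg(sock, ["ping", i])
        send_msg(sock, ["bogus"])
        sock.shutdown(socket.SHUT_WR)
        while (msg := inbox.get()) is not None:
            replies.append(msg)
    server.join()
    listener.close()
    return replies
```

`run_demo()` should return the three `pong` replies, with the `["bogus"]` message ignored by A.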

neither end currently needs to worry about the "over-the-wire" format of these lists.

a system resembling XMPP could probably also be built easily enough (and may end up being done anyways).

lists were chosen over XML mostly for the sake of being more convenient to work with.

actually, I did something similar to all this long ago, but that effort fell into oblivion, and nothing similar had been done again until fairly recently (partly involving me reviving some old forgotten code of mine...).




now, on to the protocol itself:
it is currently built over raw TCP sockets (currently with "nodelay", i.e. TCP_NODELAY, set).

messages are encoded into "lumps", which are basically tags followed by message data (lumps are also used for stream-control purposes, and may relay other types of messages as well).

currently, a system of tags resembling the one in JPEG is used, except that the tags are 4 bytes (3 bytes of "magic" and 1 byte to indicate the tag type; a longer magic was used to reduce the number of times it would need to be escaped in a bitstream). currently, no length is used (instead, one knows a complete message lump has been received because the end tag is visible). this currently means an 8 byte overhead per message lump due to tags (a 4-byte start tag plus a 4-byte end tag; there are also Deflate lumps, but these have the added overhead of a decoded length and a checksum, needed for technical reasons, leading to 16 bytes of overhead).
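the tag framing can be sketched roughly as follows. the actual magic bytes, tag values, and escaping rule aren't given above, so `MAGIC`, `TAG_ESC`, `TAG_END`, and `TAG_MSG` are hypothetical, and escaping an in-payload magic as magic + `TAG_ESC` is an assumed scheme (loosely in the spirit of JPEG's 0xFF 0x00 byte stuffing). the three magic bytes are all distinct, so an occurrence can't overlap itself, which keeps the escaping unambiguous.

```python
MAGIC = b"\xfa\xce\xb5"  # hypothetical 3-byte magic; all bytes distinct, so no self-overlap
TAG_ESC = 0x00           # hypothetical: MAGIC + 0x00 stands for a literal MAGIC in the payload
TAG_END = 0x01           # hypothetical end-of-lump tag
TAG_MSG = 0x10           # hypothetical start tag for a message lump


def encode_lump(tag: int, payload: bytes) -> bytes:
    """start tag + escaped payload + end tag: 8 bytes of tag overhead, no length field"""
    body = payload.replace(MAGIC, MAGIC + bytes([TAG_ESC]))
    return MAGIC + bytes([tag]) + body + MAGIC + bytes([TAG_END])


def decode_lumps(buf: bytes):
    """split received bytes into (tag, payload) lumps; also return the unfinished tail"""
    lumps = []
    pos = 0
    while True:
        start = buf.find(MAGIC, pos)
        if start < 0 or start + 4 > len(buf):
            break  # no start tag (or not all of it) yet
        tag = buf[start + 3]
        body = bytearray()
        i = start + 4
        complete = False
        while True:
            j = buf.find(MAGIC, i)
            if j < 0 or j + 4 > len(buf):
                break  # end tag not visible yet: wait for more data
            body += buf[i:j]
            if buf[j + 3] == TAG_ESC:
                body += MAGIC  # un-escape a literal magic
                i = j + 4
            elif buf[j + 3] == TAG_END:
                complete = True
                i = j + 4
                break
            else:
                raise ValueError("unexpected tag inside a lump")
        if not complete:
            break
        lumps.append((tag, bytes(body)))
        pos = i
    return lumps, buf[pos:]
```

`decode_lumps` hands back any incomplete tail so it can be prepended to the next chunk read from the socket; a lump only counts as received once its end tag has arrived.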

message lumps are themselves a bitstream, and are currently built out of a collection of "minilumps", each currently indicated via a 4 bit tag (there are no lengths or end markers here). minilumps currently do things like indicate the Huffman tables, and also give the individually-coded messages (there may be multiple physical messages per given message lump).
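a small sketch of how 4-bit minilump tags can sit in a bitstream without lengths: each tag implies how many bits follow it. `BitWriter`/`BitReader` pack bits LSB-first as Deflate does. the tag values and the single `ML_INT16` minilump type are hypothetical, and treating tag 0 as "only padding left" is an assumption of this sketch, since how the last minilump is detected isn't described.

```python
class BitWriter:
    """LSB-first bit packer (the same bit order Deflate uses)"""

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0  # pending bits not yet flushed to `out`
        self.n = 0    # number of pending bits

    def write(self, value: int, nbits: int) -> None:
        self.acc |= (value & ((1 << nbits) - 1)) << self.n
        self.n += nbits
        while self.n >= 8:
            self.out.append(self.acc & 0xFF)
            self.acc >>= 8
            self.n -= 8

    def getvalue(self) -> bytes:
        # flush the pending bits, zero-padded up to a byte boundary
        return bytes(self.out) + (bytes([self.acc]) if self.n else b"")


class BitReader:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits

    def bits_left(self) -> int:
        return len(self.data) * 8 - self.pos

    def read(self, nbits: int) -> int:
        value = 0
        for k in range(nbits):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << k
            self.pos += 1
        return value


ML_PAD = 0x0    # hypothetical: never a real tag, so trailing zero padding ends the loop
ML_INT16 = 0x3  # hypothetical: a 16-bit value follows (its size is implied by the tag)


def write_int16_minilumps(values) -> bytes:
    w = BitWriter()
    for v in values:
        w.write(ML_INT16, 4)  # 4-bit minilump tag
        w.write(v, 16)        # no length field: the tag alone says how much follows
    return w.getvalue()


def read_minilumps(data: bytes) -> list:
    r = BitReader(data)
    values = []
    while r.bits_left() >= 4:
        tag = r.read(4)
        if tag == ML_PAD:
            break
        if tag == ML_INT16:
            values.append(r.read(16))
        else:
            raise ValueError(f"unknown minilump tag {tag}")
    return values
```

three such minilumps take 60 bits, so they pack into 8 bytes with 4 zero padding bits at the end.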

the Huffman tables resemble the format used in Deflate, only using Rice codes to encode the table of symbol lengths (this seems to work well enough; Deflate uses Huffman coding on the Huffman table), and with a few minor differences in the RLE scheme.
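the table-transmission idea, sketched: only the code length of each symbol is sent (Rice-coded here, with the parameter `k` chosen arbitrarily), and both ends rebuild identical canonical Huffman codes from those lengths using the procedure in RFC 1951, section 3.2.2. bits are kept as Python lists of 0/1 for readability, and the RLE step is left out.

```python
from typing import Dict, List


def rice_encode(values: List[int], k: int) -> List[int]:
    """Rice code: quotient v >> k in unary (that many 1s, then a 0), then k remainder bits"""
    bits: List[int] = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits += [1] * q + [0]
        bits += [(r >> i) & 1 for i in reversed(range(k))]  # remainder, MSB first
    return bits


def rice_decode(bits: List[int], k: int, count: int) -> List[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1  # skip the terminating 0
        r = 0
        for _ in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        values.append((q << k) | r)
    return values


def canonical_codes(lengths: List[int]) -> Dict[int, str]:
    """assign canonical Huffman codes from per-symbol code lengths (RFC 1951, 3.2.2);
    a length of 0 means the symbol is unused"""
    max_len = max(lengths)
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    code = 0
    next_code = [0] * (max_len + 1)
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes: Dict[int, str] = {}
    for sym, n in enumerate(lengths):
        if n:
            codes[sym] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes
```

with the RFC's example lengths `[3, 3, 3, 3, 3, 2, 4, 4]`, `canonical_codes` yields `010, 011, 100, 101, 110, 00, 1110, 1111` for symbols 0 to 7, so the lengths alone are enough for the decoder to reconstruct the table.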

values are encoded using a mix of "command tags" and an MRU+MTF scheme (recently coded values may be reused from a table). strings (strings, symbols, keywords, ...) and data members (byte arrays, ...) use an LZ77 variant (itself very similar to how data is represented in Deflate).

note that, for data members, the MRU serves a similar role to that of the "sliding window" in LZ77 (I may consider dropping MTF for various reasons though, and maybe add a data-member analogue of an LZ run).
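an MRU table with move-to-front might look something like this sketch. the table size of 16 and the `("lit", v)` / `("mru", i)` token shapes are assumptions; in a real coder those tokens would in turn be Huffman-coded. values are matched on type as well as value, so that `1` and `1.0` (equal in Python) stay distinct.

```python
from typing import Any, List, Optional, Tuple


class MruTable:
    """recently coded values, most recent first; every use moves a value to the front"""

    def __init__(self, size: int = 16) -> None:
        self.size = size
        self.items: List[Any] = []

    def lookup(self, value: Any) -> Optional[int]:
        for i, item in enumerate(self.items):
            if type(item) is type(value) and item == value:
                return i
        return None

    def touch(self, value: Any, index: Optional[int] = None) -> None:
        # move an existing entry (or insert a new one) to the front; drop the oldest
        if index is not None:
            del self.items[index]
        self.items.insert(0, value)
        del self.items[self.size:]


def mru_encode(values: List[Any], size: int = 16) -> List[Tuple[str, Any]]:
    table, out = MruTable(size), []
    for v in values:
        idx = table.lookup(v)
        out.append(("lit", v) if idx is None else ("mru", idx))
        table.touch(v, idx)
    return out


def mru_decode(tokens: List[Tuple[str, Any]], size: int = 16) -> List[Any]:
    # mirrors the encoder's table updates exactly, so both tables stay in sync
    table, out = MruTable(size), []
    for kind, x in tokens:
        if kind == "mru":
            v = table.items[x]
            table.touch(v, x)
        else:
            v = x
            table.touch(v)
        out.append(v)
    return out
```

encoding `["org", "ang", "org", "org", 315, "ang"]` gives `[("lit", "org"), ("lit", "ang"), ("mru", 1), ("mru", 0), ("lit", 315), ("mru", 2)]`: repeated values collapse into small indices, which is what makes them cheap to code.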

currently, 3 Huffman tables are used: one for command tags, another for literal bytes (in strings and data members), and the 3rd for distances and integers (fixnums, flonums, ...). most integer-like values are coded using a prefix+extra-bits scheme similar to the one used in Deflate. floating-point values are encoded as a pair of integers (mantissa+exponent, although the definition is "m*2^e", with the mantissa as an integer rather than a normalized value, so "1,8" will encode "256.0", i.e. 1*2^8).
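both numeric tricks are easy to sketch. `prefix_encode` mirrors the layout of Deflate's distance codes (taking v = distance - 1): small values get their own code, and larger ones get a code plus extra bits. `float_to_pair` produces the integer-mantissa form, so 256.0 comes out as (1, 8). the zigzag mapping for signed values is an assumption of this sketch, as is the handling of 0.0 (the sign of -0.0, infinities, and NaN are not covered).

```python
import math
from typing import Tuple


def prefix_encode(v: int) -> Tuple[int, int, int]:
    """Deflate-style split of a non-negative int into (code, extra-bit count, extra bits);
    codes 0-3 are literal, then each further pair of codes doubles the range covered"""
    if v < 4:
        return v, 0, 0
    nbits = v.bit_length() - 2
    code = 2 * nbits + 2 + ((v >> nbits) & 1)
    return code, nbits, v & ((1 << nbits) - 1)


def prefix_decode(code: int, extra: int) -> int:
    if code < 4:
        return code
    nbits = (code - 2) >> 1
    return ((2 | (code & 1)) << nbits) | extra


def zigzag(n: int) -> int:
    """map signed ints to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def float_to_pair(x: float) -> Tuple[int, int]:
    """split a finite float into (m, e) with x == m * 2**e and m an odd integer (or 0)"""
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)  # x == frac * 2**exp, with 0.5 <= |frac| < 1
    m, e = int(frac * (1 << 53)), exp - 53  # exact: a double has a 53-bit significand
    while m % 2 == 0:  # strip trailing zero bits to keep m small
        m //= 2
        e += 1
    return m, e


def pair_to_float(m: int, e: int) -> float:
    return math.ldexp(m, e)
```

e.g. `prefix_encode(12)` is `(7, 2, 0)`: code 7 followed by two extra bits, both zero.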

all of this leads to lower encoded message sizes than what I was getting by serializing the lists into textual S-Expression form and then applying Deflate to the result (and is semantically equivalent).

most of the added compression is most likely due to the ability of the scheme to make use of additional knowledge about the data being compressed, since data members can be encoded directly, rather than the compressor having to deal with them as strings of ASCII characters encoding the data members.


not that Deflate was doing all that poorly either though.

theoretically, also possible (and technically simpler) is to flatten the S-Expressions into a byte-based binary format and then feed this through Deflate (likely with results somewhere in between).
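the two simpler alternatives are easy to try with Python's `zlib` module (zlib-wrapped Deflate): `to_text` produces the textual S-Expression form, `to_binary` a flattened byte form with hypothetical one-byte type tags and fixed-width little-endian numbers, and `compare_sizes` Deflates each. which one comes out smaller depends on the messages, so no numbers are claimed here; the names and the tag layout are illustrative only.

```python
import struct
import zlib

# hypothetical one-byte type tags for a simple flattened binary form
T_LIST, T_END, T_SYM, T_INT, T_FLOAT, T_STR = range(6)


class Sym(str):
    """marks a string as a symbol rather than a string literal"""


def to_text(x) -> str:
    if isinstance(x, list):
        return "(" + " ".join(to_text(i) for i in x) + ")"
    if isinstance(x, Sym):
        return str(x)
    if isinstance(x, str):
        return '"' + x + '"'  # escaping omitted in this sketch
    return repr(x)


def to_binary(x, out: bytearray) -> bytearray:
    if isinstance(x, list):
        out.append(T_LIST)
        for item in x:
            to_binary(item, out)
        out.append(T_END)
    elif isinstance(x, str):
        data = x.encode()
        out.append(T_SYM if isinstance(x, Sym) else T_STR)
        out += struct.pack("<H", len(data)) + data  # 16-bit length prefix
    elif isinstance(x, int):
        out.append(T_INT)
        out += struct.pack("<q", x)
    elif isinstance(x, float):
        out.append(T_FLOAT)
        out += struct.pack("<d", x)
    else:
        raise TypeError(f"unsupported type {type(x).__name__}")
    return out


def from_binary(buf: bytes, pos: int = 0):
    """inverse of to_binary; returns (value, next position)"""
    tag = buf[pos]
    pos += 1
    if tag == T_LIST:
        items = []
        while buf[pos] != T_END:
            item, pos = from_binary(buf, pos)
            items.append(item)
        return items, pos + 1
    if tag in (T_SYM, T_STR):
        (n,) = struct.unpack_from("<H", buf, pos)
        s = buf[pos + 2:pos + 2 + n].decode()
        return (Sym(s) if tag == T_SYM else s), pos + 2 + n
    if tag == T_INT:
        return struct.unpack_from("<q", buf, pos)[0], pos + 8
    if tag == T_FLOAT:
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    raise ValueError(f"bad tag {tag}")


def compare_sizes(msg) -> tuple:
    """(text+Deflate size, binary+Deflate size) for one message, at zlib level 9"""
    text = to_text(msg).encode()
    binary = bytes(to_binary(msg, bytearray()))
    return len(zlib.compress(text, 9)), len(zlib.compress(binary, 9))
```

for tiny single messages the fixed zlib header and checksum dominate, so a fair comparison would batch many messages per compressed block.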

the main reason for trying to compress the data is mostly so that it has a much lower chance of bogging down the user's internet connection or similar (which would increase the risk of network stalls and poor performance).


granted, yes, this is probably overkill.

but, it works...

time to implement support for this (and get networking for my 3D engine to work, in general) was a little over 1 week.



or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
