The encoding works like this: each value consists of a type code followed
by zero or more bytes. For integers, there are type codes for a range of
small values, say -10 to 40, and codes for integers of length 1 to 8 bytes.
For strings, there are type codes for strings of, say, 0 to 40 bytes,
followed immediately by the respective string, and codes for strings with
binary counts of 1 to 4 bytes, which are followed first by the count and
then by the string. There are similar sets of codes for decimal scaled
integers, doubles, dates, etc.
So small integers are represented by a single byte. Short strings are
represented by a byte plus the string. The exact ranges for small integers
and lengths of small strings are more or less arbitrary.
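To make the above concrete, here is a rough Python sketch of encoding
integers and strings. The code numbering (0..50 for small integers, 51..58
for sized integers, 59..99 for short strings, 100..103 for counted strings),
the big-endian byte order, and the function names are all invented for
illustration; they are not the assignments used in any of the products
mentioned in this thread.

```python
# Hypothetical type-code layout; the ranges are arbitrary tuning choices.
INT_MIN, INT_MAX = -10, 40   # integers stored in the type code itself
INT_SMALL = 0                # codes 0..50    -> integers -10..40
INT_SIZED = 51               # codes 51..58   -> integer in 1..8 bytes
STR_SHORT = 59               # codes 59..99   -> string of 0..40 bytes
STR_COUNTED = 100            # codes 100..103 -> 1..4 byte count, then string
SHORT_MAX = 40


def encode_int(value: int) -> bytes:
    """Encode an integer as a type code plus 0..8 payload bytes."""
    if INT_MIN <= value <= INT_MAX:
        return bytes([INT_SMALL + value - INT_MIN])
    # Pick the smallest signed width that holds the value, so no
    # high-order padding bytes are stored.
    for width in range(1, 9):
        limit = 1 << (8 * width - 1)
        if -limit <= value < limit:
            payload = value.to_bytes(width, "big", signed=True)
            return bytes([INT_SIZED + width - 1]) + payload
    raise OverflowError("integer does not fit in 8 bytes")


def encode_str(text: str) -> bytes:
    """Encode a UTF-8 string as a short or a counted string."""
    data = text.encode("utf-8")
    if len(data) <= SHORT_MAX:
        # The length is implied by the type code.
        return bytes([STR_SHORT + len(data)]) + data
    width = (len(data).bit_length() + 7) // 8   # bytes needed for the count
    if width > 4:
        raise OverflowError("string too long for a 4 byte count")
    count = len(data).to_bytes(width, "big")
    return bytes([STR_COUNTED + width - 1]) + count + data
```

With this layout, encode_int(7) is one byte, encode_int(1000) is three
bytes (code plus two payload bytes), and a three-byte string costs four
bytes in total.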
I have also restricted strings to UTF-8 (which is a different argument),
but the encoding doesn't attach semantics to strings, so this isn't
strictly necessary.
No bytes are wasted in padding, high order binary zeros, or run lengths.
In memory, I have generally used a vector of 16-bit offsets to hold the
offsets of known fields, plus a "high water" mark, which minimizes parsing
to near zero. Note that a simple static lookup table will give the lengths
of most types. Counted values are represented in the table by negative
count lengths.
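A small Python sketch of the lookup table and the offset vector might look
like the following. It assumes an invented code layout (0..50 small
integers, 51..58 sized integers, 59..99 short strings, 100..103 counted
strings with a big-endian count); LENGTHS, value_size and FieldOffsets are
hypothetical names, and the sketch does no bounds checking on the record.

```python
from array import array

# Static table indexed by type code. A non-negative entry is the payload
# length; a negative entry -n marks a counted value with an n-byte count.
LENGTHS = (
    [0] * 51                  # codes 0..50: small integers, no payload
    + list(range(1, 9))       # codes 51..58: integers of 1..8 bytes
    + list(range(0, 41))      # codes 59..99: strings of 0..40 bytes
    + [-1, -2, -3, -4]        # codes 100..103: counted strings
)


def value_size(record: bytes, pos: int) -> int:
    """Total encoded size, type code included, of the value at pos."""
    entry = LENGTHS[record[pos]]
    if entry >= 0:
        return 1 + entry
    width = -entry
    count = int.from_bytes(record[pos + 1:pos + 1 + width], "big")
    return 1 + width + count


class FieldOffsets:
    """Offsets of the fields located so far, found on demand."""

    def __init__(self, record: bytes, field_count: int):
        self.record = record
        # "H" is an unsigned 16-bit item on common platforms; storing a
        # larger offset raises OverflowError.
        self.offsets = array("H", [0] * field_count)
        self.high_water = 0   # offsets[0..high_water] are valid

    def offset_of(self, field: int) -> int:
        if not 0 <= field < len(self.offsets):
            raise IndexError("no such field")
        # Parse forward only as far as needed; later lookups of the same
        # or an earlier field are plain array reads.
        while self.high_water < field:
            pos = self.offsets[self.high_water]
            self.high_water += 1
            self.offsets[self.high_water] = pos + value_size(self.record, pos)
        return self.offsets[field]
```

Asking for field 2 of a five-field record walks only the first two values;
fields 3 and 4 are not touched until something asks for them.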
I used the scheme in Netfrastructure/Falcon and again in NuoDB. For
AmorphousDB I reimplemented a similar scheme with slightly different tuning.
Note that given the address and length of an encoded record, it is trivial
to validate a record and to print out formatted values for debugging.
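As a sketch of what such a validate-and-print pass could look like, here is
a Python version under the same kind of invented layout (0..50 small
integers, 51..58 sized integers, 59..99 short strings, 100..103 counted
strings, big-endian payloads). The function name dump and the output format
are made up for this example.

```python
# Hypothetical length table: payload length per type code, with -n
# meaning "an n-byte count follows, then that many bytes".
LENGTHS = [0] * 51 + list(range(1, 9)) + list(range(0, 41)) + [-1, -2, -3, -4]


def dump(record: bytes) -> list:
    """Walk a whole record and return one formatted line per value.

    Raises ValueError if a type code is unknown, a value runs past the
    end of the record, or a string is not valid UTF-8.
    """
    lines = []
    pos = 0
    end = len(record)
    while pos < end:
        code = record[pos]
        if code >= len(LENGTHS):
            raise ValueError(f"unknown type code {code} at offset {pos}")
        entry = LENGTHS[code]
        start = pos + 1
        if entry < 0:
            # Counted value: read the count before the payload.
            width = -entry
            if start + width > end:
                raise ValueError(f"truncated count at offset {pos}")
            length = int.from_bytes(record[start:start + width], "big")
            start += width
        else:
            length = entry
        if start + length > end:
            raise ValueError(f"value at offset {pos} runs past the record")
        payload = record[start:start + length]
        if code <= 50:
            lines.append(f"int {code - 10}")
        elif code <= 58:
            number = int.from_bytes(payload, "big", signed=True)
            lines.append(f"int {number}")
        else:
            # UnicodeDecodeError is a subclass of ValueError.
            lines.append(f"string {payload.decode('utf-8')!r}")
        pos = start + length
    return lines
```

The only inputs are the record's bytes (address and length); no external
schema or metadata is consulted.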
On Monday, February 23, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
wrote:
> Hi Jim,
> can you explain more about your algorithm for "self-describing value
> encoding" ?
> I'm interested in this.
>
> Thank you Slavek
>
> Ing. Slavomir Skopalik
> Executive Head
> Elekt Labs s.r.o.
> Collection and evaluation of data from machines and laboratories
> by means of system MASA (http://www.elektlabs.cz/m2demo)
> -----------------------------------------------------------------
> Address:
> Elekt Labs s.r.o.
> Chaloupky 158
> 783 72 Velky Tynec
> Czech Republic
> ---------------------------------------------------------------
> Mobile: +420 724 207 851
> icq: 199 118 333
> skype: skopaliks
> e-mail: skopa...@elektlabs.cz
> http://www.elektlabs.cz
>
> On 23.2.2015 14:13, James Starkey wrote:
>
> I've been using a self-describing value encoding for a decade and a half.
> It's denser and cheaper to compress and decompress than the existing run
> length encoding, though I'm not sure that compressing version deltas would
> be a lot of fun, but probably some clever fellow can think of a good
> algorithm.
>
> On Monday, February 23, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
> wrote:
>
>
> Hi,
> for FB3 I would recommend a more effective algorithm than hacking the
> current one.
> If you are interested, I can specify.
>
> I ran another test with a release build on 64-bit Windows; the results:
>
> DB size decreased from 90 GB -> 60 GB.
> A select count(*) from a table like this one:
>
> Create Table ProductDataEx (
> idProduct TLongInt NOT NULL,
> idMeasurand Smallint NOT NULL,
> idMeasurementMode TSmallInt NOT NULL,
> ValIndex Smallint Default 0 NOT NULL,
> idPeople TSmallInt NOT NULL,
> tDate TimeDateFutureCheck NOT NULL,
> Value1 Double precision NOT NULL,
> Description TMemo,
> Constraint pk_ProductDataEx Primary Key (idProduct,idMeasurand,
> idMeasurementMode,ValIndex)
> );
>
> Decreased from ~150s (any run) -> 52s for the first run and 36s for
> subsequent runs.
>
> This modification can read an old DB, but after a write, the previous
> server will fail.
> So, if I can, I will vote for 2.5.4 (FB3 is still far off).
>
> I also made some speed optimizations; this version is faster than the
> previous one.
>
> If somebody else is interested in this, I can put my private build for
> Win64 on my web site.
>
> Best regards Slavek
>
> On 23.2.2015 7:38, Dmitry Yemanov wrote:
>
>
> I didn't look at the code closely, but the idea is more or less the same
> as I was considering for CORE-4401. I just wanted to use the control
> char of zero for that purpose, as it's practically useless for either
> compressible or non-compressible runs.
>
> The new encoding affects the ODS, so it cannot be used in the v2.5
> series (it may be possible with ODS 11.3 but I don't think we need a
> minor ODS change in v2.5). But it surely could be applied to v3 after
> review and we don't have to worry about backward compatibility in ODS 12.
>
>
> Dmitry
>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> Firebird-Devel mailing list, web interface at
> https://lists.sourceforge.net/lists/listinfo/firebird-devel
--
Jim Starkey