Hi Jim,
thank you very much.

If I understand correctly,
you recommend a record structure like this:
1. format version
2. null flags (each byte encodes 8 fields; only nullable fields have a flag)
3. tr_id (encoded the same way as any other integer)
4. only the non-null fields, encoded.
For example, it doesn't matter how an integer is declared (16-64 bit); it is encoded by its actual value. Also, if a floating-point number holds an integer value (which is often the case), it can be stored as an integer.
   Other data types are handled similarly.
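
A minimal sketch of the layout outlined above, in Python, just to make the four parts concrete. The names (encode_int, encode_value, encode_record), the one-letter type tags and the length-prefixed integer form are assumptions made for illustration; they are not Firebird's or Jim's actual format.

```python
import struct

def encode_int(value: int) -> bytes:
    """Length byte, then the fewest signed big-endian bytes that hold value."""
    nbits = (value if value >= 0 else ~value).bit_length()
    length = nbits // 8 + 1
    return bytes([length]) + value.to_bytes(length, "big", signed=True)

def encode_value(value) -> bytes:
    """Encode one non-null field by its actual value, not its declared type."""
    if isinstance(value, float):
        if value.is_integer():
            value = int(value)          # a float holding an integer is stored as an integer
        else:
            return b"D" + struct.pack(">d", value)
    if isinstance(value, int):
        return b"I" + encode_int(value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"S" + encode_int(len(raw)) + raw
    raise TypeError(f"unsupported type: {type(value).__name__}")

def encode_record(version: int, tr_id: int, values, nullable) -> bytes:
    """version byte | null flags | tr_id | non-null fields only."""
    flags = bytearray((sum(nullable) + 7) // 8)   # one bit per nullable field
    body = bytearray()
    bit = 0
    for value, can_be_null in zip(values, nullable):
        if can_be_null:
            if value is None:
                flags[bit // 8] |= 1 << (bit % 8)
            bit += 1
        elif value is None:
            raise ValueError("NULL in a NOT NULL field")
        if value is not None:
            body += encode_value(value)
    return bytes([version]) + bytes(flags) + encode_int(tr_id) + bytes(body)
```

With three fields, of which only the last is nullable and is NULL, encode_record(1, 5, [100000, 3.0, None], [False, False, True]) produces 12 bytes: the double 3.0 costs 3 bytes instead of 9, and the NULL costs a single flag bit.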

For the ranges and offsets, you performed some statistical analysis,
as described in Shannon's information theory.

Very good idea, thanks for this!

In Firebird, I am not worried about the record difference, because it is stored really ineffectively
(a no-change diff for a 32000-byte record takes 250 bytes).

About my RLE:
I have prepared an algorithm that encodes up to 66 zeroes into one byte
and, in the worst case, adds at most 3 bytes to a 64 KB record.

Slavek

Ing. Slavomir Skopalik
Executive Head
Elekt Labs s.r.o.
Collection and evaluation of data from machines and laboratories
by means of system MASA (http://www.elektlabs.cz/m2demo)
-----------------------------------------------------------------
Address:
Elekt Labs s.r.o.
Chaloupky 158
783 72 Velky Tynec
Czech Republic
---------------------------------------------------------------
Mobile: +420 724 207 851
icq:199 118 333
skype:skopaliks
e-mail:skopa...@elektlabs.cz
http://www.elektlabs.cz

On 23.2.2015 16:31, James Starkey wrote:
The encoding works like this:  Each value consists of a type code followed
by zero or more bytes.  For integers, there are type codes for a range of
values, say -10 to 40, and codes for integers of length 1 to 8.  For
strings, there are type codes for strings from, say, 0 to 40 bytes,
followed immediately by the respective strings, and for strings with binary
counts from 1 to 4 bytes that are first followed by the count and the
respective string.  There are similar sets of codes for decimal scaled
integers, doubles, dates, etc.

So small integers are represented by a single byte.  Short strings are
represented by a byte plus the string.  The exact ranges for small integers
and lengths of small strings are more or less arbitrary.
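
The scheme described above can be sketched in Python roughly as follows. The numbering of the type codes (0..50 for inline integers, 51..58 for sized integers, 59..99 for short strings, 100..103 for counted strings) and the function names are assumptions chosen for illustration; the actual code assignments in Netfrastructure/Falcon or NuoDB are not given in this thread.

```python
INT_MIN, INT_MAX = -10, 40   # inline small integers ("say -10 to 40")
INT_INLINE = 0               # codes 0..50: value = code + INT_MIN (hypothetical numbering)
INT_SIZED = 51               # codes 51..58: integer stored in the next 1..8 bytes
STR_INLINE = 59              # codes 59..99: string of 0..40 bytes follows directly
STR_COUNTED = 100            # codes 100..103: 1..4 count bytes, then the string

def encode_int(value: int) -> bytes:
    if INT_MIN <= value <= INT_MAX:
        return bytes([INT_INLINE + value - INT_MIN])      # type code only
    n = (value if value >= 0 else ~value).bit_length() // 8 + 1
    if n > 8:
        raise OverflowError("integer needs more than 8 bytes")
    return bytes([INT_SIZED + n - 1]) + value.to_bytes(n, "big", signed=True)

def encode_str(text: str) -> bytes:
    raw = text.encode("utf-8")
    if len(raw) <= 40:
        return bytes([STR_INLINE + len(raw)]) + raw
    n = (len(raw).bit_length() + 7) // 8                  # bytes needed for the count
    if n > 4:
        raise OverflowError("string too long")
    return bytes([STR_COUNTED + n - 1]) + len(raw).to_bytes(n, "big") + raw

def decode(buf: bytes, pos: int = 0):
    """Return (value, offset just past the value)."""
    code = buf[pos]
    pos += 1
    if code < INT_SIZED:
        return code + INT_MIN, pos
    if code < STR_INLINE:
        n = code - INT_SIZED + 1
        return int.from_bytes(buf[pos:pos + n], "big", signed=True), pos + n
    if code < STR_COUNTED:
        n = code - STR_INLINE
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if code <= STR_COUNTED + 3:
        count_len = code - STR_COUNTED + 1
        n = int.from_bytes(buf[pos:pos + count_len], "big")
        pos += count_len
        return buf[pos:pos + n].decode("utf-8"), pos + n
    raise ValueError(f"unknown type code {code}")
```

In this sketch encode_int(7) is one byte, encode_int(1000) is three bytes, and encode_str("abc") is four bytes. A real implementation would add further code ranges for scaled decimals, doubles, dates and so on.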

I have also restricted strings to UTF-8 (which is a different argument),
but the encoding doesn't attach semantics to strings, so this isn't
strictly necessary.

No bytes are wasted in padding, high order binary zeros, or run lengths.

In memory, I have generally used a vector of 16 bit offsets to hold the
offsets of known fields and a "high water" mark, which minimizes parsing to
near zero.  Note that a simple static lookup table will give the lengths of
most types.  Counted values are represented in the table by negative
count lengths.
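
A rough Python sketch of the lookup table and the offset vector, under assumptions: the type-code numbering (0..50 inline integers, 51..58 sized integers, 59..99 short strings, 100..103 counted strings) is hypothetical, as are the names LENGTHS, value_end and Record.

```python
from array import array

# Payload length for each type code; a negative entry -k means
# "k count bytes follow, and the count gives the payload length".
LENGTHS = [0] * 51 + list(range(1, 9)) + list(range(41)) + [-1, -2, -3, -4]

def value_end(buf: bytes, pos: int) -> int:
    """Return the offset just past the value that starts at pos."""
    n = LENGTHS[buf[pos]]
    pos += 1
    if n < 0:                                   # counted value
        count_len = -n
        n = int.from_bytes(buf[pos:pos + count_len], "big")
        pos += count_len
    return pos + n

class Record:
    """Locates fields lazily; already located fields cost one array lookup."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.offsets = array("H")   # 16-bit offsets of the fields found so far
        self.high_water = 0         # offset of the first field not yet located

    def field_offset(self, index: int) -> int:
        while len(self.offsets) <= index:
            if self.high_water >= len(self.buf):
                raise IndexError("record has no such field")
            self.offsets.append(self.high_water)
            self.high_water = value_end(self.buf, self.high_water)
        return self.offsets[index]
```

Asking for field 2 parses fields 0..2 once; any later request for fields 0..2 only reads the offsets array, and a request for field 3 resumes at the high-water mark.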

I used the scheme in Netfrastructure/Falcon and again in NuoDB.  For
AmorphousDB I reimplemented a similar scheme with slightly different tuning.

Note that given the address and length of an encoded record, it is trivial
to validate a record and to print out formatted values for debugging.
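
As an illustration of that last point, here is a short Python sketch that validates a record and formats its values in one pass. It assumes a hypothetical type-code numbering (0..50 inline integers -10..40, 51..58 integers of 1..8 bytes, 59..99 strings of 0..40 bytes, 100..103 strings with a 1..4 byte count); the names LENGTHS and dump are made up for this example.

```python
# Payload length per type code; -k means a k-byte count precedes the payload.
LENGTHS = [0] * 51 + list(range(1, 9)) + list(range(41)) + [-1, -2, -3, -4]

def dump(buf: bytes) -> list:
    """Validate buf; return one formatted line per value, or raise ValueError."""
    lines = []
    pos = 0
    while pos < len(buf):
        code = buf[pos]
        if code >= len(LENGTHS):
            raise ValueError(f"unknown type code {code} at offset {pos}")
        start, n = pos + 1, LENGTHS[code]
        if n < 0:                                  # counted value: read the count first
            count_end = start - n
            if count_end > len(buf):
                raise ValueError(f"truncated count at offset {pos}")
            n = int.from_bytes(buf[start:count_end], "big")
            start = count_end
        end = start + n
        if end > len(buf):
            raise ValueError(f"value at offset {pos} runs past the end")
        payload = buf[start:end]
        if code <= 50:
            text = f"int {code - 10}"
        elif code <= 58:
            text = f"int {int.from_bytes(payload, 'big', signed=True)}"
        else:
            text = f"string {payload.decode('utf-8')!r}"   # invalid UTF-8 also raises ValueError
        lines.append(f"{pos:4d}: {text}")
        pos = end
    return lines
```

The record is valid exactly when the walk ends at the record length without hitting an unknown code or a value that overruns the buffer.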

On Monday, February 23, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
wrote:

Hi Jim,
can you explain more about your algorithm for "self-describing value
encoding"?
I'm interested in this.

Thank you Slavek

On 23.2.2015 14:13, James Starkey wrote:

I've been using a self-describing value encoding for a decade and a half.
It's denser and cheaper to compress and decompress than the existing run
length encoding, though I'm not sure that compressing version deltas would
be a lot of fun, but probably some clever fellow can think of a good
algorithm.

On Monday, February 23, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
wrote:


Hi,
for FB3 I would recommend a more effective algorithm rather than hacking
the current one.
If you are interested, I can give the details.

I ran another test with a 64-bit Windows release build. The results:

DB size decreased from 90 GB to 60 GB.
For a select count(*) from a table like this one:

Create Table ProductDataEx  (
     idProduct TLongInt NOT NULL,
     idMeasurand Smallint NOT NULL,
     idMeasurementMode TSmallInt NOT NULL,
     ValIndex Smallint Default 0 NOT NULL,
     idPeople TSmallInt NOT NULL,
     tDate TimeDateFutureCheck NOT NULL,
     Value1 Double precision NOT NULL,
     Description TMemo,
     Constraint pk_ProductDataEx Primary Key (idProduct, idMeasurand,
         idMeasurementMode, ValIndex)
);

the time decreased from ~150 s (any run) to 52 s for the first run and 36 s for subsequent runs.

This modification can read an old DB, but after a write, the previous server
will fail.
So, if I can, I will vote for 2.5.4 (FB3 is still far away).

I also made some speed optimizations; this version is faster than the
previous one.

If somebody else is interested in this, I can put my private build for
Win64 on my web site.

Best regards Slavek

On 23.2.2015 7:38, Dmitry Yemanov wrote:


  I didn't look at the code closely, but the idea is more or less the same
as I was considering for CORE-4401. I just wanted to use the control
char of zero for that purpose, as it's practically useless for either
compressible or non-compressible runs.

The new encoding affects the ODS, so it cannot be used in the v2.5
series (it may be possible with ODS 11.3 but I don't think we need a
minor ODS change in v2.5). But it surely could be applied to v3 after
review and we don't have to worry about backward compatibility in ODS 12.


Dmitry


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
from Actuate! Instantly Supercharge Your Business Reports and Dashboards
with Interactivity, Sharing, Native Excel Exports, App Integration & more
Get technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel


