Hi Jim,
my proposal was not as abstract as yours.

I just want to put all parts of encoding/decoding into one class with a
clear interface that makes it possible to plug in a different
encoder at development time (FB3+).

I will contact the Firebird developers to reach an agreement about changes to
this class:

class Compressor : public Firebird::AutoStorage
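For illustration only, a rough sketch of the kind of pluggable interface I have in mind (all names here are hypothetical, not the actual Firebird API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: the storage layer sees only whole byte streams,
// and any concrete encoding scheme (classic RLE, value-based, ...) is a
// plug-in behind this interface.
struct RecordEncoder {
    virtual ~RecordEncoder() {}
    virtual std::vector<unsigned char> encode(const unsigned char* data,
                                              std::size_t length) const = 0;
    virtual std::vector<unsigned char> decode(const unsigned char* data,
                                              std::size_t length) const = 0;
};

// Simplest possible plug-in: no compression at all, just a copy.
struct CopyEncoder : public RecordEncoder {
    std::vector<unsigned char> encode(const unsigned char* data,
                                      std::size_t length) const
    { return std::vector<unsigned char>(data, data + length); }

    std::vector<unsigned char> decode(const unsigned char* data,
                                      std::size_t length) const
    { return std::vector<unsigned char>(data, data + length); }
};
```

Fragmentation onto data pages would then operate only on the encoded stream, outside the encoder.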

If it becomes possible to have access to the record format, it would be easy to
create a self-described encoding.
I have an idea for such a schema in mind that I would like to test.
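To show what I mean by self-described (a toy sketch with hypothetical names, not real Firebird code): the encoded stream itself starts with a tag identifying the encoder that produced it, so a reader can pick the matching decoder without any out-of-band knowledge.

```cpp
#include <cassert>
#include <vector>

// Toy self-describing stream: the first byte tags the encoding scheme,
// the remainder is that scheme's payload.
enum EncodingTag { TAG_RAW = 0, TAG_RLE = 1, TAG_VALUE_BASED = 2 };

std::vector<unsigned char> wrapStream(EncodingTag tag,
                                      const std::vector<unsigned char>& payload)
{
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>(tag));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

EncodingTag streamTag(const std::vector<unsigned char>& stream)
{
    // The tag byte tells us which decoder to dispatch to.
    return static_cast<EncodingTag>(stream.at(0));
}
```

With access to the record format, the payload could additionally describe per-field encodings in the same self-contained way.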

Slavek

Ing. Slavomir Skopalik
Executive Head
Elekt Labs s.r.o.
Collection and evaluation of data from machines and laboratories
by means of system MASA (http://www.elektlabs.cz/m2demo)
-----------------------------------------------------------------
Address:
Elekt Labs s.r.o.
Chaloupky 158
783 72 Velky Tynec
Czech Republic
---------------------------------------------------------------
Mobile: +420 724 207 851
icq:199 118 333
skype:skopaliks
e-mail:skopa...@elektlabs.cz
http://www.elektlabs.cz

On 28.2.2015 22:43, Jim Starkey wrote:
> OK, I think I understand what you are trying to do -- and please correct me 
> if I'm wrong.  You want to standardize an interface between an encoding and 
> DPM, separating the actual encoding/decoding from the fragmentation process.  
> In other words, you want to compress a record in toto and then let somebody else 
> chop the resulting byte stream to and from data pages.  In essence, this 
> makes the compression scheme plug replaceable.
>
> If this is your intention, it isn't a bad idea, but it does have problems.  
> The first is how to map a given record to a particular decoding schema.  The 
> second, more difficult, is how to do this without bumping the ODS (desirable, 
> but not essential).  A third is how to handle encodings that are not 
> variations on run length encoding (such as value based encoding).
>
> If I'm on the right track, do note that the current decoding schema already 
> fits your bill.  Concatenate the fragments and decode.  The encoding process, 
> on the other hand, is more problematic.
>
> Encoding/decoding in place is more efficient than using a temp, but not so 
> much as to preclude it.  I might be wrong, but I doubt that the existing 
> schema shows up as a hot spot in a profile.  But that said, I'm far from 
> convinced that variations on a run length theme are going to have any 
> significant benefit for either density or performance.
>
> My post-Interbase database systems don't access records on pages (NuoDB 
> doesn't even have pages).  Records have one format in storage and other 
> formats in memory within a record class that understands the transitions 
> between formats (essentially doing the various encoding and decoding).  There 
> is generally an encoded form (raw byte stream), a descriptor vector for 
> building new records, and some sort of ancillary structure for field 
> references to either.
>
> In my mind, I think it would be wiser for Firebird to go with a flexible 
> record object than to simply abstract the encoding/decoding process.  More 
> code would need to be changed, but when you were done, there would be much 
> less code.
>
> Architecturally, abstracting encoding/decoding makes sense, but practically, I 
> don't think it buys much.  A deep reorganization, I believe, would have a much 
> better long term payoff.
>
> But then maybe I missed your point...
>
> Jim Starkey
>
>
>> On Feb 28, 2015, at 10:30 AM, Slavomir Skopalik <skopa...@elektlabs.cz> 
>> wrote:
>>
>> Hi Jim,
>> I don't want to change ODS for saving one byte per page.
>> I want to change the sources to be able to implement a different
>> encoder (call it whatever you want) -> change the ODS.
>>
>> For some encoders the fragmentation loss is 1-2 bytes; for others it
>> can be more.
>> For some encoders reverse parsing is easy; for others it is
>> much more complicated.
>>
>> In some situations generating a control stream can be a benefit,
>> but as the sources stand now (FB2.5, FB3), as far as I have read them, it is not.
>>
>> Current compressor interface:
>> to create the control stream:
>> ULONG SQZ_length(const SCHAR* data, ULONG length, DataComprControl* dcc)
>>
>> to create the final stream from the control stream:
>> void SQZ_fast(const DataComprControl* dcc, const SCHAR* input, SCHAR*
>> output)
>>
>> To calculate how many bytes can be compressed into a small area (from the
>> control stream):
>> USHORT SQZ_compress_length(const DataComprControl* dcc, const SCHAR*
>> input, int space)
>>
>> To compress into a small area:
>> USHORT SQZ_compress(const DataComprControl* dcc, const SCHAR* input,
>> SCHAR* output, int space)
>>
>> and to decompress:
>> UCHAR* SQZ_decompress(const UCHAR* input, USHORT length,
>> UCHAR* output, const UCHAR* const output_end)
>>
>> And some routines are directly in the storage code.
>>
>> In FB3 it is very similar (changed names, organized into a class, same hack
>> in store_big_record; the problem is not the code itself, but where the code is).
>>
>> The question is:
>> Why keep the control stream (worse CPU, slightly worse HDD, and, also
>> important for me, less readable code)?
>> It seems it was implemented this way because of RAM limitations.
>>
>> And another question:
>> What functions and parameters should be in the new interface?
>>
>> If you have an idea of how to use the control stream with benefit, please share it.
>>
>> Slavek
>>
>> BTW: If we drop the control stream, the posted code will reduce to one memcpy
>> that is implemented with SSE+ instructions.
>>
>>
>>> On 28.2.2015 14:16, James Starkey wrote:
>>> I regret both that I don't have a copy of Firebird source on the boat or
>>> access to adequate bandwidth to get it, so I'm not in a position to comment
>>> on the existing code one way or another.  But as I understand your
>>> proposal, you are suggesting that the ODS be changed to save (at most) one
>>> byte per 4,050 bytes (approximately) of a very large fragmented record.  That
>>> isn't much of a payback.
>>>
>>> But looking at your code below, it would be much faster if you just
>>> declared your variables as int and got rid of the casts.  All the casts are
>>> doing for you is forcing the compiler to explicitly truncate the results to
>>> 16 bits, which is not necessary.
>>>
>>> I am aware that it is stylish to throw in as many casts and consts as
>>> possible, but simple type safety is both faster and more readable.
>>>
>>> I don't mean to dump on your proposal, but if you're going to make a
>>> change, make a change worth doing.  I'm not insisting that Firebird adopt
>>> value based encoding as that is a choice for the guys doing the
>>> implementing.  I did make the change from run length encoding to value
>>> based encoding in Netrastructure and found it reduced on-disk record sizes
>>> by 2/3.
>>>
>>> And, incidentally, the existing code that you deride as a hack is probably
>>> also my code, though probably reworked by a half-dozen folks over the years.
>>> Still, I would prefer the term "archaic historical artifact" to "hack" as
>>> it was written on a 1 MB Apollo DN 330 running a 68010, approximately the
>>> norm for workstations circa 1984.  Machines have changed since then, and
>>> with them, the tradeoffs.
>>>
>>> On Friday, February 27, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
>>> wrote:
>>>
>>>>   Hi Jim,
>>>> I am not calling your scheme a hack; that is a misunderstanding.
>>>> I am saying that the current implementation of RLE in Firebird is a hack
>>>> (parsing the RLE control stream outside the compressor/decompressor in reverse
>>>> order).
>>>>
>>>> If I replace the current RLE with anything, I will have to do the same or
>>>> worse hack(s).
>>>> And I don't want to go that way (wasting time on a bad implementation).
>>>> Please look at the code first.
>>>>
>>>>
>>>> http://sourceforge.net/p/firebird/code/HEAD/tree/firebird/branches/B2_5_Release/src/jrd/dpm.epp
>>>>
>>>> // Move compressed data onto page
>>>>
>>>>         while (length > 1)
>>>>         {
>>>>             // Handle residual count, if any
>>>>             if (count > 0)
>>>>             {
>>>>                  const USHORT l = MIN((USHORT) count, length - 1);
>>>>                 USHORT n = l;
>>>>                 do {
>>>>                     *--out = *--in;
>>>>                 } while (--n);
>>>>                 *--out = l;
>>>>                 length -= (SSHORT) (l + 1);    // bytes remaining on page
>>>>                 count -= (SSHORT) l;    // bytes remaining in run
>>>>                 continue;
>>>>             }
>>>>
>>>>             if ((count = *--control) < 0)
>>>>             {
>>>>                 *--out = in[-1];
>>>>                 *--out = count;
>>>>                 in += count;
>>>>                 length -= 2;
>>>>             }
>>>>         }
>>>>
>>>>
>>>> As I wrote, it is impossible to change the encoding without refactoring the
>>>> current code base.
>>>>
>>>> Slavek
>>>>
>>>> Ing. Slavomir Skopalik
>>>> Executive Head
>>>> Elekt Labs s.r.o.
>>>> Collection and evaluation of data from machines and laboratories
>>>> by means of system MASA (http://www.elektlabs.cz/m2demo)
>>>> -----------------------------------------------------------------
>>>> Address:
>>>> Elekt Labs s.r.o.
>>>> Chaloupky 158
>>>> 783 72 Velky Tynec
>>>> Czech Republic
>>>> ---------------------------------------------------------------
>>>> Mobile: +420 724 207 851
>>>> icq:199 118 333
>>>> skype:skopaliks
>>>> e-mail:skopa...@elektlabs.cz
>>>> http://www.elektlabs.cz
>>>>
>>>> On 28.2.2015 1:12, James Starkey wrote:
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming The Go Parallel Website, 
>> sponsored
>> by Intel and developed in partnership with Slashdot Media, is your hub for 
>> all
>> things parallel software development, from weekly thought leadership blogs to
>> news, videos, case studies, tutorials and more. Take a look and join the
>> conversation now. http://goparallel.sourceforge.net/
>> Firebird-Devel mailing list, web interface at 
>> https://lists.sourceforge.net/lists/listinfo/firebird-devel



