Re: [Firebird-devel] Record level compression improvement

Slavomir Skopalik Sat, 28 Feb 2015 06:32:41 -0800

Hi Jim,
I don't want to change the ODS just to save one byte per page.
I want to change the sources so that a different encoder (call it
whatever you like) can be implemented, and then change the ODS.


For some encoders fragmentation loses 1-2 bytes; for others it can
be more.
For some encoders reverse parsing is easy; for others it is much
more complicated.

In some situations generating a control stream can be a benefit,
but as it is now in the sources I have read (FB2.5, FB3), it is not.

Current compressor interface:

To create the control stream:
  ULONG SQZ_length(const SCHAR* data, ULONG length, DataComprControl* dcc)

To create the final stream from the control stream:
  void SQZ_fast(const DataComprControl* dcc, const SCHAR* input, SCHAR* output)

To calculate how many bytes can be compressed into a small area (from the
control stream):
  USHORT SQZ_compress_length(const DataComprControl* dcc, const SCHAR* input,
                             int space)

To compress into a small area:
  USHORT SQZ_compress(const DataComprControl* dcc, const SCHAR* input,
                      SCHAR* output, int space)

And to decompress:
  UCHAR* SQZ_decompress(const UCHAR* input, USHORT length, UCHAR* output,
                        const UCHAR* const output_end)
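For readers without the sources at hand, here is a minimal sketch of a decoder for the signed-count run-length format that the dpm.epp loop quoted later in this thread appears to use. A positive control byte n is followed by n literal bytes. A negative control byte -n is followed by a single byte repeated n times. The name `rle_decompress` and its bounds checks are hypothetical. This is not Firebird's `SQZ_decompress`, only an illustration under that format assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch (not Firebird's SQZ_decompress). Assumed format:
//   control n > 0  -> n literal bytes follow
//   control n < 0  -> one byte follows, repeated -n times
// Returns the number of bytes written, or -1 if the input is truncated
// or the output buffer is too small.
long rle_decompress(const unsigned char* input, std::size_t length,
                    unsigned char* output, std::size_t output_size)
{
    const unsigned char* in = input;
    const unsigned char* const in_end = input + length;
    unsigned char* out = output;
    unsigned char* const out_end = output + output_size;

    while (in < in_end)
    {
        const int count = static_cast<signed char>(*in++);
        if (count > 0)
        {
            // Literal run: copy 'count' bytes verbatim.
            if (in_end - in < count || out_end - out < count)
                return -1;
            std::memcpy(out, in, count);
            in += count;
            out += count;
        }
        else if (count < 0)
        {
            // Repeat run: one byte value repeated -count times.
            if (in == in_end || out_end - out < -count)
                return -1;
            std::memset(out, *in++, -count);
            out += -count;
        }
        // A zero control byte carries no data and is skipped.
    }
    return static_cast<long>(out - output);
}
```

Every control byte is self-describing, so the decoder walks the stream front to back in a single pass. It needs no separate control stream.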

Some of the routines also live directly in the storage code.

FB3 is very similar: the names have changed and the code is organized into a
class, but the same hack remains in store_big_record (the problem is not the
code itself, but where the code is).

The question is:
Why keep the control stream (worse CPU usage, slightly worse disk usage and,
importantly for me, less readable code)?
It seems it was implemented this way because of RAM limitations.
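To show concretely what the control stream costs, here is a hypothetical two-pass encoder in the spirit of `SQZ_length` followed by `SQZ_fast`. The first pass scans the record and records signed run counts. The second pass rereads the record and writes the control bytes plus data. The names `build_control` and `emit_from_control`, the 128-byte run limit, and the rule that only runs of 3 or more identical bytes become repeats are assumptions made for this sketch. They are not necessarily Firebird's exact rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-pass RLE (illustrative, not Firebird's code).
// Control entries: n > 0 means "n literal bytes", n < 0 means "-n copies
// of one byte".

// Pass 1: fills 'control' and returns the compressed size.
std::size_t build_control(const unsigned char* data, std::size_t length,
                          std::vector<signed char>& control)
{
    control.clear();
    std::size_t size = 0;
    std::size_t i = 0;
    while (i < length)
    {
        // Measure the run of identical bytes starting at i (max 128).
        std::size_t run = 1;
        while (i + run < length && run < 128 && data[i + run] == data[i])
            ++run;
        if (run >= 3)
        {
            control.push_back(static_cast<signed char>(-static_cast<int>(run)));
            size += 2;               // control byte + repeated byte
            i += run;
            continue;
        }
        // Short run: extend the open literal run or start a new one.
        if (!control.empty() && control.back() > 0 && control.back() < 127)
            ++control.back();
        else
        {
            control.push_back(1);
            ++size;                  // new control byte
        }
        ++size;                      // the literal byte itself
        ++i;
    }
    return size;
}

// Pass 2: rereads the input and writes control bytes plus data.
std::size_t emit_from_control(const std::vector<signed char>& control,
                              const unsigned char* input,
                              unsigned char* output)
{
    unsigned char* out = output;
    for (signed char c : control)
    {
        *out++ = static_cast<unsigned char>(c);
        if (c > 0)
        {
            for (int k = 0; k < c; ++k)
                *out++ = *input++;
        }
        else
        {
            *out++ = *input;         // the repeated byte
            input += -c;             // skip the whole run
        }
    }
    return static_cast<std::size_t>(out - output);
}
```

The input is scanned twice, and the control vector is an extra allocation. That is the CPU and memory overhead being questioned here.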

And another question:
Which functions and parameters should the new interface have?

If you have an idea of how to use the control stream to some benefit, please share it.

Slavek

BTW: If we drop the control stream, the posted code reduces to a single memcpy,
which is implemented with SSE+ instructions.
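Here is a minimal sketch of that alternative, assuming the same signed-count format. The encoder writes the control bytes directly into the output in one pass. Splitting the finished buffer onto pages then becomes a plain `memcpy` of the next chunk from the tail. `rle_compress` and `move_tail_fragment` are hypothetical names. One consequence is worth noting: with this scheme a run may straddle two fragments, so the decoder has to be able to continue a run across fragment boundaries. The current reverse-parsing code avoids exactly that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical single-pass encoder: control bytes go straight into the
// output, with no separate control stream. 'output' must hold at least
// 2 * length bytes (always sufficient; the exact worst case is smaller).
std::size_t rle_compress(const unsigned char* data, std::size_t length,
                         unsigned char* output)
{
    unsigned char* out = output;
    unsigned char* literal_ctl = nullptr;   // control byte of open literal run
    std::size_t i = 0;
    while (i < length)
    {
        std::size_t run = 1;
        while (i + run < length && run < 128 && data[i + run] == data[i])
            ++run;
        if (run >= 3)
        {
            // Repeat run: negative count, then the repeated byte.
            *out++ = static_cast<unsigned char>(-static_cast<int>(run));
            *out++ = data[i];
            literal_ctl = nullptr;
            i += run;
            continue;
        }
        // Literal byte: open a new literal run if needed (max 127 bytes).
        if (literal_ctl == nullptr || *literal_ctl == 127)
        {
            literal_ctl = out++;
            *literal_ctl = 0;
        }
        ++*literal_ctl;
        *out++ = data[i++];
    }
    return static_cast<std::size_t>(out - output);
}

// Fragmenting needs no parsing: copy the next chunk from the tail of the
// compressed buffer onto the page. Returns the number of bytes moved.
std::size_t move_tail_fragment(const unsigned char* compressed,
                               std::size_t& remaining,
                               unsigned char* page, std::size_t space)
{
    const std::size_t n = std::min(remaining, space);
    remaining -= n;
    std::memcpy(page, compressed + remaining, n);
    return n;
}
```

The fragments are filled tail first, as in the existing code, but each one is a single `memcpy` call.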


On 28.2.2015 14:16, James Starkey wrote:
> I regret that I have neither a copy of the Firebird source on the boat nor
> access to adequate bandwidth to get it, so I'm not in a position to comment
> on the existing code one way or another.  But as I understand your
> proposal, you are suggesting that the ODS be changed to save (at most) one
> byte per 4,050 bytes (approximately) of a very large fragmented record.  That
> isn't much of a payback.
>
> But looking at your code below, it would be much faster if you just
> declared your variables as int and got rid of the casts.  All the casts are
> doing for you is forcing the compiler to explicitly truncate the results to
> 16 bits, which is not necessary.
>
> I am aware that it is stylish to throw in as many casts and consts as
> possible, but simple type safety is both faster and more readable.
>
> I don't mean to dump on your proposal, but if you're going to make a
> change, make a change worth doing.  I'm not insisting that Firebird adopt
> value based encoding as that is a choice for the guys doing the
> implementing.  I did make the change from run length encoding to value
> based encoding in Netrastructure and found it reduced on-disk record sizes
> by 2/3.
>
> And, incidentally, the existing code that you deride as a hack is probably
> also my code, though probably reworked by half dozen folks over the years.
> Still, I would prefer the term "archaic historical artifact" to "hack" as
> it was written on a 1 MB Apollo DN 330 running a 68010, approximately the
> norm for workstations circa 1984.  Machines have changed since then, and
> with them, the tradeoffs.
>
> On Friday, February 27, 2015, Slavomir Skopalik <skopa...@elektlabs.cz>
> wrote:
>
>>   Hi Jim,
>> I didn't call your scheme a hack; this is a misunderstanding.
>> I said that the current implementation of RLE in Firebird is a hack
>> (the RLE control stream is parsed outside the compressor/decompressor,
>> in reverse order).
>>
>> If I replace the current RLE with anything else, I have to write the same
>> or worse hack(s). And I don't want to go that way (wasting time on a bad
>> implementation). Please look at the code first.
>>
>>
>> http://sourceforge.net/p/firebird/code/HEAD/tree/firebird/branches/B2_5_Release/src/jrd/dpm.epp
>>
>> // Move compressed data onto page
>>
>>     while (length > 1)
>>     {
>>         // Handle residual count, if any
>>         if (count > 0)
>>         {
>>             const USHORT l = MIN((USHORT) count, length - 1);
>>             USHORT n = l;
>>             do {
>>                 *--out = *--in;
>>             } while (--n);
>>             *--out = l;
>>             length -= (SSHORT) (l + 1);     // bytes remaining on page
>>             count -= (SSHORT) l;            // bytes remaining in run
>>             continue;
>>         }
>>
>>         if ((count = *--control) < 0)
>>         {
>>             *--out = in[-1];
>>             *--out = count;
>>             in += count;
>>             length -= 2;
>>         }
>>     }
>>
>>
>> As I wrote, it is impossible to change the encoding without refactoring
>> the current code base.
>>
>> Slavek
>>
>> Ing. Slavomir Skopalik
>> Executive Head
>> Elekt Labs s.r.o.
>> Collection and evaluation of data from machines and laboratories
>> by means of system MASA (http://www.elektlabs.cz/m2demo)
>> -----------------------------------------------------------------
>> Address:
>> Elekt Labs s.r.o.
>> Chaloupky 158
>> 783 72 Velky Tynec
>> Czech Republic
>> ---------------------------------------------------------------
>> Mobile: +420 724 207 851
>> icq: 199 118 333
>> skype: skopaliks
>> e-mail: skopa...@elektlabs.cz
>> http://www.elektlabs.cz
>>
>> On 28.2.2015 1:12, James Starkey wrote:
>>
>>
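For comparison, here is a self-contained sketch of the quoted dpm.epp loop rewritten along the lines Jim suggests: plain int counters and no (USHORT)/(SSHORT) casts. It is a reconstruction for illustration only. The `TailState` struct, the `move_fragment` function, and the `control_begin` guard are hypothetical additions. The quoted fragment has no such guard, presumably because its caller only runs it while data remains. Data and control bytes are treated as signed char, as the SCHAR parameters of the SQZ functions suggest.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical reconstruction of the quoted loop with plain int counters.
// It fills one page fragment backwards from the tail of the record,
// driven by a separately built control stream.
struct TailState
{
    const signed char* in;            // one past the last byte not yet stored
    const signed char* control;       // one past the last unread control entry
    const signed char* control_begin; // hypothetical guard: start of control stream
    int count;                        // residual count carried between fragments
};

// 'out' points one past the end of the free area; 'length' is its size.
// Returns the number of page bytes used.
int move_fragment(TailState& st, signed char* out, int length)
{
    const int space = length;
    while (length > 1)
    {
        // Handle residual literal count, if any: store as much as fits,
        // with its own control byte, so no run straddles the boundary.
        if (st.count > 0)
        {
            const int l = std::min(st.count, length - 1);
            for (int n = 0; n < l; ++n)
                *--out = *--st.in;
            *--out = l;         // implicit narrowing to one control byte
            length -= l + 1;    // bytes remaining on page
            st.count -= l;      // bytes remaining in run
            continue;
        }
        if (st.control == st.control_begin)
            break;              // whole record stored
        st.count = *--st.control;
        if (st.count < 0)
        {
            // Repeat run: the repeated byte plus its negative count.
            *--out = st.in[-1];
            *--out = st.count;
            st.in += st.count;  // skip back over the whole run
            length -= 2;
        }
    }
    return space - length;
}
```

Take an 8-byte record "abzzzzzc" with control stream {2, -5, 1} and fill two 4-byte fragments from the tail. The fragments hold {-5, 'z', 1, 'c'} and then {2, 'a', 'b'}. Together these are the same seven bytes a single-pass encoder would produce.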



Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
