> The problem is that currently there are three filters defined:
> compression, encryption, and sparse file handling. The current
> implementations of compression and sparse file handling both require
> block boundary preservation. Even if zlib streaming could handle the
> existing block-based data, sparse file handling would be broken.
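To make the sparse constraint concrete: in sparse mode each data record
carries the file offset of its data in a small header, so the record
boundary itself has meaning, and a filter that merges or re-splits
records destroys that mapping. Roughly (a hypothetical illustration,
not the actual Bacula code):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch: prefix a sparse data record with the 64-bit
     * file offset of the bytes that follow.  The boundary between one
     * record and the next is what keeps each offset glued to its data. */
    size_t make_sparse_record(uint64_t offset, const uint8_t *data,
                              size_t len, uint8_t *out /* >= len + 8 */)
    {
        for (int i = 7; i >= 0; i--) {      /* big-endian offset header */
            out[i] = (uint8_t)(offset & 0xff);
            offset >>= 8;
        }
        memcpy(out + 8, data, len);
        return len + 8;
    }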
It seems to me that it is probably time to come up with a better way to
handle filters, but it is probably too late for 1.40 to make any major
changes to the code. I think the two most important points are:

1. Ensure that old Volumes are readable wherever possible.
2. Fix 1.40 so that it works correctly.

As far as point 2 is concerned, if it is not possible to fix it easily
or correctly, we could consider disallowing certain combinations of
options -- at least until we can find a better way to handle multiple
filters.

> -----Original Message-----
> From: Landon Fuller [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 02, 2006 11:06 AM
> To: Robert Nelson
> Cc: 'Michael Brennen'; [EMAIL PROTECTED];
> bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Encryption/Compression Conflict in CVS
>
> On Nov 2, 2006, at 08:30, Robert Nelson wrote:
>
>> Landon,
>>
>> I've changed the code so that the encryption code prefixes the data
>> block with a block length prior to encryption.
>>
>> The decryption code accumulates data until a full data block is
>> decrypted before passing it along to the decompression code.
>>
>> The code now works for all four scenarios: none, encryption,
>> compression, and encryption + compression. Unfortunately, the code
>> is no longer compatible with previously encrypted backups.
>>
>> I could add some more code to make the encryption-only case work as
>> before. However, since this is a new feature in 1.39 and there
>> shouldn't be a lot of existing backups, I would prefer to invalidate
>> the previous backups and keep the code simpler.
>>
>> Also, I think we should have a design rule that any data filter --
>> encryption, compression, etc. -- must maintain the original buffer
>> boundaries. This will allow us to define arbitrary, dynamically
>> extensible filter stacks in the future.
>>
>> What do you think?
>
> I was thinking about this on the way to work. My original assumption
> was that Bacula used the zlib streaming API to maintain state during
> file compression/decompression, but this is not the case. Reality is
> something more like this:
>
> Backup:
> - Set up the zlib stream context.
> - For each file block (not each file), compress the block via
>   deflate(stream, Z_FINISH) and reinitialize the stream.
> - After all files (and blocks) are compressed, destroy the stream
>   context.
>
> Restore:
> - For each block, call uncompress(), which does not handle streaming.
>
> This is unfortunate -- reinitializing the stream for each block
> significantly degrades compression efficiency, because: 1) block
> boundaries are dynamic and may be set arbitrarily; 2) the LZ77
> algorithm may cross block boundaries, referring back up to 32 KB of
> previous input data (http://www.gzip.org/zlib/rfc-deflate.html#overview);
> 3) the Huffman coding context comprises the entire block; and 4)
> there's no need to limit zlib block size to Bacula's block size.
>
> The next question is this -- given that we *should* stream the data,
> does it make sense to enforce downstream block boundaries in the
> upstream filter? I'm siding in favor of requiring streaming support,
> and thus allowing each filter implementor to worry about their own
> block buffering, since they can far better encapsulate the necessary
> state and implementation -- and most already do.
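To picture the fix Robert describes above: each plaintext block is
framed with its length before encryption, and the restore side buffers
decrypted bytes until one whole framed block is available, then hands
it downstream intact. A hypothetical sketch of the decrypt-side
reassembly (invented names, 4-byte big-endian length prefix assumed;
not the actual code):

    #include <stdint.h>
    #include <string.h>

    #define MAX_BLOCK 65536            /* assumed cap, for the sketch */

    typedef struct {
        uint8_t buf[4 + MAX_BLOCK];    /* length prefix + block body */
        size_t  have;                  /* bytes accumulated so far */
    } reassembler;

    static uint32_t hdr_len(const uint8_t *b)
    {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    /* Feed decrypted bytes in as they arrive from the cipher; call
     * emit() exactly once per reconstructed original block. */
    int feed(reassembler *r, const uint8_t *p, size_t n,
             void (*emit)(const uint8_t *blk, uint32_t len))
    {
        while (n > 0) {
            /* First collect the 4-byte header, then the block body. */
            size_t goal = (r->have < 4) ? 4 : 4 + hdr_len(r->buf);
            if (goal > sizeof(r->buf))
                return -1;             /* corrupt length; real code errors out */

            size_t take = goal - r->have;
            if (take > n)
                take = n;
            memcpy(r->buf + r->have, p, take);
            r->have += take;  p += take;  n -= take;

            if (r->have >= 4 && r->have == 4 + hdr_len(r->buf)) {
                emit(r->buf + 4, hdr_len(r->buf));  /* boundary restored */
                r->have = 0;
            }
        }
        return 0;
    }

The key property is that the cipher can emit data in whatever chunk
sizes it likes; the length prefix is what lets the restore side rebuild
the exact boundaries that decompression (and sparse handling) depend on.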
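And to make Landon's backup description concrete, the current per-block
pattern amounts to roughly the following (a schematic sketch against
the public zlib API, not the actual filed code; zs is assumed to have
been set up once with deflateInit()):

    #include <zlib.h>

    /* Compress one block as an independent deflate stream: finish the
     * stream, record the compressed size, then reset so the next block
     * starts with no history.  The reset is what throws away zlib's
     * 32 KB LZ77 window between blocks. */
    int compress_block(z_stream *zs, const Bytef *in, uInt in_len,
                       Bytef *out, uLong *out_len)
    {
        zs->next_in   = (Bytef *)in;
        zs->avail_in  = in_len;
        zs->next_out  = out;
        zs->avail_out = (uInt)*out_len;

        if (deflate(zs, Z_FINISH) != Z_STREAM_END)
            return Z_BUF_ERROR;        /* output buffer too small */
        *out_len = zs->total_out;      /* per-block; reset zeroes it */

        return deflateReset(zs);
    }

A streaming version would instead pass Z_NO_FLUSH for every block and
Z_FINISH only once at end of file, letting matches and the Huffman
context span block boundaries -- exactly the efficiency Landon is
pointing at.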
> The one other thing I am unsure of is whether the zlib streaming API
> correctly handles streams that have been written as per the above --
> each Bacula data block as an independent 'stream'. If zlib DOES
> handle this, it should be possible to modify the backup and restore
> implementation to use the stream API correctly while maintaining
> backwards compatibility. This would fix the encryption problem AND
> increase compression efficiency.
>
> With my extremely large database backups, I sure wouldn't mind
> increased compression efficiency =)
>
> Some documentation on the zlib API is available here (I had a little
> difficulty googling this):
>
> http://www.freestandards.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/libzman.html
>
> Cheers,
> Landon

Best regards,

Kern
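P.S. On the open question: zlib's documented behavior is that inflate()
returns Z_STREAM_END at the end of each deflate stream without
consuming the bytes that follow, so a restore loop could, in principle,
walk back-to-back per-block streams with inflateReset(). An untested
sketch (write_output() is a hypothetical sink for decompressed bytes):

    #include <zlib.h>

    extern void write_output(const Bytef *p, uInt n);   /* hypothetical */

    /* Decode input made of back-to-back independent deflate streams,
     * one per original block, reusing a single inflate context. */
    int inflate_concatenated(z_stream *zs, Bytef *out, uInt out_size)
    {
        while (zs->avail_in > 0) {
            zs->next_out  = out;
            zs->avail_out = out_size;

            int rc = inflate(zs, Z_NO_FLUSH);
            if (rc != Z_OK && rc != Z_STREAM_END)
                return rc;                 /* corrupt or truncated data */

            write_output(out, out_size - zs->avail_out);

            if (rc == Z_STREAM_END && inflateReset(zs) != Z_OK)
                return Z_STREAM_ERROR;     /* ready for the next stream */
        }
        return Z_OK;
    }

If that holds up against the streams the current per-block code writes,
the restore side could adopt the streaming API without invalidating
existing compressed Volumes.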