The problem is that there are currently three filters defined: compression, encryption, and sparse file handling. The current implementations of compression and sparse file handling both require block boundary preservation. Even if zlib streaming could handle the existing block-based data, sparse file handling would still be broken.
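To make the sparse constraint concrete: each sparse data record carries the file offset its payload belongs at, so a downstream filter that coalesced or re-split blocks would detach offsets from their data. A rough sketch of the idea (the helper name and exact layout here are illustrative, not Bacula's actual on-tape format):

    #include <stdint.h>
    #include <string.h>

    /* Illustrative sparse record: the data block is prefixed with the
     * 64-bit file offset it belongs at.  Because offset and payload
     * travel together in one record, any filter between here and the
     * volume must hand the record through intact. */
    static size_t pack_sparse_record(uint64_t file_offset,
                                     const char *data, size_t len,
                                     char *out /* at least len + 8 bytes */)
    {
        for (int i = 0; i < 8; i++)             /* big-endian offset */
            out[i] = (char)(file_offset >> (56 - 8 * i));
        memcpy(out + 8, data, len);
        return len + 8;
    }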
-----Original Message-----
From: Landon Fuller [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 02, 2006 11:06 AM
To: Robert Nelson
Cc: 'Michael Brennen'; [EMAIL PROTECTED]; bacula-users@lists.sourceforge.net
Subject: Re: [Bacula-users] Encryption/Compression Conflict in CVS

On Nov 2, 2006, at 08:30, Robert Nelson wrote:
> Landon,
>
> I've changed the code so that the encryption code prefixes the data
> block with a block length prior to encryption.
>
> The decryption code accumulates data until a full data block is
> decrypted before passing it along to the decompression code.
>
> The code now works for all four scenarios with encryption and compression:
> none, encryption, compression, and encryption + compression. Unfortunately
> the code is no longer compatible with previously encrypted backups.
>
> I could add some more code to make the encryption-only case work like
> before. However, since this is a new feature in 1.39 and there
> shouldn't be a lot of existing backups, I would prefer to invalidate
> the previous backups and keep the code simpler.
>
> Also, I think we should have a design rule that says any data filters,
> like encryption, compression, etc., must maintain the original buffer
> boundaries.
>
> This will allow us to define arbitrary, dynamically extensible filter
> stacks in the future.
>
> What do you think?

I was thinking about this on the way to work. My original assumption was
that Bacula used the zlib streaming API to maintain state during file
compression/decompression, but this is not the case. Reality is something
more like this:

Backup:
- Set up the zlib stream context.
- For each file block (not each file), compress the block via
  deflate(stream, Z_FINISH) and reinitialize the stream.
- After all files (and blocks) are compressed, destroy the stream context.

Restore:
- For each block, call uncompress(), which does not handle streaming.

This is unfortunate -- reinitializing the stream for each block
significantly degrades compression efficiency, because:
1) block boundaries are dynamic and may be set arbitrarily,
2) the LZ77 algorithm may cross block boundaries, referring back up to 32k
   of previous input data (http://www.gzip.org/zlib/rfc-deflate.html#overview),
3) the Huffman coding context comprises the entire block, and
4) there's no need to limit zlib block size to Bacula's block size.

The next question is this -- since we *should* stream the data, does it
make sense to enforce downstream block boundaries in the upstream filter?
I'm siding in favor of requiring streaming support, and thus letting each
filter implementor worry about their own block buffering, since they can
far better encapsulate the necessary state and implementation -- and most
already do.

The one other thing I am unsure of is whether the zlib streaming API
correctly handles streams that have been written as per above -- each
Bacula data block as an independent 'stream'. If zlib DOES handle this, it
should be possible to modify the backup and restore implementations to use
the stream API correctly while maintaining backwards compatibility. This
would fix the encryption problem AND increase compression efficiency.
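To illustrate the framing Robert describes above: the writer serializes the
block length ahead of the plaintext before encrypting, and the reader
accumulates decrypted bytes until a whole original block is present before
passing it downstream. A rough sketch, with made-up names and error checks
omitted:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Reassembly buffer on the decryption side (illustrative only). */
    struct frame_reader { char *buf; size_t used, cap; };

    /* Append freshly decrypted bytes. */
    static void frame_push(struct frame_reader *r, const char *data, size_t len)
    {
        if (r->used + len > r->cap) {
            r->cap = (r->used + len) * 2;
            r->buf = realloc(r->buf, r->cap);
        }
        memcpy(r->buf + r->used, data, len);
        r->used += len;
    }

    /* Pop one complete block once its 4-byte length prefix and payload
     * have both arrived; the caller frees *block.  Returns 0 otherwise. */
    static int frame_pop(struct frame_reader *r, char **block, uint32_t *len)
    {
        uint32_t need;
        if (r->used < 4) return 0;
        memcpy(&need, r->buf, 4);
        if (r->used < 4 + (size_t)need) return 0;
        *block = malloc(need);
        memcpy(*block, r->buf + 4, need);
        *len = need;
        memmove(r->buf, r->buf + 4 + need, r->used - 4 - need);
        r->used -= 4 + need;
        return 1;
    }

This is what restores the original buffer boundaries after encryption has
chunked the data differently.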
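For reference, the per-block backup pattern described above looks roughly
like this (sketch only; assumes deflateInit() has already been called on
strm):

    #include <zlib.h>

    /* Each Bacula block becomes its own complete zlib stream: Z_FINISH
     * terminates the stream, and deflateReset() discards the 32k LZ77
     * window before the next block -- which is where the compression
     * efficiency goes. */
    static int compress_block(z_stream *strm,
                              const unsigned char *in, unsigned in_len,
                              unsigned char *out, unsigned out_cap,
                              unsigned *out_len)
    {
        strm->next_in   = (unsigned char *)in;
        strm->avail_in  = in_len;
        strm->next_out  = out;
        strm->avail_out = out_cap;

        if (deflate(strm, Z_FINISH) != Z_STREAM_END)
            return -1;                  /* e.g. out_cap too small */
        *out_len = out_cap - strm->avail_out;
        return deflateReset(strm);      /* forget history between blocks */
    }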
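On the open question: zlib's inflate() returns Z_STREAM_END at the end of
each independent stream, and inflateReset() resumes with the leftover input
still in next_in/avail_in, so the streaming API should be able to read the
existing one-stream-per-block format. A sketch (error handling abbreviated;
assumes out is large enough):

    #include <string.h>
    #include <zlib.h>

    static int inflate_concatenated(const unsigned char *in, unsigned in_len,
                                    unsigned char *out, unsigned out_len)
    {
        z_stream strm;
        memset(&strm, 0, sizeof(strm));
        if (inflateInit(&strm) != Z_OK)
            return -1;

        strm.next_in   = (unsigned char *)in;
        strm.avail_in  = in_len;
        strm.next_out  = out;
        strm.avail_out = out_len;

        while (strm.avail_in > 0 && strm.avail_out > 0) {
            int rc = inflate(&strm, Z_NO_FLUSH);
            if (rc == Z_STREAM_END) {
                /* One per-block stream done; reset for the next without
                 * reallocating.  Leftover input is untouched. */
                if (inflateReset(&strm) != Z_OK)
                    break;
            } else if (rc != Z_OK) {
                inflateEnd(&strm);
                return -1;
            }
        }
        inflateEnd(&strm);
        return (int)(out_len - strm.avail_out);
    }

If that holds, the restore path can use the stream API while staying
compatible with blocks written the old way.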
With my extremely large database backups, I sure wouldn't mind increased
compression efficiency =)

Some documentation on the zlib API is available here (I had a little
difficulty googling this):
http://www.freestandards.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/libzman.html

Cheers,
Landon