> The problem is that currently there are three filters defined:
> compression, encryption, and sparse file handling. The current
> implementations of compression and sparse file handling both require
> block boundary preservation. Even if zlib streaming could handle the
> existing block-based data, sparse file handling would be broken.
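To make the sparse constraint concrete: in sparse mode each data record
carries the file offset of its data in a small header, so the record
boundary itself has meaning, and a filter that merges or re-splits
records destroys that mapping. Roughly (a hypothetical illustration,
not the actual Bacula code):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch: prefix a sparse data record with the 64-bit
     * file offset of the bytes that follow.  The boundary between one
     * record and the next is what keeps each offset glued to its data. */
    size_t make_sparse_record(uint64_t offset, const uint8_t *data,
                              size_t len, uint8_t *out /* >= len + 8 */)
    {
        for (int i = 7; i >= 0; i--) {      /* big-endian offset header */
            out[i] = (uint8_t)(offset & 0xff);
            offset >>= 8;
        }
        memcpy(out + 8, data, len);
        return len + 8;
    }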
It seems to me that it is probably time to come up with a better way to
handle filters, but it is probably too late for 1.40 to make any major
changes to the code. I think the two most important points are:

1. Ensure that old Volumes are readable wherever possible.
2. Fix 1.40 so that it works correctly.

As far as point 2 is concerned, if it is not possible to fix it easily
or correctly, we could consider disallowing certain combinations of
options -- at least until we can find a better way to handle multiple
filters.

> -----Original Message-----
> From: Landon Fuller [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, November 02, 2006 11:06 AM
> To: Robert Nelson
> Cc: 'Michael Brennen'; [EMAIL PROTECTED];
> bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Encryption/Compression Conflict in CVS
>
> On Nov 2, 2006, at 08:30, Robert Nelson wrote:
>
>> Landon,
>>
>> I've changed the code so that the encryption code prefixes the data
>> block with a block length prior to encryption.
>>
>> The decryption code accumulates data until a full data block is
>> decrypted before passing it along to the decompression code.
>>
>> The code now works for all four scenarios: none, encryption,
>> compression, and encryption + compression. Unfortunately, the code
>> is no longer compatible with previously encrypted backups.
>>
>> I could add some more code to make the encryption-only case work as
>> before. However, since this is a new feature in 1.39 and there
>> shouldn't be a lot of existing backups, I would prefer to invalidate
>> the previous backups and keep the code simpler.
>>
>> Also, I think we should have a design rule that any data filter --
>> encryption, compression, etc. -- must maintain the original buffer
>> boundaries. This will allow us to define arbitrary, dynamically
>> extensible filter stacks in the future.
>>
>> What do you think?
>
> I was thinking about this on the way to work. My original assumption
> was that Bacula used the zlib streaming API to maintain state during
> file compression/decompression, but this is not the case. Reality is
> something more like this:
>
> Backup:
> - Set up the zlib stream context.
> - For each file block (not each file), compress the block via
>   deflate(stream, Z_FINISH) and reinitialize the stream.
> - After all files (and blocks) are compressed, destroy the stream
>   context.
>
> Restore:
> - For each block, call uncompress(), which does not handle streaming.
>
> This is unfortunate -- reinitializing the stream for each block
> significantly degrades compression efficiency, because: 1) block
> boundaries are dynamic and may be set arbitrarily; 2) the LZ77
> algorithm may cross block boundaries, referring back up to 32 KB of
> previous input data (http://www.gzip.org/zlib/rfc-deflate.html#overview);
> 3) the Huffman coding context comprises the entire block; and 4)
> there's no need to limit zlib block size to Bacula's block size.
>
> The next question is this -- given that we *should* stream the data,
> does it make sense to enforce downstream block boundaries in the
> upstream filter? I'm siding in favor of requiring streaming support,
> and thus allowing each filter implementor to worry about their own
> block buffering, since they can far better encapsulate the necessary
> state and implementation -- and most already do.
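To picture the fix Robert describes above: each plaintext block is
framed with its length before encryption, and the restore side buffers
decrypted bytes until one whole framed block is available, then hands
it downstream intact. A hypothetical sketch of the decrypt-side
reassembly (invented names, 4-byte big-endian length prefix assumed;
not the actual code):

    #include <stdint.h>
    #include <string.h>

    #define MAX_BLOCK 65536            /* assumed cap, for the sketch */

    typedef struct {
        uint8_t buf[4 + MAX_BLOCK];    /* length prefix + block body */
        size_t  have;                  /* bytes accumulated so far */
    } reassembler;

    static uint32_t hdr_len(const uint8_t *b)
    {
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    /* Feed decrypted bytes in as they arrive from the cipher; call
     * emit() exactly once per reconstructed original block. */
    int feed(reassembler *r, const uint8_t *p, size_t n,
             void (*emit)(const uint8_t *blk, uint32_t len))
    {
        while (n > 0) {
            /* First collect the 4-byte header, then the block body. */
            size_t goal = (r->have < 4) ? 4 : 4 + hdr_len(r->buf);
            if (goal > sizeof(r->buf))
                return -1;             /* corrupt length; real code errors out */

            size_t take = goal - r->have;
            if (take > n)
                take = n;
            memcpy(r->buf + r->have, p, take);
            r->have += take;  p += take;  n -= take;

            if (r->have >= 4 && r->have == 4 + hdr_len(r->buf)) {
                emit(r->buf + 4, hdr_len(r->buf));  /* boundary restored */
                r->have = 0;
            }
        }
        return 0;
    }

The key property is that the cipher can emit data in whatever chunk
sizes it likes; the length prefix is what lets the restore side rebuild
the exact boundaries that decompression (and sparse handling) depend on.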
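And to make Landon's backup description concrete, the current per-block
pattern amounts to roughly the following (a schematic sketch against
the public zlib API, not the actual filed code; zs is assumed to have
been set up once with deflateInit()):

    #include <zlib.h>

    /* Compress one block as an independent deflate stream: finish the
     * stream, record the compressed size, then reset so the next block
     * starts with no history.  The reset is what throws away zlib's
     * 32 KB LZ77 window between blocks. */
    int compress_block(z_stream *zs, const Bytef *in, uInt in_len,
                       Bytef *out, uLong *out_len)
    {
        zs->next_in   = (Bytef *)in;
        zs->avail_in  = in_len;
        zs->next_out  = out;
        zs->avail_out = (uInt)*out_len;

        if (deflate(zs, Z_FINISH) != Z_STREAM_END)
            return Z_BUF_ERROR;        /* output buffer too small */
        *out_len = zs->total_out;      /* per-block; reset zeroes it */

        return deflateReset(zs);
    }

A streaming version would instead pass Z_NO_FLUSH for every block and
Z_FINISH only once at end of file, letting matches and the Huffman
context span block boundaries -- exactly the efficiency Landon is
pointing at.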
> The one other thing I am unsure of is whether the zlib streaming API
> correctly handles streams that have been written as per the above --
> each Bacula data block as an independent 'stream'. If zlib DOES
> handle this, it should be possible to modify the backup and restore
> implementation to use the stream API correctly while maintaining
> backwards compatibility. This would fix the encryption problem AND
> increase compression efficiency.
>
> With my extremely large database backups, I sure wouldn't mind
> increased compression efficiency =)
>
> Some documentation on the zlib API is available here (I had a little
> difficulty googling this):
>
> http://www.freestandards.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/libzman.html
>
> Cheers,
> Landon

Best regards,

Kern
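P.S. On the open question: zlib's documented behavior is that inflate()
returns Z_STREAM_END at the end of each deflate stream without
consuming the bytes that follow, so a restore loop could, in principle,
walk back-to-back per-block streams with inflateReset(). An untested
sketch (write_output() is a hypothetical sink for decompressed bytes):

    #include <zlib.h>

    extern void write_output(const Bytef *p, uInt n);   /* hypothetical */

    /* Decode input made of back-to-back independent deflate streams,
     * one per original block, reusing a single inflate context. */
    int inflate_concatenated(z_stream *zs, Bytef *out, uInt out_size)
    {
        while (zs->avail_in > 0) {
            zs->next_out  = out;
            zs->avail_out = out_size;

            int rc = inflate(zs, Z_NO_FLUSH);
            if (rc != Z_OK && rc != Z_STREAM_END)
                return rc;                 /* corrupt or truncated data */

            write_output(out, out_size - zs->avail_out);

            if (rc == Z_STREAM_END && inflateReset(zs) != Z_OK)
                return Z_STREAM_ERROR;     /* ready for the next stream */
        }
        return Z_OK;
    }

If that holds up against the streams the current per-block code writes,
the restore side could adopt the streaming API without invalidating
existing compressed Volumes.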