On Friday 01 August 2003 10:47 am, Gordan wrote:
> On Friday 01 Aug 2003 16:32, Tom Kaitchuck wrote:
> > To clarify things, I arrived at my numbers by taking a hypothetical size
> > index between 1 and 2 units. For the values between 1 and 1 + 1/8 zips
> > would compress the file into a file of size .25 and above that could go
> > to size .5. For Bzips, anything below 1.5 would go to .25 and everything
> > above would go to .5. So, assuming random samples: Bzip results in a
> > one-notch improvement 3/8ths of the time. Zips will use ~23% of the bandwidth
> > of uncompressed files, and bzips use 18.75% as much. (or 80% as much as
> > the zips.)
>
> OK, I see how you reached the figures.
>
> There is another potential issue. The compression we are discussing here
> is per file, not for a whole archive. ZIP performed worse on the archive
> because it compresses each file separately, then archives them all together.
> This is inherently less efficient, and it skewed your test somewhat in
> favour of gzip/bzip2.
>
> The fairer test would be to tar the entire directory and then zip -9 that.
> What is the difference then? The most relevant test would be to actually
> compress each file separately in every case, and see the relative
> compression ratios on all files. That is what is going to be happening in
> the node.
>
> I think the edge of bzip2 may go down in that case, but I am not sure by
> how much.
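
The figures in my earlier estimate (quoted above) can be checked with a short calculation. This is only a sketch: it assumes file sizes are uniformly distributed between 1 and 2 units, and that an uncompressed file in that range takes up a full 2 units (the assumption under which the 18.75% figure comes out exactly). The names are made up for illustration.

```python
from fractions import Fraction

# Assumption: an uncompressed file between 1 and 2 units occupies 2 units.
UNCOMPRESSED = Fraction(2)

def expected_size(threshold):
    """Expected stored size when files below `threshold` land in the .25 slot
    and the rest land in the .5 slot, for sizes uniform on [1, 2]."""
    p_small = threshold - 1          # fraction of files below the threshold
    return p_small * Fraction(1, 4) + (1 - p_small) * Fraction(1, 2)

zip_size = expected_size(Fraction(9, 8))    # zip: below 1 + 1/8 goes to .25
bzip_size = expected_size(Fraction(3, 2))   # bzip: below 1.5 goes to .25

zip_share = zip_size / UNCOMPRESSED         # 15/64, about 23%
bzip_share = bzip_size / UNCOMPRESSED       # 3/16, i.e. 18.75%
bzip_vs_zip = bzip_size / zip_size          # 4/5, i.e. 80%
one_notch_better = Fraction(3, 2) - Fraction(9, 8)  # 3/8 of the time
```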

OK, the tar.zip archive is 527598, which means a compression ratio of 4.86 to 1. 
This is much better. (The difference between the two is almost half of what 
it was.) Interestingly, doing a 'zip -0 | zip -9' gave me worse results than 
normal zipping.

Also, I tried compressing all the files first and then archiving them together. 
No matter what algorithm I used to do this, it was always worse than zip -6! 
This means that if you are compressing first using bzip, and then archiving, 
as you have stated was your intention, it is actually much worse than just 
zipping the file.
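
For anyone who wants to try the two orderings themselves, here is a rough sketch using Python's standard tarfile and zlib modules. It is not the test I ran: the sample files are synthetic, zlib stands in for whatever compressor is actually used, and make_tar is just a helper invented for this example.

```python
import io
import tarfile
import zlib

def make_tar(members):
    """Pack a {name: bytes} mapping into an uncompressed in-memory tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical sample: 50 small, similar HTML pages (not the real test set).
body = b"<p>Some boilerplate shared by every page.</p>\n" * 40
files = {f"page{i}.html": b"<title>Page %d</title>\n" % i + body
         for i in range(50)}

# Archive first, then compress the whole archive once.
archive_then_compress = len(zlib.compress(make_tar(files), 9))

# Compress each file on its own, then archive the compressed pieces.
compressed = {name + ".z": zlib.compress(data, 9) for name, data in files.items()}
compress_then_archive = len(make_tar(compressed))

print(archive_then_compress, compress_then_archive)
```

With small files like these, the second number is dominated by tar's 512-byte headers and block padding, which are never compressed when the archiving happens last.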

Given that: 
Our primary goal should be ensuring that everything is compressed. 
It would waste a lot of time and energy to have multiple ways of doing 
archives.
Archiving first and then compressing is always better than the other way 
around.
Archives should not be done at the FRED level.

So, to ensure that everything is compressed, there must be some sort of 
compression in Fred. To ensure that we are only using one method for archives, 
the method used by Fred should be the same as the one used by the insertion 
utility, or the insertion utility should just archive and then let Fred 
compress it. 

So there are three possible solutions.

A: Have the insertion utilities archive and compress by zipping. Have Fred 
compress by zipping. Yields good compression. Requires no major changes to 
Fred. Requires no changes to FProxy.
B: Have the insertion utility archive by tarring (no compression). Have Fred 
compress by zipping. Yields better compression ratios. Requires no major 
changes to Fred. FProxy needs to be able to read tars.
C: Have the insertion utility archive by tarring (no compression). Have Fred 
compress by Bzipping. Yields best compression ratios. Requires adding Bzip 
compression/decompression code to Fred. FProxy needs to be able to read 
tars.
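
If someone wants numbers for their own data before voting, the three options can be approximated roughly like this. This is a sketch under assumptions, not Fred code: Python's zipfile, tarfile, zlib and bz2 modules stand in for the insertion utility and for Fred, "zipping" in Fred is modelled as a plain deflate pass, and option_sizes and the sample data are made up for the example.

```python
import bz2
import io
import tarfile
import zipfile
import zlib

def make_tar(members):
    """Insertion utility for B and C: plain tar, no compression."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def make_zip(members):
    """Insertion utility for A: zip archive, each member deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()

def option_sizes(members):
    """Final size for each option after the (assumed) compression pass in Fred."""
    tar_bytes = make_tar(members)
    return {
        "A": len(zlib.compress(make_zip(members), 9)),  # zip, then deflate
        "B": len(zlib.compress(tar_bytes, 9)),          # tar, then deflate
        "C": len(bz2.compress(tar_bytes, 9)),           # tar, then bzip2
    }

# Hypothetical sample data; substitute the files of a real site.
sample = {f"page{i}.html": b"<p>Shared boilerplate text.</p>\n" * 40
          for i in range(20)}
sizes = option_sizes(sample)
for option, size in sorted(sizes.items()):
    print(option, size)
```

The ordering you get will depend on the data, so treat the output as a check on your own files rather than as a general result.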

OK, does that sum things up? Now everyone vote and end this infernal thread.
I vote for B.
_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
