On Mon, Dec 14, 2009 at 5:25 AM, Florent Daigniere <nextgens at freenetproject.org> wrote:
> Attempting to compress the file with the same compression algorithm is likely
> to be fruitless, yes... I had a patch somewhere which was trying to
> use file extensions to make educated guesses... but it never got merged
> because of conflicts (saces was working on metadata) and lack of interest
> on my side.
>
> Anyway, how do you determine whether a file is already compressed without
> actually compressing it? Did you do the maths? In most cases, even though the
> data is already compressed, it does make sense to recompress it with another
> algorithm (walltime-wise) before sending it over the (slow) wire.
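To answer Florent's question concretely: one common heuristic (not anything Freenet actually ships; the names and thresholds below are illustrative) is to check for the magic bytes of known compressed formats and, failing that, measure the Shannon entropy of a small sample. Compressed or encrypted data approaches 8 bits of entropy per byte, while text and most uncompressed binaries sit well below 7.5. A minimal sketch in Python:

```python
import math
from collections import Counter

# Magic-byte prefixes of some common already-compressed formats
# (illustrative, not exhaustive).
MAGIC = [
    b"\x1f\x8b",        # gzip
    b"PK\x03\x04",      # zip (also .jar, .odt, .docx)
    b"BZh",             # bzip2
    b"\xfd7zXZ\x00",    # xz
    b"\x89PNG",         # png
    b"\xff\xd8\xff",    # jpeg
]

def looks_compressed(data: bytes, sample_size: int = 4096,
                     entropy_threshold: float = 7.5) -> bool:
    """Heuristic guess: magic bytes first, then entropy of a sample.

    The 7.5 bits/byte threshold is an assumption chosen for
    illustration, not a tuned value.
    """
    if any(data.startswith(m) for m in MAGIC):
        return True
    sample = data[:sample_size]
    if not sample:
        return False
    n = len(sample)
    # Shannon entropy in bits per byte over the sampled byte histogram.
    entropy = -sum((c / n) * math.log2(c / n)
                   for c in Counter(sample).values())
    return entropy >= entropy_threshold
```

This costs a single pass over a few KB rather than a full compression run, which is exactly the trade-off at issue for 1GB files.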
I think by looking at the filetype you can make an educated guess. Also, if
the file is larger than 1MB, there is a good chance that it's already been
compressed. I don't think you'll gain anything by re-compressing an already
compressed file unless the original compression mechanism was really dumb.

> Iirc the node uses GZIP, BZIP2 and LZMA and inserts the smallest resulting
> file. At some point I even wanted to implement other algorithms like LZO and
> PAQ8P. After all, all we are talking about here is wasting some niced CPU
> cycles to earn both insert and download time!

Yeah, but with a 1GB file, this compression takes a *long* time, and for the
vast majority of 1GB files it will be completely fruitless, because they'll
already be compressed.

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: ian at uprizer.com
Ph: +1 512 422 3588
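The try-everything-keep-the-smallest behaviour Florent describes (the node is Java; this is just an illustrative sketch using Python's stdlib codecs, and `best_compression` is a hypothetical helper, not node code) looks roughly like:

```python
import bz2
import gzip
import lzma

# The three codecs the node reportedly tries: compress with each and
# keep whichever output is smallest, falling back to the raw data.
CODECS = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}

def best_compression(data: bytes):
    """Return (codec_name, payload) for the smallest representation.

    codec_name is "none" when no codec beats the original size --
    the expected outcome for already-compressed input.
    """
    best_name, best_payload = "none", data
    for name, fn in CODECS.items():
        candidate = fn(data)
        if len(candidate) < len(best_payload):
            best_name, best_payload = name, candidate
    return best_name, best_payload
```

This also makes Ian's cost objection concrete: each codec reads the full input, so a 1GB file is compressed three times end-to-end only to land in the "none" branch most of the time.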
