* Daniel Cheng <[EMAIL PROTECTED]> [2008-10-24 06:44:38]:
> On Thu, Oct 23, 2008 at 9:51 PM, Florent Daigniere
> <[EMAIL PROTECTED]> wrote:
> > Daniel Cheng wrote:
> >> On Thu, Oct 23, 2008 at 9:02 PM, Florent Daigniere
> >> <[EMAIL PROTECTED]> wrote:
> >>> Florent Daigniere wrote:
> >>>> Matthew Toseland wrote:
> >>>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
> >>>>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
> >>>>>>
> >>>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
> >>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
> >>>>>>>>
> >>>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
> >>>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
> >>>>>>>>>>
> >>>>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
> >>>>>>>>>>>> Author: nextgens
> >>>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
> >>>>>>>>>>>> New Revision: 23014
> >>>>>>>>>>>>
> >>>>>>>>>>>> Modified:
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveManager.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveStoreContext.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/ClientMetadata.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/Metadata.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/async/ClientPutter.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileInserter.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SplitFileInserter.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/frost/message/FrostMessage.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/node/NodeARKInserter.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/node/TextModeClientInterface.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/ClientPut.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/DirPutFile.java
> >>>>>>>>>>>> trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
> >>>>>>>>>>>> Log:
> >>>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
> >>>>>>>>>>>> It's still not backward compatible with stable but should be
> >>>>>>>>>>>> forward-compatible ;)
> >>>>>>>>>> [...] see r23023
> >>>>>>>>>>
> >>>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip
> >>>>>>>>>>> now? Shouldn't there be a max size configuration above which
> >>>>>>>>>>> we don't try bzip2, perhaps unless asked to via FCP? bzip2'ing
> >>>>>>>>>>> ISOs could take a really long time ...
> >>>>>>>>>> I don't think we need one. Big files will take long to compress
> >>>>>>>>>> but will take long to insert too. I think it's worth spending a
> >>>>>>>>>> few more CPU cycles to spare the insertion of a few blocks (plus
> >>>>>>>>>> their FEC blocks).
> >>>>>>>>> I'm not convinced that this is acceptable from a usability point
> >>>>>>>>> of view. Maybe we can provide a progress bar within the
> >>>>>>>>> compression phase? On the new UI it is proposed to separate
> >>>>>>>>> downloads which are not yet finalised (i.e. haven't fetched the
> >>>>>>>>> last lot of metadata) from downloads that are... we could do
> >>>>>>>>> something similar with inserts in compression.
> >>>>>>>>>
> >>>>>>>> Have a look at what I have committed. From now on the compression
> >>>>>>>> is fully serialized... We have one mutex, and only one compression
> >>>>>>>> job (just like we do for FEC encoding in fact) which means an even
> >>>>>>>> higher latency.
> >>>>>>> Is it feasible to insert some blocks of data while compressing?
> >>>>>>> Gzip, bzip2 and lzma all support streams. We can collect the output
> >>>>>>> data as we feed data to them.
> >>>>>>>
> >>>>>> Right now we attempt to compress the full data using all the
> >>>>>> compression algorithms and we keep the smallest resulting bucket.
> >>>>>> How do you plan to choose the best-performing algorithm before
> >>>>>> actually compressing the data?
> >>>>>>
> >>>>>> I don't think that we can evaluate how well algorithms compress over
> >>>>>> a single segment: it's just too small.
> >>>>>>
> >>>>>>> As soon as we get enough compressed data for FEC, we can insert them.
> >>>>>>> This would be a great performance improvement for large files on SMP.
> >>>>>>>
> >>>>>> That would involve rewriting most of the client-layer.
> >>>>>>
> >>>>>>> Is this doable without changing the data format?
> >>>>>>>
> >>>>>> It's not about the data format; we insert the manifest at the end
> >>>>>> unless told not to by the earlyEncode parameter.
> >>>>>>
> >>>>>> IMHO we are debating for no real reason here: the real time taken by
> >>>>>> the compression phase is insignificant compared to the time taken by
> >>>>>> the insertion process. Sure, trunk will take at least 3 times longer
> >>>>>> than current stable before it starts inserting anything; but is that
> >>>>>> a big deal? You will need real numbers to convince me here.
> >>>>> I'd like some numbers ... iirc it takes around 2 days to insert a
> >>>>> CD-sized ISO? How long does it take to bzip2 it?
> >>>>>
> >>>> It obviously depends on various factors including how fast you can do
> >>>> I/Os, the block size and the number of cores you have.
> >>>>
> >>>> Here on what is likely to be "the worst case scenario":
> >>>> $time bzip2 -c iso > iso.bz2|grep real
> >>>> real 3m57.552s
> >>>> $time gzip -c iso > iso.gz|grep real
> >>>> real 0m46.079s
> >>>> $du -hs iso*
> >>>> 560M iso
> >>>> 506M iso.bz2
> >>>> 506M iso.gz
> >>>>
> >>>> There is no clear gain to bzip the content... but compression is worth
> >>>> it: we spare 54*2=108 MB (You have to count FEC blocks too to be fair)!
> >>>> Now if you tell me that freenet is able to insert 108MB of data in less
> >>>> than 5mins, I will consider optimizing the compression step.
> >>>>
> >>>> There are solutions for guesstimating the efficiency of a given
> >>>> compression algorithm but I am not sure they are worth implementing.
> >>>>
> >>> Here is some more representative data on a dual-core system:
> >>>
> >>> real 24m55.472s
> >>> user 23m4.947s
> >>> sys 0m10.633s
> >>> 1884544 iso.lzma
> >>>
> >>> real 13m32.442s
> >>> user 12m6.937s
> >>> sys 0m7.784s
> >>> 1934324 iso.bz2
> >>> My implementation of BZIP2 uses only one of the two cores
> >>>
> >>> real 3m19.066s
> >>> user 2m11.332s
> >>> sys 0m6.284s
> >>> 1935056 iso.gz
> >>>
> >>> And the original :
> >>> 2026416 iso
> >>>
> >>> So, we have:
> >>> 63325 blocks for the original
> >>> 60470 blocks with GZIP (4.5% gain)
> >>> 60447 blocks with BZIP2 (4.5% gain)
> >>> 58892 blocks with LZMA (7% gain)
> >>
> >> The compression rate on "ISO" is not realistic. ISO is an uncompressed
> >> format.
> >> The actual compression rate depends on the data on the CD / DVD disk.
> >>
> >
> > I think that the data I used is representative of what the real users
> > are likely to be dealing with. Bring your own stats if you are not happy
> > with mine.
> >
> The ISO file is Office 2007 DVD (Part number SKU-79G-00069, you can
> reproduce yourself)
>
> Uncompressed: 1032732672 bytes
> bzip2 -9 (best compression) : 996222481 bytes
> gzip -6 (*default* compression) : 994112100 bytes
> lzma -7 (*default* compression) : 990592948 bytes
>
It's still achieving a 3.5% compression ratio in the worst-case scenario!
That's a *lot* of blocks.

> See how it backfires with bzip2 :)

What's your point? Some of the content might already be bzip2 encoded; that
would explain the poor performance of the algorithm... In any case we are
trying all of them ;)
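
For illustration only, the "compress with every algorithm and keep the smallest result" approach discussed in this thread can be sketched in Python with standard-library codecs. This is not the Java code in trunk: the helper names are hypothetical, the 32 KiB block size is an assumption inferred from the block counts quoted in the thread, and rounding the block count up is a choice made for this sketch.

```python
import bz2
import gzip
import lzma
from typing import Dict, Tuple

# Assumption: 32 KiB data blocks, inferred from the block counts in the thread.
BLOCK_SIZE = 32 * 1024


def compress_all(data: bytes) -> Dict[str, bytes]:
    """Run every candidate codec over the full input (hypothetical helper)."""
    return {
        "gzip": gzip.compress(data),
        "bzip2": bz2.compress(data),
        "lzma": lzma.compress(data),
    }


def pick_smallest(data: bytes) -> Tuple[str, bytes]:
    """Return (codec, payload) for the smallest result, or ("none", data)
    when no codec beats the uncompressed input."""
    best_name, best_payload = "none", data
    for name, payload in compress_all(data).items():
        if len(payload) < len(best_payload):
            best_name, best_payload = name, payload
    return best_name, best_payload


def blocks_needed(size_in_bytes: int) -> int:
    """Number of BLOCK_SIZE blocks needed to hold size_in_bytes, rounded up."""
    return -(-size_in_bytes // BLOCK_SIZE)


def saving_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of the original size saved by compression."""
    return 100.0 * (original_size - compressed_size) / original_size


if __name__ == "__main__":
    sample = b"freenet devl " * 50_000
    codec, payload = pick_smallest(sample)
    print(codec, len(sample), len(payload), blocks_needed(len(payload)))
    # Sizes quoted in the thread for the DVD image (uncompressed vs. bzip2 -9).
    print(f"{saving_percent(1032732672, 996222481):.2f}% saved")
```

Every codec has to run over the whole input before a winner is known, which is why this approach adds latency before the first block can be inserted.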
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
