On Thu, Oct 23, 2008 at 9:02 PM, Florent Daigniere <[EMAIL PROTECTED]> wrote:
> Florent Daigniere wrote:
>> Matthew Toseland wrote:
>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>>>
>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>>>
>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>>>
>>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
>>>>>>>>>> Author: nextgens
>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>>> New Revision: 23014
>>>>>>>>>>
>>>>>>>>>> Modified:
>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>> trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>> trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>> trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>> trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>> trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>> trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>> trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>> trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>> trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>> trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>> trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>>> Log:
>>>>>>>>>> more work on
>>>>>>>>>> bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>>>> forward-compatible ;)
>>>>>>>> [...] see r23023
>>>>>>>>
>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>>>> Shouldn't there be a max size configuration above which we don't try
>>>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>>>> really long time ...
>>>>>>>> I don't think we need one. Big files will take long to compress but
>>>>>>>> will take long to insert too. I think it's worth spending a few more
>>>>>>>> CPU cycles to spare the insertion of a few blocks (plus their FEC
>>>>>>>> blocks).
>>>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>>>> view. Maybe we can provide a progress bar within the compression phase?
>>>>>>> On the new UI it is proposed to separate downloads which are not yet
>>>>>>> finalised (i.e. haven't fetched the last lot of metadata) from
>>>>>>> downloads that are... we could do something similar with inserts in
>>>>>>> compression.
>>>>>>>
>>>>>> Have a look at what I have committed. From now on the compression is
>>>>>> fully serialized... We have one mutex, and only one compression job
>>>>>> (just like we do for FEC encoding, in fact), which means an even higher
>>>>>> latency.
>>>>> Is it feasible to insert some blocks of data while compressing?
>>>>> Gzip, bzip2 and lzma all support streams. We can collect the output data
>>>>> as we feed data to them.
>>>>>
>>>> Right now we attempt to compress the full data using all the compression
>>>> algorithms and we keep the smallest resulting bucket. How do you plan to
>>>> choose the best-performing algorithm before actually compressing the data?
>>>>
>>>> I don't think that we can evaluate how well algorithms compress over a
>>>> single segment: it's just too small.
>>>>
>>>>> As soon as we get enough compressed data for FEC, we can insert them.
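The streaming idea floated above (gzip, bzip2 and LZMA all support streaming, so compressed blocks could be handed off for insertion while compression is still running) could be sketched in Java roughly as follows. The 32 KiB block size matches CHK payloads, but the class name, the `Consumer` hand-off and the pipe-based plumbing are illustrative assumptions, not Freenet's actual client code:

```java
import java.io.*;
import java.util.function.Consumer;
import java.util.zip.*;

// Hypothetical sketch of "insert while compressing": one thread feeds raw data
// through a streaming compressor into a pipe; another thread drains the pipe
// and hands off each completed 32 KiB block of compressed output, so block
// inserts could start before compression finishes.
public class StreamingCompressSketch {
    static final int BLOCK_SIZE = 32 * 1024; // CHK data block payload, 32 KiB

    // Compresses `data` with gzip, passing each block of compressed output to
    // `sink` as soon as it is full; returns the whole compressed stream.
    static byte[] compressInBlocks(byte[] data, Consumer<byte[]> sink)
            throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, BLOCK_SIZE * 4);

        // Producer: stream the raw data through gzip into the pipe.
        Thread producer = new Thread(() -> {
            try (GZIPOutputStream gz = new GZIPOutputStream(pipeOut)) {
                gz.write(data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Consumer: every full block could be inserted immediately instead of
        // being buffered here; `all` just collects the stream for the caller.
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] block = new byte[BLOCK_SIZE];
        int filled = 0, n;
        while ((n = pipeIn.read(block, filled, BLOCK_SIZE - filled)) != -1) {
            filled += n;
            if (filled == BLOCK_SIZE) {
                sink.accept(block.clone()); // "insert" a completed block
                all.write(block, 0, filled);
                filled = 0;
            }
        }
        if (filled > 0) { // final partial block
            sink.accept(java.util.Arrays.copyOf(block, filled));
            all.write(block, 0, filled);
        }
        producer.join();
        return all.toByteArray();
    }
}
```

As the thread notes, doing this for real would still require reworking much of the client layer, since the block count (and hence the FEC layout) is only known once compression ends.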
>>>>> This would be a great performance improvement for large files on SMP.
>>>>>
>>>> That would involve rewriting most of the client layer.
>>>>
>>>>> Is this doable without changing the data format?
>>>>>
>>>> It's not about the data format; we insert the manifest at the end unless
>>>> told not to by the earlyEncode parameter.
>>>>
>>>> IMHO we are debating for no real reason here: the real time taken by the
>>>> compression phase is insignificant compared to the time taken by the
>>>> insertion process. Sure, trunk will take at least 3 times longer than
>>>> current stable before it starts inserting anything; but is that a big
>>>> deal? You will need real numbers to convince me here.
>>> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized
>>> ISO? How long does it take to bzip2 it?
>>>
>>
>> It obviously depends on various factors, including how fast you can do
>> I/O, the block size and the number of cores you have.
>>
>> Here is what is likely to be "the worst case scenario":
>> $ time bzip2 -c iso > iso.bz2 | grep real
>> real 3m57.552s
>> $ time gzip -c iso > iso.gz | grep real
>> real 0m46.079s
>> $ du -hs iso*
>> 560M iso
>> 506M iso.bz2
>> 506M iso.gz
>>
>> There is no clear gain in bzipping the content... but compression is worth
>> it: we spare 54*2=108 MB (you have to count FEC blocks too to be fair)!
>> Now if you tell me that freenet is able to insert 108MB of data in less
>> than 5 mins, I will consider optimizing the compression step.
>>
>> There are solutions for guesstimating the efficiency of a given
>> compression algorithm but I am not sure they are worth implementing.
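The current behaviour described above, compress the whole payload with every codec and keep the smallest resulting bucket, amounts to something like the sketch below. Only gzip/deflate ship with the JDK, so they stand in here for Freenet's gzip/bzip2/LZMA set; the class and method names are hypothetical, not Freenet's real API:

```java
import java.io.*;
import java.util.zip.*;

// Hypothetical sketch: run every available codec over the full payload and
// keep whichever output is smallest, falling back to the raw data when
// nothing shrinks it (e.g. already-compressed input).
public class BestCodecSketch {

    static byte[] gzip(byte[] in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(in);
        }
        return bos.toByteArray();
    }

    static byte[] deflate(byte[] in) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        d.setInput(in);
        d.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) {
            bos.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return bos.toByteArray();
    }

    // Compresses with every codec and returns the smallest candidate.
    static byte[] smallest(byte[] data) throws IOException {
        byte[] best = data; // keep the original if no codec beats it
        for (byte[] candidate : new byte[][] { gzip(data), deflate(data) }) {
            if (candidate.length < best.length) {
                best = candidate;
            }
        }
        return best;
    }
}
```

This makes the trade-off in the thread concrete: every codec runs over the full data before a single byte is inserted, which is why trunk pays the full compression time up front.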
>>
>
> Here is some more representative data on a dual-core system:
>
> real 24m55.472s
> user 23m4.947s
> sys 0m10.633s
> 1884544 iso.lzma
>
> real 13m32.442s
> user 12m6.937s
> sys 0m7.784s
> 1934324 iso.bz2
> My implementation of BZIP2 uses only one of the two cores
>
> real 3m19.066s
> user 2m11.332s
> sys 0m6.284s
> 1935056 iso.gz
>
> And the original:
> 2026416 iso
>
> So, we have:
> 63325 blocks for the original
> 60470 blocks with GZIP (4.5% gain)
> 60447 blocks with BZIP2 (4.5% gain)
> 58892 blocks with LZMA (7% gain)
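The block figures above check out if the sizes are read as `du -k` output (KiB) and divided by the 32 KiB CHK block size with integer division; a quick check:

```java
// Reproducing the block arithmetic from the mail: sizes are `du -k` output in
// KiB, CHK data blocks are 32 KiB, and the quoted figures use integer division.
public class BlockMath {
    static int blocks(int sizeKiB) {
        return sizeKiB / 32; // 32 KiB per CHK data block
    }

    static double gainPercent(int origBlocks, int newBlocks) {
        return 100.0 * (origBlocks - newBlocks) / origBlocks;
    }
}
```

With the sizes from the mail, `blocks(2026416)` gives 63325, `blocks(1884544)` gives 58892, and `gainPercent(63325, 58892)` is 7.0 to one decimal place, matching the quoted figures.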
The compression rate on "ISO" is not realistic: ISO is an uncompressed container format, so the actual compression rate depends on the data on the CD/DVD.

> Of course those don't include the FEC blocks. So to sum up: yes, I think
> it's worth spending half an hour of CPU time to "win" 4433*2=8866
> blocks. And that's still true on a single-core system, where we would
> spend 1 hour.

_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
