Matthew Toseland wrote:
> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>
>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>
>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>
>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
>>>>>>>> Author: nextgens
>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>> New Revision: 23014
>>>>>>>>
>>>>>>>> Modified:
>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>> Log:
>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>> forward-compatible ;)
>>>>>>
>>>>>> [...] see r23023
>>>>>>
>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>> Shouldn't there be a max size configuration above which we don't try
>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>> really long time ...
>>>>>> I don't think we need one. Big files will take long to compress but
>>>>>> will take long to insert too. I think it's worth spending a few more
>>>>>> CPU cycles to spare the insertion of a few blocks (plus their FEC
>>>>>> blocks).
>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>> view. Maybe we can provide a progress bar within the compression phase?
>>>>> On the new UI it is proposed to separate downloads which are not yet
>>>>> finalised (i.e. haven't fetched the last lot of metadata) from downloads
>>>>> that are... we could do something similar with inserts in compression.
>>>>>
>>>> Have a look at what I have committed. From now on the compression is
>>>> fully serialized... We have one mutex, and only one compression job
>>>> (just like we do for FEC encoding in fact) which means an even higher
>>>> latency.
>>> Is it feasible to insert some blocks of data while compressing?
>>> Gzip, bzip2 and lzma all support streams. We can collect the output data
>>> as we feed data to them.
>>>
>> Right now we attempt to compress the full data using all the compression
>> algorithms and we keep the smallest resulting bucket. How do you plan to
>> choose the best-performing algorithm before actually compressing the data?
>>
>> I don't think that we can evaluate how well algorithms compress over a
>> single segment: it's just too small.
>>
>>> As soon as we get enough compressed data for FEC, we can insert them.
>>> This would be a great performance improvement for large files on SMP.
>>>
>> That would involve rewriting most of the client-layer.
>>
>>> Is this doable without changing the data format?
>>>
>> It's not about the data format; we insert the manifest at the end unless
>> told otherwise by the earlyEncode parameter.
>>
>> IMHO we are debating for no real reason here: the real time taken by the
>> compression phase is insignificant compared to the time taken by the
>> insertion process. Sure, trunk will take at least 3 times longer than
>> current stable before it starts inserting anything; but is that a big
>> deal? You will need real numbers to convince me here.
>
> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized
> ISO? How long does it take to bzip2 it?
>
It obviously depends on various factors, including how fast you can do I/O,
the block size and the number of cores you have.

Here, on what is likely to be "the worst case scenario":

$ time bzip2 -c iso > iso.bz2|grep real
real    3m57.552s
$ time gzip -c iso > iso.gz|grep real
real    0m46.079s
$ du -hs iso*
560M    iso
506M    iso.bz2
506M    iso.gz

There is no clear gain from bzip2'ing the content... but compression is
worth it: we spare 54*2=108 MB (you have to count FEC blocks too, to be
fair)!

Now if you tell me that freenet is able to insert 108 MB of data in less
than 5 minutes, I will consider optimizing the compression step.

There are solutions for guesstimating the efficiency of a given compression
algorithm, but I am not sure they are worth implementing.

> Also from a usability point of view, having Freenet apparently not doing
> anything with an insert for hours is *bad*. We will need a compression
> progress monitor.
>

I don't think that compression is any slower than FEC encoding, which is
already serialized... and so far we didn't tell the user about it. File a
feature request on mantis.
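
For anyone who wants to repeat the comparison on their own data, here is a rough sketch of the "try every algorithm, keep the smallest result" idea together with the savings arithmetic. It is plain Python using the standard gzip, bz2 and lzma modules, not the Freenet client code; the function names, the sample input and the assumption of one FEC block per data block (factor 2) are mine, for illustration only.

```python
import bz2
import gzip
import lzma
import time


def compare_codecs(data: bytes) -> dict:
    """Compress data with each codec; return {name: (size_in_bytes, seconds)}."""
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


def pick_smallest(data: bytes, results: dict) -> str:
    """Name of the codec with the smallest output, or "none" if nothing beats the raw data."""
    best = min(results, key=lambda name: results[name][0])
    return best if results[best][0] < len(data) else "none"


def bytes_spared(original_size: int, compressed_size: int, fec_factor: int = 2) -> int:
    """Amount not inserted, assuming one FEC block per data block (hypothetical factor 2)."""
    return max(original_size - compressed_size, 0) * fec_factor


if __name__ == "__main__":
    # Hypothetical, highly compressible input; a real ISO will behave very differently.
    sample = b"freenet insert test data " * 40000
    results = compare_codecs(sample)
    for name, (size, seconds) in results.items():
        print(f"{name}: {size} bytes in {seconds:.3f}s")
    print("smallest:", pick_smallest(sample, results))
    # Same arithmetic as the ISO figures above, in MB.
    print("spared (MB):", bytes_spared(560, 506))
```

Timing each codec next to its output size makes the trade-off visible: the extra CPU time only matters if it exceeds the time needed to insert the amount of data that compression spares.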
