On Thursday 23 October 2008 14:02, Florent Daigniere wrote:
> Florent Daigniere wrote:
> > Matthew Toseland wrote:
> >> On Thursday 23 October 2008 10:39, NextGen$ wrote:
> >>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
> >>>
> >>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
> >>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
> >>>>>
> >>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
> >>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
> >>>>>>>
> >>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
> >>>>>>>>> Author: nextgens
> >>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
> >>>>>>>>> New Revision: 23014
> >>>>>>>>>
> >>>>>>>>> Modified:
> >>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
> >>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
> >>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
> >>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
> >>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
> >>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
> >>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
> >>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
> >>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
> >>>>>>>>> Log:
> >>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
> >>>>>>>>> It's still not backward compatible with stable but should be forward-compatible ;)
> >>>>>>> [...] see r23023
> >>>>>>>
> >>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now? Shouldn't
> >>>>>>>> there be a max size configuration above which we don't try bzip2, perhaps
> >>>>>>>> unless asked to via FCP? bzip2'ing ISOs could take a really long time ...
> >>>>>>> I don't think we need one. Big files will take long to compress but will take
> >>>>>>> long to insert too. I think it's worth spending a few more CPU cycles to
> >>>>>>> spare the insertion of a few blocks (plus their FEC blocks).
> >>>>>> I'm not convinced that this is acceptable from a usability point of view.
> >>>>>> Maybe we can provide a progress bar within the compression phase? On the new
> >>>>>> UI it is proposed to separate downloads which are not yet finalised (i.e.
> >>>>>> haven't fetched the last lot of metadata) from downloads that are... we could
> >>>>>> do something similar with inserts in compression.
> >>>>>>
> >>>>> Have a look at what I have committed. From now on the compression is fully
> >>>>> serialized... We have one mutex, and only one compression job (just like we
> >>>>> do for FEC encoding in fact) which means an even higher latency.
> >>>> Is it feasible to insert some blocks of data while compressing?
> >>>> Gzip, bzip2 and lzma all support streams. We can collect the output data
> >>>> as we feed data to them.
> >>>>
> >>> Right now we attempt to compress the full data using all the compression
> >>> algorithms and we keep the smallest resulting bucket. How do you plan to
> >>> choose the best-performing algorithm before actually compressing the data?
> >>>
> >>> I don't think that we can evaluate how well algorithms compress over a single
> >>> segment: it's just too small.
> >>>
> >>>> As soon as we get enough compressed data for FEC, we can insert them.
> >>>> This would be a great performance improvement for large files on SMP.
> >>>>
> >>> That would involve rewriting most of the client-layer.
> >>>
> >>>> Is this doable without changing the data format?
> >>>>
> >>> It's not about the data format; we insert the manifest at the end unless told
> >>> not to by the earlyEncode parameter.
> >>>
> >>> IMHO we are debating for no real reason here: the real time taken by the
> >>> compression phase is insignificant compared to the time taken by the
> >>> insertion process. Sure, trunk will take at least 3 times longer than current
> >>> stable before it starts inserting anything; but is that a big deal? You will
> >>> need real numbers to convince me here.
> >> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized
> >> ISO? How long does it take to bzip2 it?
> >>
> > 
> > It obviously depends on various factors including how fast you can do 
> > I/Os, the block size and the number of cores you have.
> > 
> > Here is what is likely to be "the worst case scenario":
> > $time bzip2 -c iso > iso.bz2|grep real
> > real 3m57.552s
> > $time gzip -c iso > iso.gz|grep real
> > real 0m46.079s
> > $du -hs iso*
> > 560M iso
> > 506M iso.bz2
> > 506M iso.gz
> > 
> > There is no clear gain to bzip the content... but compression is worth 
> > it: we spare 54*2=108 MB (You have to count FEC blocks too to be fair)! 
> > Now if you tell me that freenet is able to insert 108MB of data in less 
> > than 5mins, I will consider optimizing the compression step.
> > 
> > There are solutions for guesstimating the efficiency of a given 
> > compression algorithm but I am not sure they are worth implementing.
> > 
> 
> Here is some more representative data on a dual-core system:
> 
> real    24m55.472s
> user    23m4.947s
> sys     0m10.633s
> 1884544 iso.lzma
> 
> real    13m32.442s
> user    12m6.937s
> sys     0m7.784s
> 1934324 iso.bz2
> My implementation of BZIP2 uses only one of the two cores
> 
> real    3m19.066s
> user    2m11.332s
> sys     0m6.284s
> 1935056 iso.gz
> 
> And the original :
> 2026416 iso
> 
> So, we have:
>       63325 blocks for the original
>       60470 blocks with GZIP (4.5% gain)
>       60447 blocks with BZIP2 (4.5% gain)
>       58892 blocks with LZMA (7% gain)
> 
> Of course those don't include the FEC blocks: So to sum up, yes I think 
> it's worth spending half an hour of CPU time to "win" 4433*2=8866 
> blocks. And that's still true on a single core system where we would 
> spend 1 hour.
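
The block counts quoted above are consistent with dividing the listed sizes by 32 and rounding down, which suggests the sizes are in KiB and the data blocks are 32 KiB. A minimal Python sketch of that arithmetic follows; the 32 KiB block size, the KiB unit and all function names are assumptions made for illustration, not something taken from the Freenet code.

```python
# Sketch of the block arithmetic in the quoted message.
# Assumptions: sizes are in KiB and one data block holds 32 KiB.
BLOCK_KIB = 32

SIZES_KIB = {
    "original": 2026416,
    "gzip": 1935056,
    "bzip2": 1934324,
    "lzma": 1884544,
}


def blocks(size_kib, block_kib=BLOCK_KIB):
    """Number of whole blocks, rounded down as in the quoted figures."""
    return size_kib // block_kib


def saved_blocks(algo, sizes=SIZES_KIB):
    """Data blocks saved by compressing with `algo`, FEC blocks excluded."""
    return blocks(sizes["original"]) - blocks(sizes[algo])


def gain_percent(algo, sizes=SIZES_KIB):
    """Saving as a percentage of the uncompressed block count."""
    return 100.0 * saved_blocks(algo, sizes) / blocks(sizes["original"])


if __name__ == "__main__":
    for algo in ("gzip", "bzip2", "lzma"):
        print(algo, blocks(SIZES_KIB[algo]), round(gain_percent(algo), 1))
    # Counting one FEC block per data block doubles the saving.
    print("lzma incl. FEC:", 2 * saved_blocks("lzma"))
```

Doubling the LZMA saving to account for FEC blocks reproduces the 4433*2=8866 figure in the quoted text.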

Okay, then the current trunk code is fine. Lzma would be great, if you can 
solve the DoS issues; we'd probably use -5, I'm definitely not comfortable 
with -7:

     Preset   Compression memory   Decompression memory
      -1              2 MB                  1 MB
      -2             12 MB                  2 MB
      -3             12 MB                  1 MB
      -4             16 MB                  2 MB
      -5             26 MB                  3 MB
      -6             45 MB                  5 MB
      -7             83 MB                  9 MB
      -8            159 MB                 17 MB
      -9            311 MB                 33 MB
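
One way to read the table is as a memory budget per preset. The sketch below picks the highest preset that fits a given budget, assuming the second column is the memory needed to compress and the third the memory needed to decompress. That column reading, the budget values and the function name are assumptions for illustration only.

```python
# Preset -> (compress MB, decompress MB), copied from the table above.
# Assumption: column 2 is compression memory, column 3 decompression memory.
LZMA_MEMORY_MB = {
    1: (2, 1),
    2: (12, 2),
    3: (12, 1),
    4: (16, 2),
    5: (26, 3),
    6: (45, 5),
    7: (83, 9),
    8: (159, 17),
    9: (311, 33),
}


def highest_preset_within(max_compress_mb, max_decompress_mb):
    """Return the highest preset whose memory needs fit both limits, or None."""
    fitting = [
        level
        for level, (comp, decomp) in LZMA_MEMORY_MB.items()
        if comp <= max_compress_mb and decomp <= max_decompress_mb
    ]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    # Hypothetical budget: 32 MB to compress, 4 MB to decompress.
    print(highest_preset_within(32, 4))
```

With the hypothetical 32 MB / 4 MB budget this selects -5, and a budget that excludes 83 MB for compression rules out -7.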

Eventually we should show a progress bar within compression. In the short 
term, it would be reasonably easy for fproxy to show "Compressing" when it is 
compressing, instead of just having no progress bar. After that it should 
move to "Starting", and after that show a progress bar. If you want me to 
deal with that I'll get around to it eventually; should I file a bug? The UI 
changes are needed for 0.8 but not for 1166.
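
For reference, the "compress with every algorithm and keep the smallest result" approach described in the quoted thread can be sketched as follows. This is a Python illustration using the standard library codecs, not the Freenet client code. The function name is hypothetical, and Python's `lzma.compress` writes the .xz container by default rather than the legacy .lzma format.

```python
import bz2
import gzip
import lzma

# Candidate codecs, each mapping bytes -> compressed bytes.
# Note: lzma.compress emits the .xz container by default.
COMPRESSORS = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}


def smallest_encoding(data):
    """Try every codec and keep the smallest output.

    Falls back to ("none", data) when no codec beats the raw size.
    """
    best_name, best_payload = "none", data
    for name, compress in COMPRESSORS.items():
        candidate = compress(data)
        if len(candidate) < len(best_payload):
            best_name, best_payload = name, candidate
    return best_name, best_payload


if __name__ == "__main__":
    name, payload = smallest_encoding(b"freenet " * 10000)
    print(name, len(payload))
```

Each codec sees the full input before a winner is known, which is why this scheme cannot start inserting blocks until the whole compression phase has finished.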

