* Daniel Cheng <[EMAIL PROTECTED]> [2008-10-24 06:44:38]:

> On Thu, Oct 23, 2008 at 9:51 PM, Florent Daigniere
> <[EMAIL PROTECTED]> wrote:
> > Daniel Cheng wrote:
> >> On Thu, Oct 23, 2008 at 9:02 PM, Florent Daigniere
> >> <[EMAIL PROTECTED]> wrote:
> >>> Florent Daigniere wrote:
> >>>> Matthew Toseland wrote:
> >>>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
> >>>>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
> >>>>>>
> >>>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]>
> >>>>>>> wrote:
> >>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
> >>>>>>>>
> >>>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
> >>>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
> >>>>>>>>>>
> >>>>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED]
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> Author: nextgens
> >>>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
> >>>>>>>>>>>> New Revision: 23014
> >>>>>>>>>>>>
> >>>>>>>>>>>> Modified:
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
> >>>>>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
> >>>>>>>>>>>> Log:
> >>>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
> >>>>>>>>>>>> It's still not backward compatible with stable but should be
> >>>>>>>>>>>> forward-compatible ;)
> >>>>>>>>>> [...] see r23023
> >>>>>>>>>>
> >>>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip
> >>>>>>>>>>> now? Shouldn't there be a max size configuration above which we
> >>>>>>>>>>> don't try bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs
> >>>>>>>>>>> could take a really long time ...
> >>>>>>>>>> I don't think we need one. Big files will take long to compress but
> >>>>>>>>>> will take long to insert too. I think it's worth spending a few more
> >>>>>>>>>> CPU cycles to spare the insertion of a few blocks (plus their FEC
> >>>>>>>>>> blocks).
> >>>>>>>>> I'm not convinced that this is acceptable from a usability point of
> >>>>>>>>> view. Maybe we can provide a progress bar within the compression
> >>>>>>>>> phase? On the new UI it is proposed to separate downloads which are
> >>>>>>>>> not yet finalised (i.e. haven't fetched the last lot of metadata)
> >>>>>>>>> from downloads that are... we could do something similar with
> >>>>>>>>> inserts in compression.
> >>>>>>>>>
> >>>>>>>> Have a look at what I have committed. From now on the compression is
> >>>>>>>> fully serialized... We have one mutex, and only one compression job
> >>>>>>>> (just like we do for FEC encoding in fact) which means an even higher
> >>>>>>>> latency.
> >>>>>>> Is it feasible to insert some blocks of data while compressing?
> >>>>>>> Gzip, bzip2 and lzma all support streams. We can collect the output
> >>>>>>> data as we feed data to them.
> >>>>>>>
> >>>>>> Right now we attempt to compress the full data using all the
> >>>>>> compression algorithms and we keep the smallest resulting bucket. How
> >>>>>> do you plan to choose the best-performing algorithm before actually
> >>>>>> compressing the data?
> >>>>>>
> >>>>>> I don't think that we can evaluate how well algorithms compress over a
> >>>>>> single segment: it's just too small.
> >>>>>>
> >>>>>>> As soon as we get enough compressed data for FEC, we can insert them.
> >>>>>>> This would be a great performance improvement for large files on SMP.
> >>>>>>>
> >>>>>> That would involve rewriting most of the client-layer.
> >>>>>>
> >>>>>>> Is this doable without changing the data format?
> >>>>>>>
> >>>>>> It's not about the data format; we insert the manifest at the end
> >>>>>> unless told otherwise by the earlyEncode parameter.
> >>>>>>
> >>>>>> IMHO we are debating for no real reason here: the real time taken by
> >>>>>> the compression phase is insignificant compared to the time taken by
> >>>>>> the insertion process. Sure, trunk will take at least 3 times longer
> >>>>>> than current stable before it starts inserting anything; but is that a
> >>>>>> big deal? You will need real numbers to convince me here.
> >>>>> I'd like some numbers ... iirc it takes around 2 days to insert a
> >>>>> CD-sized ISO? How long does it take to bzip2 it?
> >>>>>
> >>>> It obviously depends on various factors including how fast you can do
> >>>> I/Os, the block size and the number of cores you have.
> >>>>
> >>>> Here on what is likely to be "the worst case scenario":
> >>>> $time bzip2 -c iso > iso.bz2|grep real
> >>>> real 3m57.552s
> >>>> $time gzip -c iso > iso.gz|grep real
> >>>> real 0m46.079s
> >>>> $du -hs iso*
> >>>> 560M iso
> >>>> 506M iso.bz2
> >>>> 506M iso.gz
> >>>>
> >>>> There is no clear gain to bzip the content... but compression is worth
> >>>> it: we spare 54*2=108 MB (You have to count FEC blocks too to be fair)!
> >>>> Now if you tell me that freenet is able to insert 108MB of data in less
> >>>> than 5mins, I will consider optimizing the compression step.
> >>>>
> >>>> There are solutions for guesstimating the efficiency of a given
> >>>> compression algorithm but I am not sure they are worth implementing.
> >>>>
> >>> Here is some more representative data on a dual-core system:
> >>>
> >>> real    24m55.472s
> >>> user    23m4.947s
> >>> sys     0m10.633s
> >>> 1884544 iso.lzma
> >>>
> >>> real    13m32.442s
> >>> user    12m6.937s
> >>> sys     0m7.784s
> >>> 1934324 iso.bz2
> >>> My implementation of BZIP2 uses only one of the two cores
> >>>
> >>> real    3m19.066s
> >>> user    2m11.332s
> >>> sys     0m6.284s
> >>> 1935056 iso.gz
> >>>
> >>> And the original :
> >>> 2026416 iso
> >>>
> >>> So, we have:
> >>>        63325 blocks for the original
> >>>        60470 blocks with GZIP (4.5% gain)
> >>>        60447 blocks with BZIP2 (4.5% gain)
> >>>        58892 blocks with LZMA (7% gain)
> >>
> >> The compression rate on "ISO" is not realistic. ISO is an uncompressed 
> >> format.
> >> The actual compression rate depends on the data on the CD / DVD disk.
> >>
> >
> > I think that the data I used is representative of what the real users
> > are likely to be dealing with. Bring your own stats if you are not happy
> > with mine.
> 
> The ISO file is Office 2007 DVD (Part number SKU-79G-00069, you can
> reproduce yourself)
> 
> Uncompressed:  1032732672 bytes
> bzip2 -9 (best compression)       : 996222481 bytes
> gzip   -6 (*default* compression) : 994112100 bytes
> lzma  -7 (*default* compression) : 990592948 bytes
> 

It's still achieving a 3.5% compression ratio in the worst case
scenario!

That's a *lot* of blocks.
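
To put a rough number on "a lot", here is a back-of-the-envelope sketch in
Python (not Freenet code). It assumes 32 KiB data blocks, which is what the
block counts quoted earlier in the thread imply, and it assumes FEC roughly
doubles the number of blocks, as in the 54*2 estimate above. The names
BLOCK_SIZE, SIZES, blocks() and savings() are made up for this illustration.

```python
import math

# Assumption: 32 KiB data blocks (inferred from the block counts quoted
# earlier in the thread, not taken from the Freenet source).
BLOCK_SIZE = 32 * 1024

# File sizes in bytes, as reported for the Office 2007 DVD image.
SIZES = {
    "uncompressed": 1032732672,
    "bzip2 -9": 996222481,
    "gzip -6": 994112100,
    "lzma -7": 990592948,
}


def blocks(n_bytes):
    """Number of data blocks needed to hold n_bytes."""
    return math.ceil(n_bytes / BLOCK_SIZE)


def savings(sizes, fec_factor=2):
    """Map each algorithm to (data blocks saved, blocks saved incl. FEC, % saved).

    fec_factor=2 is an assumption: one check block per data block.
    """
    original = sizes["uncompressed"]
    result = {}
    for name, size in sizes.items():
        if name == "uncompressed":
            continue
        saved = blocks(original) - blocks(size)
        percent = 100.0 * (1 - size / original)
        result[name] = (saved, saved * fec_factor, percent)
    return result


if __name__ == "__main__":
    for name, (saved, with_fec, percent) in savings(SIZES).items():
        print(f"{name}: {saved} data blocks, {with_fec} with FEC, {percent:.2f}% smaller")
```

Under those assumptions even the worst performer (bzip2 here) saves more
than a thousand data blocks before FEC is counted.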

> See how it backfires with bzip2 :)

What's your point? Some of the content might already be bzip2 encoded;
that would explain the poor performance of the algorithm... In any case
we are trying all of them ;)
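
For anyone following along, the "try them all and keep the smallest" idea
can be sketched like this in Python. This is only an illustration using the
Python standard library, not what ArchiveManager or SingleFileInserter
actually do; COMPRESSORS and smallest_encoding() are hypothetical names, and
the default compression levels of each module are used.

```python
import bz2
import gzip
import lzma

# Hypothetical table of candidate codecs; each maps bytes -> compressed bytes.
COMPRESSORS = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}


def smallest_encoding(data):
    """Compress data with every codec and return (name, payload) of the smallest.

    Falls back to ("none", data) when no codec beats the raw input, which is
    the expected outcome for content that is already compressed.
    """
    best_name, best_payload = "none", data
    for name, compress in COMPRESSORS.items():
        candidate = compress(data)
        if len(candidate) < len(best_payload):
            best_name, best_payload = name, candidate
    return best_name, best_payload


if __name__ == "__main__":
    name, payload = smallest_encoding(b"freenet " * 10000)
    print(name, len(payload))
```

The cost is the one discussed in this thread: every codec runs over the
whole input, one after the other, before anything can be inserted.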


_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
