On Thu, Oct 23, 2008 at 9:02 PM, Florent Daigniere
<[EMAIL PROTECTED]> wrote:
> Florent Daigniere wrote:
>> Matthew Toseland wrote:
>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>>>
>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]>
>>> wrote:
>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>>>
>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>>>
>>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED]
>>> wrote:
>>>>>>>>>> Author: nextgens
>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>>> New Revision: 23014
>>>>>>>>>>
>>>>>>>>>> Modified:
>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>>
>>> trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>>
>>> trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>>
>>> trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>>> Log:
>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>>> forward-compatible ;)
>>>>>>>> [...] see r23023
>>>>>>>>
>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>>>> Shouldn't there be a max size configuration above which we don't try
>>>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>>>> really long time ...
>>>>>>>> I don't think we need one. Big files will take a long time to
>>>>>>>> compress, but they will take a long time to insert too. I think it's
>>>>>>>> worth spending a few more CPU cycles to spare the insertion of a few
>>>>>>>> blocks (plus their FEC blocks).
>>>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>>>> view. Maybe we can provide a progress bar within the compression
>>>>>>> phase? On the new UI it is proposed to separate downloads which are
>>>>>>> not yet finalised (i.e. haven't fetched the last lot of metadata) from
>>>>>>> downloads that are... we could do something similar with inserts in
>>>>>>> compression.
>>>>>>>
>>>>>> Have a look at what I have committed. From now on the compression is
>>>>>> fully serialized... We have one mutex and only one compression job
>>>>>> (just like we do for FEC encoding, in fact), which means an even higher
>>>>>> latency.
>>>>> Is it feasible to insert some blocks of data while compressing?
>>>>> Gzip, bzip2 and lzma all support streaming. We can collect the output
>>>>> data as we feed data to them.
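[Editorial sketch of the streaming idea above: compressed output is cut into fixed-size blocks as it is produced, so each full block could be handed to the inserter while compression of the rest of the file continues. This is a hedged illustration using JDK GZIP only; the 32 KiB block size, the class names, and the hand-off point are assumptions, not Freenet's actual code.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingCompressSketch {
    static final int BLOCK_SIZE = 32 * 1024; // assumed 32 KiB block size

    // Buffers compressed output and emits one block every time BLOCK_SIZE
    // bytes accumulate; a real client could queue each finished block for
    // insertion immediately instead of collecting it in a list.
    static class BlockEmitter extends OutputStream {
        final List<byte[]> blocks = new ArrayList<>();
        private final byte[] cur = new byte[BLOCK_SIZE];
        private int fill = 0;

        @Override public void write(int b) {
            cur[fill++] = (byte) b;
            if (fill == BLOCK_SIZE) flushBlock();
        }

        void flushBlock() {
            if (fill == 0) return;
            blocks.add(Arrays.copyOf(cur, fill)); // hypothetical hand-off point
            fill = 0;
        }
    }

    static List<byte[]> compressInBlocks(byte[] data) {
        try {
            BlockEmitter emitter = new BlockEmitter();
            GZIPOutputStream gz = new GZIPOutputStream(emitter);
            gz.write(data);
            gz.close();           // finishes the gzip stream into the emitter
            emitter.flushBlock(); // emit the final partial block
            return emitter.blocks;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Round-trip helper: reassemble the blocks and gunzip them.
    static byte[] decompress(List<byte[]> blocks) {
        try {
            ByteArrayOutputStream joined = new ByteArrayOutputStream();
            for (byte[] b : blocks) joined.write(b);
            GZIPInputStream in = new GZIPInputStream(
                    new ByteArrayInputStream(joined.toByteArray()));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The point is only that nothing in the stream formats forces waiting for the whole file: full 32 KiB blocks are available long before compression finishes.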
>>>>>
>>>> Right now we attempt to compress the full data using all the compression
>>>> algorithms and we keep the smallest resulting bucket. How do you plan to
>>>> choose the best-performing algorithm before actually compressing the data?
>>>>
>>>> I don't think that we can evaluate how well algorithms compress over a
>>>> single segment: it's just too small.
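[Editorial sketch of the try-everything-keep-the-smallest behaviour described above. The JDK ships only DEFLATE/GZIP, so two Deflater levels stand in for the node's GZIP/BZIP2/LZMA codecs; class and method names are illustrative assumptions, not Freenet's actual classes.]

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;

public class BestCodecSketch {
    // Compress with one DEFLATE level; stand-in for one of the node's codecs.
    static byte[] deflate(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            int n = d.deflate(buf);
            out.write(buf, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    // Run every candidate codec over the full data and keep the smallest
    // result, falling back to the uncompressed data if nothing beats it.
    static byte[] keepSmallest(byte[] data) {
        Map<String, byte[]> candidates = new LinkedHashMap<>();
        candidates.put("fast", deflate(data, Deflater.BEST_SPEED));
        candidates.put("best", deflate(data, Deflater.BEST_COMPRESSION));
        byte[] winner = data;
        for (byte[] c : candidates.values())
            if (c.length < winner.length) winner = c;
        return winner;
    }
}
```

This is exactly why the CPU cost scales with the number of codecs: each one sees the whole input.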
>>>>
>>>>> As soon as we get enough compressed data for FEC, we can insert them.
>>>>> This would be a great performance improvement for large files on SMP.
>>>>>
>>>> That would involve rewriting most of the client layer.
>>>>
>>>>> Is this doable without changing the data format?
>>>>>
>>>> It's not about the data format; we insert the manifest at the end unless
>>>> told otherwise via the earlyEncode parameter.
>>>>
>>>> IMHO we are debating for no real reason here: the real time taken by the
>>>> compression phase is insignificant compared to the time taken by the
>>>> insertion process. Sure, trunk will take at least 3 times longer than
>>>> current stable before it starts inserting anything; but is that a big
>>>> deal? You will need real numbers to convince me here.
>>> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized
>>> ISO? How long does it take to bzip2 it?
>>>
>>
>> It obviously depends on various factors, including how fast you can do
>> I/O, the block size and the number of cores you have.
>>
>> Here is what is likely to be "the worst case scenario":
>> $ (time bzip2 -c iso > iso.bz2) 2>&1 | grep real
>> real 3m57.552s
>> $ (time gzip -c iso > iso.gz) 2>&1 | grep real
>> real 0m46.079s
>> $ du -hs iso*
>> 560M iso
>> 506M iso.bz2
>> 506M iso.gz
>>
>> There is no clear gain from bzip2'ing the content... but compression is
>> worth it: we spare 54*2=108 MB (you have to count FEC blocks too, to be
>> fair)! Now if you tell me that Freenet is able to insert 108 MB of data in
>> less than 5 minutes, I will consider optimizing the compression step.
>>
>> There are solutions for guesstimating the efficiency of a given
>> compression algorithm, but I am not sure they are worth implementing.
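[One such guesstimate, sketched editorially: compress only a small prefix sample and extrapolate the ratio, instead of running each codec over the whole file. This uses JDK DEFLATE as the example codec; the sample size and the extrapolation are assumptions, and as noted above a small sample may not be representative.]

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class RatioGuess {
    // Estimate the compression ratio by compressing only a prefix sample.
    // Returns compressedSampleSize / sampleSize; < 1.0 means it compressed.
    static double estimateRatio(byte[] data, int sampleSize) {
        int n = Math.min(sampleSize, data.length);
        Deflater d = new Deflater();
        d.setInput(data, 0, n);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return (double) out.size() / n;
    }
}
```

A node could skip the expensive codecs entirely when the sample ratio is close to 1.0 (already-compressed data, like most ISO contents).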
>>
>
> Here is some more representative data on a dual-core system:
>
> real    24m55.472s
> user    23m4.947s
> sys     0m10.633s
> 1884544 iso.lzma
>
> real    13m32.442s
> user    12m6.937s
> sys     0m7.784s
> 1934324 iso.bz2
> My implementation of BZIP2 uses only one of the two cores.
>
> real    3m19.066s
> user    2m11.332s
> sys     0m6.284s
> 1935056 iso.gz
>
> And the original :
> 2026416 iso
>
> So, we have:
>        63325 blocks for the original
>        60470 blocks with GZIP (4.5% gain)
>        60447 blocks with BZIP2 (4.5% gain)
>        58892 blocks with LZMA (7% gain)

The compression ratio on an "ISO" is not representative. ISO is an
uncompressed container format, so the achievable ratio depends entirely on
the data on the CD / DVD disk.

> Of course those don't include the FEC blocks. So to sum up: yes, I think
> it's worth spending half an hour of CPU time to "win" 4433*2=8866
> blocks. And that's still true on a single-core system, where we would
> spend 1 hour.
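[For reference, the block counts quoted above can be reproduced. The arithmetic assumes the listed sizes are du-style KiB and that the post divided by 32 (i.e. 32 KiB blocks, integer division); this mirrors the thread's numbers, not any actual Freenet API.]

```java
public class BlockMath {
    // Sizes in the thread are KiB; the post computes sizeKiB / 32, i.e.
    // 32 KiB blocks with integer (floor) division. Mirror that exactly.
    static long blocks(long sizeKiB) {
        return sizeKiB / 32;
    }

    // Percentage of blocks saved by compression, before FEC doubling.
    static double gainPercent(long originalKiB, long compressedKiB) {
        long orig = blocks(originalKiB);
        long comp = blocks(compressedKiB);
        return 100.0 * (orig - comp) / orig;
    }
}
```

With the posted sizes this gives 63325 blocks for the original, 60470 for GZIP, 60447 for BZIP2 and 58892 for LZMA, i.e. the ~4.5% and ~7% gains quoted; doubling the 4433-block LZMA saving for FEC gives the 8866 figure.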
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl