On Thu, Oct 23, 2008 at 9:51 PM, Florent Daigniere
<[EMAIL PROTECTED]> wrote:
> Daniel Cheng wrote:
>> On Thu, Oct 23, 2008 at 9:02 PM, Florent Daigniere
>> <[EMAIL PROTECTED]> wrote:
>>> Florent Daigniere wrote:
>>>> Matthew Toseland wrote:
>>>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>>>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>>>>>
>>>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>>>>>
>>>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>>>>>
>>>>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
>>>>>>>>>>>> Author: nextgens
>>>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>>>>> New Revision: 23014
>>>>>>>>>>>>
>>>>>>>>>>>> Modified:
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>>>>> Log:
>>>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>>>>>> forward-compatible ;)
>>>>>>>>>> [...] see r23023
>>>>>>>>>>
>>>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>>>>>> Shouldn't there be a max size configuration above which we don't try
>>>>>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>>>>>> really long time ...
>>>>>>>>>> I don't think we need one. Big files will take long to compress but
>>>>>>>>>> will take long to insert too. I think it's worth spending a few more
>>>>>>>>>> CPU cycles to spare the insertion of a few blocks (plus their FEC
>>>>>>>>>> blocks).
>>>>>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>>>>>> view. Maybe we can provide a progress bar within the compression
>>>>>>>>> phase? On the new UI it is proposed to separate downloads which are
>>>>>>>>> not yet finalised (i.e. haven't fetched the last lot of metadata)
>>>>>>>>> from downloads that are... we could do something similar with
>>>>>>>>> inserts in compression.
>>>>>>>>>
>>>>>>>> Have a look at what I have committed. From now on the compression is
>>>>>>>> fully serialized... We have one mutex, and only one compression job
>>>>>>>> (just like we do for FEC encoding, in fact), which means an even
>>>>>>>> higher latency.
>>>>>>> Is it feasible to insert some blocks of data while compressing?
>>>>>>> Gzip, bzip2 and lzma all support streams. We can collect the output data
>>>>>>> as we feed data to them.
>>>>>>>
>>>>>> Right now we attempt to compress the full data using all the compression
>>>>>> algorithms and we keep the smallest resulting bucket. How do you plan to
>>>>>> choose the best-performing algorithm before actually compressing the data?
>>>>>>
>>>>>> I don't think that we can evaluate how well algorithms compress over a
>>>>>> single segment: it's just too small.
>>>>>>
>>>>>>> As soon as we get enough compressed data for FEC, we can insert them.
>>>>>>> This would be a great performance improvement for large files on SMP.
>>>>>>>
>>>>>> That would involve rewriting most of the client-layer.
>>>>>>
>>>>>>> Is this doable without changing the data format?
>>>>>>>
>>>>>> It's not about the data format; we insert the manifest at the end unless
>>>>>> told otherwise by the earlyEncode parameter.
>>>>>>
>>>>>> IMHO we are debating for no real reason here: the real time taken by the
>>>>>> compression phase is insignificant compared to the time taken by the
>>>>>> insertion process. Sure, trunk will take at least 3 times longer than
>>>>>> current stable before it starts inserting anything; but is that a big
>>>>>> deal? You will need real numbers to convince me here.
>>>>> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized
>>>>> ISO? How long does it take to bzip2 it?
>>>>>
>>>> It obviously depends on various factors including how fast you can do
>>>> I/Os, the block size and the number of cores you have.
>>>>
>>>> Here is what is likely to be "the worst case scenario":
>>>> $time bzip2 -c iso > iso.bz2|grep real
>>>> real 3m57.552s
>>>> $time gzip -c iso > iso.gz|grep real
>>>> real 0m46.079s
>>>> $du -hs iso*
>>>> 560M iso
>>>> 506M iso.bz2
>>>> 506M iso.gz
>>>>
>>>> There is no clear gain to bzip the content... but compression is worth
>>>> it: we spare 54*2=108 MB (You have to count FEC blocks too to be fair)!
>>>> Now if you tell me that freenet is able to insert 108MB of data in less
>>>> than 5mins, I will consider optimizing the compression step.
>>>>
>>>> There are solutions for guesstimating the efficiency of a given
>>>> compression algorithm but I am not sure they are worth implementing.
>>>>
>>> Here is some more representative data on a dual-core system:
>>>
>>> real    24m55.472s
>>> user    23m4.947s
>>> sys     0m10.633s
>>> 1884544 iso.lzma
>>>
>>> real    13m32.442s
>>> user    12m6.937s
>>> sys     0m7.784s
>>> 1934324 iso.bz2
>>> My implementation of BZIP2 uses only one of the two cores
>>>
>>> real    3m19.066s
>>> user    2m11.332s
>>> sys     0m6.284s
>>> 1935056 iso.gz
>>>
>>> And the original :
>>> 2026416 iso
>>>
>>> So, we have:
>>>        63325 blocks for the original
>>>        60470 blocks with GZIP (4.5% gain)
>>>        60447 blocks with BZIP2 (4.5% gain)
>>>        58892 blocks with LZMA (7% gain)
>>
>> The compression rate on "ISO" is not realistic. ISO is an uncompressed
>> format. The actual compression rate depends on the data on the CD / DVD
>> disk.
>>
>
> I think that the data I used is representative of what the real users
> are likely to be dealing with. Bring your own stats if you are not happy
> with mine.

The ISO file is the Office 2007 DVD (part number SKU-79G-00069; you can
reproduce this yourself).

Uncompressed                     : 1032732672 bytes
bzip2 -9 (best compression)      :  996222481 bytes
gzip  -6 (*default* compression) :  994112100 bytes
lzma  -7 (*default* compression) :  990592948 bytes

See how it backfires with bzip2 :)
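
For anyone who wants to redo the arithmetic on these sizes, here is a rough
Python sketch. The 32 KiB block size and the helper names (saving_percent,
blocks) are assumptions made for illustration, not anything taken from the
Freenet code, and FEC blocks are not counted.

```python
# Sizes in bytes, as measured on the Office 2007 DVD image above.
ORIGINAL = 1032732672
COMPRESSED = {
    "bzip2 -9": 996222481,
    "gzip -6": 994112100,
    "lzma -7": 990592948,
}

# Assumption: data is split into 32 KiB blocks; FEC blocks are ignored.
BLOCK_SIZE = 32 * 1024


def saving_percent(original, compressed):
    """Percentage of the original size saved by compression."""
    return 100.0 * (original - compressed) / original


def blocks(size):
    """Number of BLOCK_SIZE blocks needed to hold `size` bytes (rounded up)."""
    return -(-size // BLOCK_SIZE)


if __name__ == "__main__":
    print("%-10s %12s %8s %8s" % ("tool", "bytes", "blocks", "saved"))
    print("%-10s %12d %8d %8s" % ("none", ORIGINAL, blocks(ORIGINAL), "-"))
    # Smallest output first.
    for name, size in sorted(COMPRESSED.items(), key=lambda kv: kv[1]):
        print("%-10s %12d %8d %7.2f%%"
              % (name, size, blocks(size), saving_percent(ORIGINAL, size)))
```

On these numbers bzip2 -9 saves the least and lzma -7 the most, with all
three landing at roughly 3.5% to 4.1%.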
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
