Matthew Toseland wrote:
> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>
>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> 
> wrote:
>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>
>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>
>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] 
> wrote:
>>>>>>>> Author: nextgens
>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>> New Revision: 23014
>>>>>>>>
>>>>>>>> Modified:
>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>    
> trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>    
> trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>    
> trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>> Log:
>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>> forward-compatible ;)
>>>>>>
>>>>>> [...] see r23023
>>>>>>
>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>> Shouldn't
>>>>>>> there be a max size configuration above which we don't try bzip2, 
> perhaps
>>>>>>> unless asked to via FCP? bzip2'ing ISOs could take a really long 
> time ...
>>>>>> I don't think we need one. Big files will take long to compress but 
> will
>>>>> take
>>>>>> long to insert too. I think it's worth spending a few more CPU cycles 
> to
>>>>>> spare the insertion of a few blocks (plus their FEC blocks).
>>>>> I'm not convinced that this is acceptable from a usability point of 
> view.
>>>>> Maybe we can provide a progress bar within the compression phase? On 
> the new
>>>>> UI it is proposed to separate downloads which are not yet finalised 
> (i.e.
>>>>> haven't fetched the last lot of metadata) from downloads that are... we 
> could
>>>>> do something similar with inserts in compression.
>>>>>
>>>> Have a look to what I have commited. From now on the compression is 
> fully
>>>> serialized... We have one mutex, and only one compression job (just like 
> we
>>>> do for FEC encoding in fact) which means a even higher latency.
>>> It is feasible to insert some blocks of data while compressing?
>>> Gzip, bzip2 and lzma all support streams. We can collect the output data
>>> as we feed data to them.
>>>
>> Right now we attempt to compress the full data using all the compression
>> algorithms and we keep the smallest resulting bucket. How do you plan to
>> chose the best-performing algorithm before actually compressing the data?
>>
>> I don't think that we can evaluate how well algorithms compress over a 
> single
>> segment: it's just too small.
>>
>>> As soon as we get enough compressed data for FEC, we can insert them.
>>> This would be a great preformance improvement for large file on SMP.
>>>
>> That would involve rewritting most of the client-layer.
>>
>>> It this doable without changing the data format?
>>>
>> It's not about the data format; we insert the manifest at the end unless not
>> told to by the earlyEncode parameter.
>>
>> IMHO we are debating for no real reason here: the real-time taken by the
>> compression phase is insignificant compared to the time taken by the
>> insertion process. Sure, trunk will take at least 3 times longer than 
> current
>> stable before it starts inserting anything; but is that a big deal? You will
>> need real numbers to convince me here.
> 
> I'd like some numbers ... iirc it takes around 2 days to insert a CD-sized 
> ISO? How long does it take to bzip2 it?
> 

It obviously depends on various factors, including how fast your I/O 
is, the block size and the number of cores you have.

Here is what is likely to be "the worst-case scenario":
$time bzip2 -c iso > iso.bz2|grep real
real 3m57.552s
$time gzip -c iso > iso.gz|grep real
real 0m46.079s
$du -hs iso*
560M iso
506M iso.bz2
506M iso.gz

There is no clear gain from using bzip2 on this content... but compression 
is worth it: we save 54*2=108 MB (you have to count FEC blocks too, to be 
fair)! Now if you tell me that Freenet is able to insert 108 MB of data in 
less than 5 minutes, I will consider optimizing the compression step.
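
To put rough numbers on that trade-off, here is a minimal sketch in Python. 
It assumes FEC doubles the number of blocks inserted (as in the 54*2 figure 
above), rounds the total compression time up to 5 minutes, and, purely for 
illustration, takes the "around 2 days for a CD-sized ISO" recollection 
quoted earlier as the insert time for the full 560 MB plus its FEC blocks. 
The function names are made up for this example.

```python
def break_even_rate_mb_per_s(saved_mb: float, compression_seconds: float) -> float:
    """Insert rate above which skipping compression would be faster overall."""
    return saved_mb / compression_seconds


def insert_seconds_saved(saved_mb: float, total_mb: float, total_insert_seconds: float) -> float:
    """Insert time avoided by not having to insert `saved_mb`, at the observed rate."""
    observed_rate = total_mb / total_insert_seconds
    return saved_mb / observed_rate


# Figures from the du output above.
original_mb = 560
compressed_mb = 506

# Assumption: every data block saved also saves one FEC block.
saved_mb = (original_mb - compressed_mb) * 2

# Assumption: about 5 minutes of compression in total (bzip2 + gzip runs).
rate = break_even_rate_mb_per_s(saved_mb, 5 * 60)

# Assumption: 2 days to insert 560 MB of data plus 560 MB of FEC blocks.
saved_s = insert_seconds_saved(saved_mb, original_mb * 2, 2 * 24 * 3600)

print(f"saved: {saved_mb} MB")
print(f"break-even insert rate: {rate:.2f} MB/s")
print(f"insert time avoided: {saved_s / 3600:.1f} hours")
```

Under those assumptions the node would need to sustain about 0.36 MB/s 
before the compression step costs more time than it saves, while the 
avoided insert time is on the order of hours rather than minutes.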

There are ways to guesstimate the efficiency of a given compression 
algorithm, but I am not sure they are worth implementing.
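
One such guesstimate, sketched in Python under stated assumptions: compress 
a handful of evenly spaced samples of the input with each candidate codec 
and compare the resulting ratios. This is not what Freenet does (as said 
above, it compresses the full data with every algorithm and keeps the 
smallest result), and a small sample can easily mislead. The sample size, 
the sample count and the function names are arbitrary choices for the 
example; zlib stands in for gzip since both use deflate.

```python
import bz2
import os
import zlib
from typing import Callable, List


def sample_chunks(data: bytes, chunk_size: int, count: int) -> List[bytes]:
    """Return up to `count` evenly spaced chunks of `chunk_size` bytes."""
    if count <= 1:
        return [data[:chunk_size]]
    if len(data) <= chunk_size * count:
        # Input is small enough that sampling would not save any work.
        return [data]
    stride = (len(data) - chunk_size) // (count - 1)
    return [data[i * stride: i * stride + chunk_size] for i in range(count)]


def estimate_ratio(data: bytes, compress: Callable[[bytes], bytes],
                   chunk_size: int = 32 * 1024, count: int = 8) -> float:
    """Estimate compressed/original size ratio from a few samples."""
    chunks = sample_chunks(data, chunk_size, count)
    raw = sum(len(c) for c in chunks)
    if raw == 0:
        return 1.0
    packed = sum(len(compress(c)) for c in chunks)
    return packed / raw


# Two synthetic inputs: highly repetitive text and incompressible noise.
repetitive = b"freenet " * 100_000
noise = os.urandom(1 << 20)

for name, codec in (("zlib (deflate)", zlib.compress), ("bzip2", bz2.compress)):
    print(name, "repetitive:", round(estimate_ratio(repetitive, codec), 3))
    print(name, "noise:", round(estimate_ratio(noise, codec), 3))
```

A ratio close to 1.0 on the samples would suggest skipping the slower 
codec, but the estimate is only as good as the samples are representative.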

> Also from a usability point of view, having Freenet apparently not doing 
> anything with an insert for hours is *bad*. We will need a compression 
> progress monitor.
> 

I don't think that compression is any slower than FEC encoding, which is 
already serialized... and so far we haven't told the user about it. File 
a feature request on Mantis.
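
For whoever picks that request up, a minimal sketch of the idea in Python 
(not Freenet code, which is Java): a single lock serializes compression 
jobs, and a callback reports how many input bytes have been consumed so a 
UI could show progress. The lock, the function name and the callback 
signature are all hypothetical.

```python
import bz2
import io
import threading
from typing import BinaryIO, Callable

# Hypothetical global lock: only one compression job runs at a time.
_COMPRESSION_LOCK = threading.Lock()


def compress_with_progress(src: BinaryIO, dst: BinaryIO, total_bytes: int,
                           on_progress: Callable[[int, int], None],
                           chunk_size: int = 64 * 1024) -> int:
    """Stream-compress src into dst with bzip2, reporting bytes consumed."""
    compressor = bz2.BZ2Compressor()
    done = 0
    with _COMPRESSION_LOCK:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(compressor.compress(chunk))
            done += len(chunk)
            on_progress(done, total_bytes)
        # Emit whatever the compressor is still buffering.
        dst.write(compressor.flush())
    return done


# Usage example with in-memory buffers.
payload = b"abc" * 100_000
out = io.BytesIO()
compress_with_progress(
    io.BytesIO(payload), out, len(payload),
    lambda done, total: print(f"compressed {100 * done // total}% of input"),
)
```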
_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
