Hi Carsten,

In your example the only thing that seems to matter to you is the speed of
*collecting* data, in short the realtime compression speed that tape
streamers can achieve, to give one example.

In your example you need to compress the data every time.

That's not realistic, however. I'll give you two arguments.

In an ideal situation you compress things only a single time, so decompression
speed is what matters if you want to look up data in realtime. It would be far
better in your situation to already have things very well compressed on your
drives; doing realtime compression/decompression is not very useful then, and
the hardware compression in those tape streamers is usually already doing some
simplistic run-length encoding.
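
(As a toy illustration of what run-length encoding amounts to, and nothing
more: counting runs of repeated bytes. A throwaway shell one-liner shows the
idea; the input string is made up, and this is of course not what the drive
firmware literally does.)

  $ echo "aaaabbbcc" | fold -w1 | uniq -c   # one byte per line, count consecutive runs
        4 a
        3 b
        2 c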

Another flaw is that you assume the data on your 10 TB array never gets
accessed by even a single user over the network.

Vincent




On Oct 3, 2008, at 1:49 PM, Carsten Aulbert wrote:

Hi Vincent,

Vincent Diepeveen wrote:
Ah, you googled for 2 seconds and found some old homepage.

Actually no, I just looked at my freshmeat watchlist of items still to
look at :)


Look especially at compressed sizes and decompression times.

Yeah, I'm currently looking at
http://www.maximumcompression.com/data/summary_mf3.php

We have a Gbit network, i.e. for us this test is a null test: it takes
7-zip close to 5 minutes to compress the 311 MB data set, which we could
blow over the network uncompressed in less than 5 seconds, i.e. in this case
tar would be our favorite ;)
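
(Back of the envelope with the numbers above: roughly 311 MB in roughly 300 s
is about 1 MB/s of input consumed by 7-zip, against the ~120 MB/s a Gbit link
can carry, so it would have to get around a hundred times faster before it
wins here.)

  $ echo "scale=2; 311/300" | bc    # MB/s of input that 7-zip actually eats
  1.03
  $ echo "120/1.03" | bc            # factor by which it falls short of wire speed
  116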


The only thing you want to limit over your network is the amount of
bandwidth you use. A really good compressor is very helpful then. How long
compression takes is hardly relevant, as long as it doesn't take an
infinite amount of time (I remember a New Zealand compressor which took
24 hours to compress 100 MB of data). Note that we are already at the
point where compression time hardly matters; you can buy a GPU to offload
that work from your servers.


No, quite the contrary. I would like to use a compressor within a
pipe to increase the throughput over the network, i.e. to get around the
~120 MB/s limit.
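
(A sketch of the kind of pipe I mean; "newserver" and /data are made-up
names, and whether the compressed variant actually beats plain tar depends
entirely on the compressor keeping up with the wire:)

  # pack, compress with the fastest gzip level, decompress and unpack on the far side
  $ tar cf - -C /data . | gzip -1 | ssh newserver 'gzip -d | tar xf - -C /data'

  # the uncompressed baseline we would otherwise use
  $ tar cf - -C /data . | ssh newserver 'tar xf - -C /data'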

Query time (so decompression time) is important though.


No, for us that number is at least as important as the compression time.

Imagine the following situation:

We have a file server with close to 10 TB of data on it in nice chunks
with a size of about 100 MB each. We buy a new server with new disks
which can hold 20 TB, and we would like to copy the data over. So for us
the more important figure is the compression/decompression speed, which
should be >> 100 MB/s on our systems.

If 7-zip can only compress data at a rate of less than, say, 5 MB/s
(input data), I can copy the data over uncompressed much faster, regardless
of how many unused cores I have in the system. Exactly for these cases I
would like to use all available cores to compress the data fast enough to
actually increase the throughput.
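
(One way to get at all those cores would be a parallel compressor in the
pipe, e.g. pigz if it happens to be installed, which is a threaded drop-in
for gzip; -p sets the number of threads. Same made-up host and path as in
the sketch above.)

  # stream, compress on 8 cores, decompress and unpack on the new server
  $ tar cf - -C /data . | pigz -1 -p 8 | ssh newserver 'pigz -d | tar xf - -C /data'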

Am I missing something vital?

Cheers

Carsten


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
