Hi Carsten,
In your example the only thing that seems to matter to you is the speed of
*collecting* data, in short the realtime compression speed that tape
streamers can achieve, to give one example. In your example you need to
compress the data every single time. That is not realistic, however; I'll
give you two arguments.
In an ideal situation you compress things only once, so decompression speed
is what matters if you want to look up data in realtime. It would be far
better in your situation to have things already well compressed on the
drive; doing realtime compression/decompression is not very useful then,
and the hardware compression in those tape streamers usually only does some
simplistic run-length encoding anyway.
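(For the record, the kind of simplistic run-length encoding such tape
hardware does can be sketched roughly like this; a toy illustration, not
any vendor's actual on-tape format:)

```python
# Toy run-length encoder: collapse byte repeats into (count, byte) pairs.
# Purely illustrative of the idea, not a real tape-streamer scheme.
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    out: list[tuple[int, int]] = []
    for b in data:
        if out and out[-1][1] == b:
            out[-1] = (out[-1][0] + 1, b)  # extend the current run
        else:
            out.append((1, b))             # start a new run
    return out

print(rle_encode(b"aaabbc"))  # [(3, 97), (2, 98), (1, 99)]
```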
Another gap in your reasoning is that you assume the data on your 10 TB
array is never accessed by any user over the network.
Vincent
On Oct 3, 2008, at 1:49 PM, Carsten Aulbert wrote:
Hi Vincent,
Vincent Diepeveen wrote:
> Ah, you googled for 2 seconds and found some old homepage.
Actually no, I just looked at my freshmeat watchlist of items still to
look at :)
> Look especially at compressed sizes and decompression times.
Yeah, I'm currently looking at
http://www.maximumcompression.com/data/summary_mf3.php

We have a Gbit network, so for us this test is a null test: it takes 7-zip
close to 5 minutes to compress the 311 MB data set, which we could blow
over the network in less than 5 seconds. In this case tar would be our
favorite ;)
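(Carsten's arithmetic checks out. A quick back-of-the-envelope sketch,
assuming ~125 MB/s of usable Gbit throughput and an illustrative 7-zip
compression ratio; the 311 MB and 5-minute figures are from the message:)

```python
# Compare "7-zip first, then send" against "plain tar over Gbit".
data_mb = 311             # size of the test data set, from the thread
link_mb_s = 125           # ~1 Gbit/s in MB/s (assumed usable throughput)
zip_time_s = 300          # close to 5 minutes of 7-zip compression
ratio = 0.25              # assumed 7-zip output/input ratio, illustrative

plain_transfer_s = data_mb / link_mb_s                        # uncompressed
compressed_transfer_s = zip_time_s + (data_mb * ratio) / link_mb_s

print(f"uncompressed: {plain_transfer_s:.1f} s")   # a few seconds
print(f"7-zip first:  {compressed_transfer_s:.1f} s")  # dominated by zip time
```

Even with a generous ratio, the transfer time is dominated by the
compression step, so plain tar wins by two orders of magnitude here.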
> The only thing you want to limit is the amount of bandwidth used on your
> network. A really good compression is very helpful then. How long
> compression takes is hardly relevant, as long as it doesn't take an
> infinite amount of time (I remember a New Zealand compressor which took
> 24 hours to compress 100 MB of data). Note that we are already at the
> point where compression time hardly matters; you can buy a GPU to
> offload that from your servers.
No, quite on the contrary. I would like to use a compressor within a
pipe to increase the throughput over the network, i.e. to get
around the
~ 120 MB/s limit.
> Query time (so decompression time) is important though.
No, for us that number is at least as important as the compression time.
Imagine the following situation: we have a file server with close to 10 TB
of data on it, in nice chunks with a size of about 100 MB each. We buy a
new server with new disks which can hold 20 TB, and we would like to copy
the data over. So for us the more important figure is the
compression/decompression speed, which should be >> 100 MB/s on our
systems.
If 7-zip can only compress data at a rate of less than, say, 5 MB/s (input
data), I can copy the data over uncompressed much faster, regardless of
how many unused cores I have in the system. Exactly for these cases I
would like to use all available cores to compress the data fast enough to
increase the throughput.
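This break-even condition can be made explicit. With a compressor in the
pipe, the effective input-data throughput is bound by the slower of two
stages: the compressor itself, or the link carrying the smaller compressed
stream. Compression only pays off when the compressor's input rate exceeds
the raw link speed. A small sketch (the rates and ratios are illustrative
assumptions, not benchmarks):

```python
# Effective throughput of "tar | compressor | network", in input MB/s.
# compress_mb_s: rate at which the compressor consumes input data,
# link_mb_s: raw network speed, ratio: compressed size / original size.
def effective_throughput(compress_mb_s, link_mb_s, ratio):
    # The pipeline runs at the speed of its slowest stage.
    return min(compress_mb_s, link_mb_s / ratio)

link = 120.0  # the ~120 MB/s Gbit limit mentioned above

# 7-zip-class compressor: great ratio, but only ~5 MB/s of input -> loses.
print(effective_throughput(5.0, link, 0.25))   # 5.0
# Hypothetical fast compressor: modest ratio, 400 MB/s of input -> wins.
print(effective_throughput(400.0, link, 0.5))  # 240.0
```

So a compressor in the pipe only helps when its input rate exceeds the raw
link speed, which is exactly why the >> 100 MB/s figure matters here.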
Do I miss something vital?
Cheers
Carsten
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf