Thanks Stephen, we are looking at the possible peak performance of a single OSS with an IB outlet. I understand that at the client level the tradeoffs may be visible, and the 750 MB/sec aggregate that you observe is not bad at all. But we want to ensure that our OSS is able to unleash around 1 GB/sec into the clients' IB network...
The performance of a single OSS depends on the performance of the local ext3 backend file system, and we were unable to push it over 750 MB/sec. The advice of Andreas from Clusterfs is to use 3 OSTs inside one OSS and stripe files over all three of them. Some time ago we considered and discarded this solution, as we wanted to ensure that every file is confined to one and only one OSS capable of delivering 0.9-1 GB/sec. Setting the filesystem default stripe count to 3 may lead to a situation where a file ends up on different OSS machines, and that is exactly what we want to avoid. (I have asked Andreas to comment on the configuration; if it were possible to migrate to striping over 3 OSTs per OSS and still guarantee the OSS confinement, we would certainly follow the 3-OST solution.)

Greetings - Andrei.
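For concreteness, the kind of per-file layout this would require looks roughly as follows. This is a sketch only: it assumes the positional lfs setstripe syntax of 1.4/1.6-era Lustre, that the three OSTs of the target OSS have consecutive indices 0-2, and that the allocator honours the requested start index; the file name and mount point are invented:

    # Stripe one file over 3 OSTs with 1 MB stripes, starting at OST index 0.
    # Positional form: lfs setstripe <file> <stripe_size_bytes> <start_ost_index> <stripe_count>
    lfs setstripe /mnt/lustre/bigfile 1048576 0 3

    # Check which OSTs the file actually landed on:
    lfs getstripe /mnt/lustre/bigfile

The confinement holds only if indices 0-2 all belong to the same OSS; nothing in the layout itself enforces that, which is exactly the concern above.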
On 4/13/07, Stephen Simms <[EMAIL PROTECTED]> wrote:

Hi Andrei-

750 MB/s or so is about the max that we have seen from a single client to multiple OSSs across TCP. However, we discovered that you can use both front side buses if you perform two simultaneous writes (turning ksocklnd's IRQ affinity off on the server side). This got us over 1 GB/s aggregate writes with multiple OSSs on the back end. Reads have been lower - roughly 400 MB/s and 600 MB/s for the single- and dual-stream cases respectively.

These numbers were using Myri-10G cards in Ethernet mode with DDN 9550 controllers on the back end. So I believe that front side bus speed and internal memory copies have prevented us from better single-file performance (reads are worse than writes because you can't use zero-copy for reads). My suspicion is that this is the case for you as well.

Our network performance (measured with netperf) has been 9.1 Gb/s or better using the Myricom cards in Ethernet mode, so we know that is not the limiting factor. Likewise, we see better than 350 MB/s per port on the DDN side (using sgpdd), so that's not the limiting factor either.

I hope this helps,
simms

On Fri, 13 Apr 2007, Andrei Maslennikov wrote:

> We are currently evaluating possible commodity hardware candidates
> suitable for a single OSS with a single OST served to the clients via
> IB/RDMA. The goal is to provide peak performance of around 1 GB/sec
> for large streaming I/O on a single file at the client level, *without*
> striping. In other words, we want to see if we could build a
> high-performance standalone box acting as a Lustre head for a couple
> of clients (obviously, we will also have to run the metadata service
> on it).
>
> Economically, the most attractive scenario is to use a "storage-in-a-box"
> element, as it allows us to save on FC/SCSI cards and external disk
> enclosures. One such candidate box that we tried had three RAID-6
> controllers, with 8 disk modules per controller. The machine is an
> Intel dual-core 3 GHz, with 8 GB of RAM. We are able to get aggregate
> disk write performance of 300+, 600+, and 900+ MB/sec when running
> 1, 2, or 3 processes against 1, 2, or 3 distinct logical drives.
>
> Now comes the interesting point: if we run a single write process
> against a striped logical volume built upon the three available drives,
> we are only able to obtain 750 MB/sec. The writer process eats 100% of
> CPU, and there is no way to improve this. This behaviour, of course, is
> perfectly normal, but for us it means that if we based our OST on this
> combination of CPU + striped volume, we will probably never be able to
> spit out more than 750 MB/sec of peak I/O to the clients. Unless the
> OST backend service itself is multithreaded!
>
> As we do not have a running Lustre/IB environment at the moment to
> check this, I would appreciate it if someone could comment on how OST
> processes are organized internally. If only one thread is doing I/O
> towards the backend ext3 partition, we won't be able to go over
> 750 MB/sec on such a machine. Otherwise, we could probably grow up to
> 900 MB/sec.
>
> Thanks ahead for any comment - Andrei.
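On the threading question at the end of the quoted mail: OST backend I/O is serviced by a pool of kernel threads rather than a single writer, so a single-process 750 MB/sec limit on the striped volume need not be a hard ceiling for the OSS itself. On 1.4/1.6-era Lustre both the OST thread count and the ksocklnd IRQ affinity that Stephen mentions are module parameters. A minimal sketch of the relevant /etc/modprobe.conf lines follows; the parameter names assume a 1.6-era build (verify against your modules with modinfo), and 256 is an illustrative value, not a tuned one:

    # Disable ksocklnd's IRQ affinity so that simultaneous streams can
    # spread across both front side buses (per Stephen's note above):
    options ksocklnd enable_irq_affinity=0

    # Run more OST I/O service threads, so that several writers can keep
    # the striped ext3 backend busy at once:
    options ost ost_num_threads=256

Since the single writer in the test above saturates one core of the dual-core CPU, multiple service threads could in principle put the second core to work, which is where the hoped-for 900 MB/sec would have to come from.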
