Hi Mike

Mike Davis wrote:
Joe,

I don't mean to hijack the thread, but if Dave's users can fit the db's that they are running against (BLAST, for instance) in /tmp on the compute nodes, overall performance increases.

Yes, I agree with that. I like reminding customers that there is nothing as fast in aggregate as a local file system.
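
A minimal sketch of that staging step, assuming the database lives in a single directory on the file server and the node has a local scratch directory such as /tmp. The function names and the `reserve_bytes` safety margin are illustrative, not from any particular tool, and there is no check for a stale local copy.

```python
import os
import shutil


def tree_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def stage_db(src_dir, scratch_dir, reserve_bytes=1 << 30):
    """Copy src_dir into scratch_dir if it fits; return the path to use.

    Falls back to src_dir (the shared copy) when the local disk cannot
    hold the database plus reserve_bytes of headroom.
    """
    name = os.path.basename(os.path.normpath(src_dir))
    dest = os.path.join(scratch_dir, name)
    if os.path.isdir(dest):
        # Assumption: an existing local copy is complete and current.
        return dest
    needed = tree_size(src_dir)
    free = shutil.disk_usage(scratch_dir).free
    if needed + reserve_bytes > free:
        return src_dir
    shutil.copytree(src_dir, dest)
    return dest
```

A job script would call something like `stage_db("/shared/db/nr", "/tmp")` (both paths hypothetical) and point the search at whatever path comes back, so a database that does not fit locally still runs from the shared copy.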

This certainly doesn't work with GenBank (unless you have 130+ GB of /tmp), but it does work well with nr, UniProt, and the other protein db's. I run relatively large /tmp filesystems on my nodes (55-100 GB). But my nodes are more general purpose and may be running BLAST one day, Gaussian 03 or VASP the next, and Fluent or Abaqus after that.

When we build systems for workloads with large-block sequential access (read/write), we focus on building a faster local IO capability. Like RAM, compute node disk space is cheap, though for sequential-access-dominated loads, more spindles is again almost always better.

The performance increase will depend on the size of the db, the size of client and server caches, and the number of spindles.

Absolutely.



Mike





Joe Landman wrote:

Mark Hahn wrote:

I would recommend upping the memory. Computing or not, large buffer caches on file servers are, with very rare exception, a preferred config.


unclear. the FS's memory does act as an excellent cache, but then again, the client memory does too. do you have a pattern of file accesses in which the same files are frequently re-read and would fit in memory? the servers I've looked at closely have had mostly write and attribute activity, since the client's own cache already has a high hit-rate. for writes, of course, more FS memory is not important unless you have extremely high bandwidth net and disks.

I was actually assuming read-dominated. Dave does informatics as I remember, and most of the informatics we have dealt with tends to be read dominated. That doesn't mean much without the workload info, though. So I agree with the caution, though I humbly note that a 1 GB stick costs about $120 +/- a bit these days. I.e., it is not a large price, and the potential impact on performance is much higher than for 10k RPM drives.

FWIW I have a pair of 10k RPM SATA Raptors and I am not all that impressed with them.

in fact, I've been using the following sysctl.conf entries:

# delay writing dirty blocks hoping to collect further writes (default 30s)
vm.dirty_expire_centisecs = 1000
# try writing back every 1s (default 500=5s)
vm.dirty_writeback_centisecs = 100
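
For anyone checking what those two values mean in seconds, here is a small sketch that parses the fragment above and converts centiseconds to seconds. The parser is a simplified illustration and not the real sysctl parser; `read_live` assumes a Linux /proc filesystem and returns None elsewhere.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and # or ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def centisecs_to_seconds(value):
    """Convert a centisecond setting (string or int) to seconds."""
    return int(value) / 100.0


def read_live(key):
    """Read the running value from /proc/sys, or None if unavailable."""
    path = "/proc/sys/" + key.replace(".", "/")
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


CONF = """
# delay writing dirty blocks hoping to collect further writes (default 30s)
vm.dirty_expire_centisecs = 1000
# try writing back every 1s (default 500=5s)
vm.dirty_writeback_centisecs = 100
"""

settings = parse_sysctl(CONF)
for key, value in settings.items():
    print(key, "=", centisecs_to_seconds(value), "s; running value:", read_live(key))
```

With these entries, dirty pages become eligible for writeback after 10 seconds and the writeback thread wakes every second.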

in short, don't bother working at write caching much. with a lot of memory, an untuned machine will exhibit unpleasant oscillations of delaying writes
then frantically flushing.


Yup. I had my dirty around 250 for a long time. Write caching is harder because if you really want to play it safe, you shouldn't cache the write ...


2 GB/socket minimum. Nothing serves files faster than having them already sitting in RAM.


true, but is that actually your working set size? it would be rather embarrassing if 3 of the 4 GB were files read once a month...


Hmmm... again, this is a good workload problem. If Dave's users are going through big "databases" from NCBI, lots of RAM is a good thing. If it is just a buncha small files, yeah, could be overkill.
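
One rough way to put a number on the working set question is to sum the sizes of files read recently, going by access time. This is only a sketch: it assumes atime is being updated on the exported filesystem (a noatime or relatime mount makes it unreliable), and the 30-day window and function name are arbitrary choices for illustration.

```python
import os
import time


def working_set_bytes(root, days=30, now=None):
    """Return (recent_bytes, total_bytes) for regular files under root.

    recent_bytes counts files whose atime falls within the last `days` days.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    recent = 0
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_atime >= cutoff:
                recent += st.st_size
    return recent, total
```

If the recent figure fits comfortably in the server's RAM, the extra memory is likely to pay off; if most of the bytes were last touched months ago, the caution above applies.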

But if I had to spend extra $$ on ram versus 10kRPM drives, I know where I would spend it ...


4 x 74 GB disks Ultra320 (or make an argument for a particular SATA)


SATA disks are SATA disks, of course.  dumb controllers are all pretty
similar as well (cheap, fast, not-cpu-consuming).  if you have your
heart set on HW raid, at least get a 3ware 9550, which is quite fast.
(most other HW raid are surprisingly bad.)


The LSI SAS unit is pretty good. I like the 3ware, the Areca, and a few others. We just created a nice 500+ MB/s "file server" for a large customer out of an Areca card, 16 spindles and some tweaking. I haven't seen production performance data for it yet, but our in-house testing exceeded the 500 MB/s by a little bit.

dual 10/100/1000 ethernet on the mobo


Careful on this... we and our customers have been badly bitten by tg3 and broadcom NICs. If the MB doesn't have Intel NICs, get an Intel 1000/MT dual gigabit card. You won't regret that, and it is money well spent.


that's odd; I have quite a few of both tg3 and bcm nics, and can't say I've had any complaints. what are the problems?


Interrupted to death. The tg3 doesn't seem to have NAPI turned on by default in the standard distro kernels. Haven't tried the FC* with this, hopefully it is saner there. Under heavy load, we see interrupts climb past 40k/s, and it context switches like mad. Seen this from early 2.6 through 2.6.13 on SuSE and RHEL. It makes using AoE (Coraid) nearly useless with Broadcom: formatting the unit with ext3 renders the server unusable for hours. Drop a nice Intel unit in there, do the same thing and it works great, and the server is responsive during formatting. Same issues for file service under heavy load.

Seen this on Tyan, iWill, Arima?, MSI(ibm e32*), and others.
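
To check whether a NIC is in that state, the per-IRQ interrupt rate can be estimated by sampling /proc/interrupts twice. A minimal sketch, assuming the usual Linux layout (a header row of CPU columns, then one row per IRQ); the function names are made up for illustration.

```python
import time


def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def rates(before, after, interval):
    """Interrupts per second for each IRQ present in both samples."""
    out = {}
    for irq, (count, desc) in after.items():
        if irq in before:
            out[irq] = ((count - before[irq][0]) / interval, desc)
    return out


def sample_live(interval=1.0):
    """Sample /proc/interrupts twice, `interval` seconds apart (Linux only)."""
    with open("/proc/interrupts") as fh:
        first = parse_interrupts(fh.read())
    time.sleep(interval)
    with open("/proc/interrupts") as fh:
        second = parse_interrupts(fh.read())
    return rates(first, second, interval)
```

Running `sample_live()` during a heavy transfer and looking at the row for the network device shows whether the rate is anywhere near the 40k/s figure mentioned above.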


case - 2U (big enough for adequate ventilation, right?)


Yeah, just make sure you have good airflow.


2U still requires a custom PS, doesn't it? it's kind of nice to be able to put in an ATX-ish PS. and is 2U tall enough for stock/standard
heatsink/fans?


Don't know if it is custom. I like the redundant PS, but the small redundant PSes tend not to supply enough current to boot the system. Need a 3U case for that.

Best cooling designs I have seen involve baffles, and a pull or push-pull config. We have used some units where under load the processors are happily working around 22-28C. Fans are loud though. Case (1U) is very cool to the touch.

For 2U you still need to worry about flow. I find it hard to believe that most people get efficient flow out the back grating on 2U and larger without a helper fan of some sort.

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf




--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
