/Database servers write lots (maybe 10k/day?) of little PDF's to shared
filesystem; apps servers read'em, print'em, etc./


10k/day averages out to one PDF file every ~9 seconds. Even if your server load peaks at 900% of the average during business hours, that's still only one PDF file every second or so. Any modern Linux box with a new hard drive and GigE could handle that... if you had a dedicated shared file server running NFS or Samba, I think you'd be fine.
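
(Back-of-envelope, in case anyone wants to check my math:)

$ echo "scale=1; 86400 / 10000" | bc      # seconds between files at the daily average
8.6
$ echo "scale=1; 86400 / 10000 / 9" | bc  # same thing at a 9x business-hours peak
.9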

For the mirroring -- assuming regular backups won't do -- consider using DRBD:

http://blog.mydream.com.hk/howto/howto-create-gfs-on-drbd-network-disk-mirroring
http://www.redhat.com/archives/linux-cluster/2007-December/msg00083.html
http://roaksoax.wordpress.com/2008/07/31/installing-drbd-on-hardy/

I've been waiting for a cluster project to test it out on. The above links recommend pairing it with GFS for "dual primary" (similar to "multi-master") clusters, where two servers can write at the same time. In this case, you'd have true high availability -- the "backup" would always be up and running. You could even load balance between the two running servers...

But since you said you don't need "dual primary", you could also just use ext4 and mount the DRBD device as a read-only mount on the backup server. Since the backup never writes to the filesystem (unless it was reconfigured to be the master), you wouldn't need to deal with GFS. Ext4 has proven to be a performance fiend... I switched to it a few months ago and I'm very happy with it. Clients can then just use NFS or Samba. (I don't always use NFS. But when I do, I prefer user-mode NFS.)
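
For what it's worth, a single-primary DRBD resource is only a handful of lines of config. Here's a rough sketch (I haven't deployed this myself yet, as I said -- the resource name, hostnames, disks, and addresses below are all placeholders, and the exact syntax shifts a bit between DRBD 8.x releases):

# /etc/drbd.d/pdfstore.res -- hypothetical resource, mirrored between two boxes
resource pdfstore {
    protocol C;                    # fully synchronous replication
    on dbserver1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.10:7789;
        meta-disk internal;
    }
    on dbserver2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.11:7789;
        meta-disk internal;
    }
}

# One-time setup on both boxes:
$ drbdadm create-md pdfstore
$ drbdadm up pdfstore

# On whichever box is the primary (the very first promotion may need
# "drbdadm -- --overwrite-data-of-peer primary pdfstore" to pick a sync source):
$ drbdadm primary pdfstore
$ mkfs.ext4 /dev/drbd0
$ mount /dev/drbd0 /srv/pdfs

If the primary ever dies, "drbdadm primary pdfstore" followed by a mount on the backup brings the mirror up read-write over there. The dual-primary/GFS setup from those links is the same resource plus "allow-two-primaries;" in a net { } section.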

    Here's an article from 2007 that includes ext4 performance data:

http://ciar.org/ttk/zfs-xfs-ext4.html


--Derek

On 04/05/2011 07:59 PM, Glenn Stone wrote:
On Tue, Apr 05, 2011 at 07:14:51PM -0700, Alexandr Normuradov wrote:
> From real-world experience: we used IBM's GPFS. Very stable, has
> multiple client drivers and robust manuals. Costs a lot, but it actually
> works.
How much is "a lot"?  He asks curiously...

Also, how good is IBM's support for this?  That was a concern the boss had
(and so did I): that Coda's support would be sketchy at best... I've dealt at
the driver level before, but it's been a *long* time...

More info:  Database servers write lots (maybe 10k/day?) of little PDF's to
shared filesystem; apps servers read'em, print'em, etc.  I've not seen
anything indicating anyone has optimized for many small files; much to the
contrary (optimized for database use, few large files).

-- Glenn


On Tuesday, 5 April 2011, Derek Simkowiak <[email protected]> wrote:
      I looked at CODA for a cluster back in 2004 and decided I'd never use it.

     It's far more complex than any other filesystem I've worked with.  It's the only one that 
requires a special "log" partition, a special "metadata" partition, and 
requires you to enter hex addresses for the starting locations of certain data blocks.

     Consider this paragraph from the CODA manual that tells you how big the 
RVM partition should be:

As a rule of thumb, you will need about 3-5% of the total file data space for 
recoverable storage. We currently use 4% as a good value under most 
circumstances. In our systems the data segment is 90Meg for approximately 3.2 
gigabytes of disk space. By making it smaller, you can reduce server startup 
time. However, if you run out of space on the RVM Data partition, you will be 
forced to reinitialize the system, a costly penalty. So, plan accordingly.

     Okay, so... I've never used CODA before, and I'm not sure what my filesystem will
look like.  There is no way their ancient example numbers for "3.2 gigabytes"
scale up to today's filesystems, which are closer to 3.2 terabytes.  How am I supposed to
know what initial values to choose?  If I guess wrong, it'll only destroy the entire
filesystem.  I can understand inode size... but I can't understand this.
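
     Back-of-envelope, using their own 4% rule of thumb:

$ echo $(( 3200 * 4 / 100 ))       # MB of RVM data for ~3.2 GB of file data
128
$ echo $(( 3200000 * 4 / 100 ))    # MB of RVM data for ~3.2 TB of file data
128000

That's on the order of 128 *gigabytes* of RVM data to size correctly on the
first try, or (per the manual) you get to reinitialize.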

     And configuring the RVM (metadata) looks like this (again from the manual 
-- those hex values are supposed to be magically chosen and entered by the 
user):

$ rdsinit /dev/hdc1 /dev/sdb1

Enter the length of the device /dev/sdb1: 119070700

Going to initialize data file to zero, could take awhile.
done.
rvm_initialize succeeded.

starting address of rvm: 0x50000000
heap len: 0x1000000
static len: 0x100000
nlists: 80
chunksize: 32

rds_zap_heap completed successfully.
rvm_terminate succeeded.

[Note the use of a decimal value for the length of the device, and the use of
hex values for the addresses and lengths of the next three items.]

     I absolutely love that little note... use a decimal value for the first
value, and hex for everything else.  Don't forget! :)  And the manual says that
they use 0x50000000 (or was it 0x500000000?  can't remember) for Intel-based 
architectures running Linux or FreeBSD... but nothing about other platforms.  
The tools, documentation, and skilled technicians necessary in an emergency 
just don't seem to be there for CODA.

     In short, managing CODA seems to be about on par with managing a big 
database.  Too complex, too many options, and you need an in-house expert to 
keep the thing running.


/A big item on our wishlist is the ability to both have multiple hosts writing 
to the distributed filesystem/
      NFS or Samba.  Cross-platform, any error message you come across is 
guaranteed to show up in a Google search, and any version of Linux comes with 
either of these ready to go.

     (There are others, like SSHFS and WebDAV, but they don't support the 
concept of UIDs and GIDs.)
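
     (For scale: the entire server-side config for either one is just a few
lines.  The paths, networks, and share name below are made up, not anything
from Glenn's setup:)

# /etc/exports on the file server, then run "exportfs -ra":
/srv/pdfs    10.0.0.0/24(rw,sync,no_subtree_check)

# ...or the equivalent Samba share in /etc/samba/smb.conf:
[pdfs]
    path = /srv/pdfs
    read only = no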


/*and* have read-only backup copies on standby hosts (which could then be 
converted to active in the event of catastrophe)/
      How will this filesystem be used?  Is this for a company file server, or 
for some real-time shared storage for a public server cluster?

     Note that hot mirrors (like CODA or RAID) are only part of the solution.  
They won't protect you if you accidentally delete the wrong file, or if you get 
rooted by a script kiddie.

     I use rsnapshot (which does rsync incrementals like Apple's Time Machine) 
run once every hour, to an offsite backup server.  My backups are always online 
in a read-only fashion, ready for use at any time.  If my primary server melts, 
then I've lost (at most) an hour of work.  Plus, I have incrementals -- I can 
instantly see any of my files as they existed 3 hours, 4 days, or 5 months ago. 
 Using SSH keys it's all encrypted and fully automatic.  And if there's a 
disaster, I'm not dealing with any magic partition sizes or other such nonsense 
-- it's just files on a disk.
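
     (The whole setup is a few lines of config plus a cron entry.  Here's a
trimmed-down sketch -- the hostname and paths are made up, and note that
rsnapshot.conf wants tabs, not spaces, between fields:)

# /etc/rsnapshot.conf on the backup server (fields are TAB-separated;
# newer releases spell "interval" as "retain"):
snapshot_root   /backups/snapshots/
interval        hourly  24
interval        daily   7
backup          root@primary.example.com:/srv/  primary/

# crontab on the backup server:
0 * * * *       /usr/bin/rsnapshot hourly
50 23 * * *     /usr/bin/rsnapshot daily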


--Derek

On 04/05/2011 04:27 PM, Glenn Stone wrote:

$NEWCOMPANY is having major issues with OCFS, and I'm looking into
alternatives for replacing it.  A big item on our wishlist is the ability to
both have multiple hosts writing to the distributed filesystem, *and* have
read-only backup copies on standby hosts (which could then be converted to
active in the event of catastrophe).  Coda seems to fit this bill, from what
I've been able to google up.  I'm not, however, able to determine if this
thing is still in an R&D phase, or ready for prime time; it seems to maybe
kinda slowly still be being worked on? (Latest RPM's are dated 1/26/2010).

Exploring alternatives,
Glenn

Listen: I'm a politician, which means I'm a cheat and a liar, and when I'm
not kissing babies I'm stealing their lollipops. But it also means I keep my
options open.  -- Jeffery Pelt, "Red October"




--
Sincerely,
Alexandr Normuradov
425-522-3703

