Mikey wrote:
The second step is to ditch storing everything on a single 9TB system that
cannot be backed up efficiently. Distribute the storage of the images on
clusters or whatever. For example, peel off 1TB of images onto a single
server, then update the database (or apache/squid mapping) to point to the
new location. 9 1TB boxes would be far less prone to catastrophic failure
and much easier to replicate/mirror/backup than a single 9TB box. This is
what I call the "Google approach" ;) Use cheap commodity hardware and
smart implementation to distribute/scale the load.
Of course the ultimate solution would be some sort of cluster or SAN
approach...
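For concreteness, the sharding described above amounts to a lookup from
image id to whichever box holds that range, with apache/squid (or the app)
pointed at the result. A rough sketch, with made-up hostnames and ranges
(the real mapping would live in the database):

    # hypothetical id-range -> server table
    IMAGE_SHARDS = [
        (0,     49999, "img1.example.com"),   # first 1TB peeled off
        (50000, 99999, "img2.example.com"),
        # ... one entry per box as more images are peeled off
    ]

    def server_for(image_id):
        """Return the host that stores a given image id."""
        for low, high, host in IMAGE_SHARDS:
            if low <= image_id <= high:
                return host
        raise KeyError(image_id)
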
I'm not sold on the Google approach.
Assuming someone was to build nine data servers we're talking roughly
$3k per server (dual CPU, 4GB ram, raid 5 sata) or $30k with shipping
and tax. On top of that I now have to manage nine boxes and manage my
data in nine different places. These 9 servers are going to pull 18A of
power and use 18U of rack space. Whereas $35k gets me an NFS/iSCSI/CIFS
head (of admittedly third tier storage) and two 16 x 500GB shelves or
12TB usable if I split each shelf into two RAID 6 partitions. This setup
pulls 14A, uses 8U, has volume management and snapshots, can expand easily,
and can eventually cluster the heads if I'm willing to buy the license later.
The 9 x 1TB setup might be worth the pain if you had the application
written to deal with that and needed more of your data in RAM. For a
community photo site I'm not sure you do. Additionally I don't think it
helps solve the original problem of backing data up somewhere, but maybe
I'm missing something.
In any case I'm not saying you need to spend $30k to fix the problem,
but if you plan to drop some money on the problem, really sit down and
figure initial cost, cost to expand, rack space, power, cooling,
maintenance costs, administration costs, etc. and relate it all back to a
$/GB figure so you can compare apples to apples.
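Roughly, with the numbers above: $30k for ~9TB usable on the nine boxes
works out to about $3.30/GB, while $35k for ~12TB usable on the filer is
about $2.90/GB, and that's before power, rack space, and admin time are
factored in.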
In order to get better backups you might consider hashing your data a
bit more on the filesystem.
What you've got now:
/data/00000-50000/file|thumb|etc

What might work better:
/data/1e/01ac/cdd98a910ca1d4e37b39a9197e/file|thumb|etc
And then you can run through each tree and only sync the subdirs you
need. I'm not certain this idea is the right way to go long term, but it
might be easy to implement now. I would not use more than three layers
of directories.
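If it helps, the layout above is just the leading chunks of a hash of the
id. A minimal sketch, assuming md5 and a 2/4/26 split like the example
path (both my guesses, not a requirement):

    import hashlib
    import os

    def hashed_path(image_id, root="/data"):
        """Map an image id to a hashed directory like
        /data/1e/01ac/cdd98a910ca1d4e37b39a9197e/"""
        digest = hashlib.md5(str(image_id).encode()).hexdigest()
        return os.path.join(root, digest[:2], digest[2:6], digest[6:])

With something like that in place a backup can walk /data/00 through
/data/ff and sync each prefix on its own schedule instead of one giant
tree.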
kashani