Mikey wrote:
The second step is to ditch storing everything on a single 9TB system that
cannot be backed up efficiently.  Distribute the storage of the images on
clusters or whatever.  For example, peel off 1TB of images onto a single
server, then update the database (or apache/squid mapping) to point to the
new location.  9 1TB boxes would be far less prone to catastrophic failure
and much easier to replicate/mirror/backup than a single 9TB box.  This is
what I call the "google approach" ;)  Use cheap commodity hardware and
smart implementation to distribute/scale the load.

Of course the ultimate solution would be some sort of cluster or SAN
approach...


I'm not sold on the Google approach.

Assuming someone were to build nine data servers, we're talking roughly $3k per server (dual CPU, 4GB RAM, RAID 5 SATA), or $30k with shipping and tax. On top of that I now have to manage nine boxes and manage my data in nine different places. These nine servers are going to pull 18A of power and use 18U of rack space. Whereas $35k gets me an NFS/iSCSI/CIFS head (of admittedly third-tier storage) and two 16 x 500GB shelves, or 12TB usable if I split each shelf into two RAID 6 partitions. This setup pulls 14A, uses 8U, has volume management and snapshots, can expand easily, and can eventually cluster the heads if I'm willing to buy the license later.

The 9 x 1TB setup might be worth the pain if you had the application written to deal with that and needed more of your data in RAM; for a community photo site I'm not sure you do. Additionally, I don't think it helps solve the original problem of backing the data up somewhere, but maybe I'm missing something.

In any case I'm not saying you need to spend $30k to fix the problem, but if you plan to drop some money on it, really sit down and figure out initial cost, cost to expand, rack space, power, cooling, maintenance costs, administration costs, etc., and relate it all back to $/GB so you can compare apples to apples.
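
For example, a quick back-of-the-envelope version of that comparison in Python. The hardware numbers are the ones above; the power and rack rates are made up, so plug in whatever your facility actually charges:

def dollars_per_gb(capex, usable_tb, amps, rack_u,
                   dollar_per_amp_month=20,   # made-up colo power rate
                   dollar_per_u_month=15,     # made-up rack space rate
                   months=36):                # amortize over three years
    """Rough total cost of ownership per GB over the amortization period."""
    opex = (amps * dollar_per_amp_month + rack_u * dollar_per_u_month) * months
    return (capex + opex) / (usable_tb * 1000.0)

# nine 1TB boxes: ~$30k, 18A, 18U, ~9TB usable
print("9 x 1TB boxes:   $%.2f/GB" % dollars_per_gb(30000, 9, 18, 18))

# filer head + two shelves: $35k, 14A, 8U, 12TB usable
# (12TB = 2 shelves x 2 RAID 6 groups x (8 - 2 parity) x 500GB)
print("filer + shelves: $%.2f/GB" % dollars_per_gb(35000, 12, 14, 8))

With those made-up rates the filer comes out cheaper per GB, but the point is the method, not the numbers.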


In order to get better backups you might consider hashing your data a bit more on the filesystem.

What you've got now:
/data/00000-50000/file|thumb|etc

What might work better:
/data/1e/01ac/cdd98a910ca1d4e37b39a9197e/file|thumb|etc

And then you can run through each tree and only sync the subdirs you need. I'm not certain this idea is the right way to go long term, but it might be easy to implement now. I would not use more than three layers of directories.
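
A minimal sketch of how that layout could be generated, assuming you key off an md5 of the filename or image id (the function name, the sample filename, and the 2/4 character split are just made up to match the example path above):

import os
import hashlib

def hashed_path(root, key):
    # md5 gives 32 hex chars; use the first 2 and the next 4 as directory
    # levels, and the rest as the leaf directory holding file/thumb/etc.
    h = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(root, h[:2], h[2:6], h[6:])

print(hashed_path("/data", "img_0004217.jpg"))
# -> /data/<2 hex chars>/<4 hex chars>/<remaining 26 hex chars>

The nice part for backups is that the first level is only 256 directories (00 through ff), so a nightly job can loop over /data/00 .. /data/ff and rsync each one on its own schedule instead of diffing the whole 9TB tree at once.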

kashani