On 12/27/12 17:25, Jiri B wrote:
> On Wed, Dec 26, 2012 at 03:26:43PM -0500, Nick Holland wrote:
>> Probably thinking of this thread:
>> http://marc.info/?t=117689108200011&r=1&w=2
>> and my two contributions to it.  A number of other people provided some
>> good (and some bad) comments, too...read through 'em all.  You get to
>> decide which are useful and which are not, and what is right and what is
>> wrong.
>> 
>> Keep in mind that thread is almost six years old...500GB was a big disk
>> back then.  However, I'm still quite proud of that system.
>> (and in case you were wondering, my employment ended with that employer
>> about four months later.  That also makes a great story, but quite
>> off-topic.  They did replace my system with a proprietary system that
>> cost many times as much).
> 
> The only setup I can imagine that would not fit into this scheme of small
> partitions combined with filesystem structure and symlinks is this one:
>
>    'unrestricted space offered directly to a user via ftp/sftp/ssh'
>
> As we cannot predict how fast and when he/she would fill the storage,
> moving the user's whole data to a bigger partition later is slow and
> still not a solution.
> 
> It seems to me that giving a user direct access to his data root dir
> while telling him there is no space restriction is not possible.

I would say that's true, period.  Fancy stuff only lets you push off the
problem to a bigger number, but you always have some finite storage
available, and if given no limits, no checks, no costs, you WILL fill it
eventually...unless you have an inbound pipe that's slower than your
procurement process for new storage (and I'm going to argue, that's
cheating! :)

If your task definition is "give a user direct access to unlimited
storage", well, yes... I may not have the greatest solution in the world
for you...but then, you crafted the question in a non-business-savvy way
to stump me (me: "you don't need unlimited storage for most real world
tasks"  you: "My real world task is to give someone unlimited storage")
-- you are ignoring all laws of economics, and your solution WILL have
serious issues because of that (why do we have a problem with spam?
Because it's painless and risk-free for the sender.  Why are we seeing a
resurgence in telephone-based scams?  Because it's become painless and
risk-free for the scammer.  Why will your task blow up in your face in
predictable ways?  Because there's no cost to the consumer of your disk
space.  Econ 101).

But still...this is not a statement of an actual problem to be solved
("I need to be able to upload lots of huge video files for exchange with
other people"), but a proposed solution ("unlimited direct access to
file systems").  So I'm not going to admit defeat. :)

> 
> On the other hand, if the user did not require one big directory for
> his data, then the filesystem layout could be hidden from the user and
> the setup mentioned above would fit - although instead of direct
> ftp/sftp the user would use some specialized client to get his files,
> and the setup would use UUIDs and keep track of each UUID and its owner
> (or something similar).
> 
> Any comments? Do there exist some "proxies" which would mirror files
> immediately while a user is uploading them via some common protocol, and
> delete the copy when the user deletes some of his files? (rsyncing
> regularly later could be quite problematic if you had many users
> uploading, for example, multi-GB files...)

Actually, rsyncing is fantastic for huge files...it can verify quickly,
and when something doesn't match, it syncs at the hardware's capability.
It's lots of small files where you start running into file system
overhead.
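
A minimal sketch, assuming made-up host and path names; rsync's
delta-transfer algorithm does the heavy lifting, and --partial/--inplace
keep it from re-sending a half-finished multi-GB file from scratch:

    # mirror a directory of large files to another box; for a file that
    # already exists on the far end, only the changed blocks cross the wire
    rsync -av --partial --inplace /backups/bigfiles/ \
        backuphost:/backups/bigfiles/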

If you look at some of the Big File Sharing Services, I think you will
find this "problem" has been solved...and considering that many of them
offer some service for "free", or at least at a fraction of the price per
gigabyte that many high-end solutions charge, I think it is safe to say
it is NOT being done with high-end SANs, but with cheap commodity
hardware and disks (and low-maintenance solutions, too).

Realistically, you will have upload limits.  Somewhere around 2GB, HTTP
uploads start having issues and some file systems start having issues
(note: USB devices are still often formatted with variations of FAT file
systems, which cap individual files at 2GB or 4GB depending on the
variant).

So...you let people upload to a "temp" area.  If you accept 2GB as an
upload limit, a 500GB upload area would cover a fair number of uploads.
If you want a 100GB upload limit, well...500GB will fill rapidly, but you
can have a lot of these "temp" areas, and a 2TB file system isn't so
crazy anymore.

Your user uploads to this area, and the received file gets a uniquely
generated name which is tracked by a database.  When the upload is
complete, you give the user some kind of "key" to identify THEIR file
(maybe just the original name, combined with their user ID), and the
database tracks it.  After the upload is complete, the system identifies
the size of the file, looks around in its storage chunks for a place to
put it, and slowly (so as not to tax the disk I/O) copies it to that
location, maybe again to a backup location on a different physical
device, SHA256's the original file and the two new copies, updates the
database with the new, "permanent" locations, and purges the file from
the "temp" upload area.  Note that the file remains available at every
step of the process after the original upload -- if the first download
request is made before the "move" is complete, it is served from the
temp area.

As new storage is bolted on, files can be (slowly) pushed around from
old storage to new storage, allowing pathetic-looking old storage to be
shut down in favor of shiny new storage.  Files can be pushed around on
existing storage to better optimize the available space in predefined
chunks, too.
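
To make that concrete, here is a rough sketch of the "temp area to
permanent chunk" shuffle.  Everything in it (the paths, the sqlite
catalog, the chunk layout) is invented for illustration, and real error
handling is left as an exercise:

    #!/bin/sh
    # Move verified uploads out of the temp area into a storage chunk.
    # Paths and schema are made up; assumes the upload step already
    # created a row in "files" keyed by the generated temp file name.
    TMP=/srv/upload/tmp           # uploads land here under generated names
    DB=/srv/upload/catalog.db     # sqlite db mapping user keys to locations

    for f in "$TMP"/*; do
        [ -f "$f" ] || continue
        sum=$(sha256 -q "$f")     # OpenBSD sha256(1); sha256sum elsewhere

        # crude heuristic: use the storage chunk with the most free space
        dest=$(df -k /srv/chunk* | sort -nk4 | tail -1 | awk '{print $6}')

        cp -p "$f" "$dest/$sum" &&
            [ "$(sha256 -q "$dest/$sum")" = "$sum" ] &&
            sqlite3 "$DB" "UPDATE files SET location='$dest/$sum'
                           WHERE tempname='${f##*/}';" &&
            rm "$f"
    done

The copy to a backup location on a second physical device, and the slow
"don't tax the disk I/O" pacing, would follow the same verify-then-update
pattern.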

Yes, this is starting to get complex, but it is a complexity of simple,
well-understood tools, which I have more faith in than AFS, which seems
to be understood by few people beyond the level of "whatever you want to
do, AFS probably does it.  I think.  Dunno, never actually used it".  It
does appear that a successful AFS implementation requires a fair amount
of planning...oh, there's that concept again...


Funny thing...  I actually DO have an app which works out very nicely
with ZFS, and not so well with traditional file systems for one silly
reason.

The app: a disk-to-disk backup system
Each "target" machine (system being backed up) gets its own ZFS file
system in a big backup pool. rsync is used to backup the remote machines
to this ZFS file system, then after a backup, a ZFS snapshot is made,
and the snapshot is sent over to a second machine.  Old snapshots are
deleted after the designated number of backups.
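
Per target, the nightly cycle boils down to something like the sketch
below.  Host names, dataset names and snapshot naming are invented for
illustration, and it assumes last night's snapshot still exists to serve
as the base for the incremental send:

    #!/bin/sh
    # One backup cycle for one target, roughly as described above.
    TARGET=somehost                  # machine being backed up
    DS=backup/$TARGET                # its own file system in the backup pool
    TODAY=$(date +%Y%m%d)

    # newest existing snapshot, used as the base of the incremental send
    PREV=$(zfs list -H -t snapshot -o name -s creation -r "$DS" |
           tail -1 | cut -d@ -f2)

    # pull the target's data (real life wants exclude lists; omitted here)
    rsync -a --delete "root@$TARGET:/" "/$DS/"

    zfs snapshot "$DS@$TODAY"                     # freeze tonight's state
    zfs send -i "@$PREV" "$DS@$TODAY" |
        ssh backup2 zfs receive -F "$DS"          # copy to the second box

    # ...then destroy snapshots older than the designated number of
    # backups, e.g.  zfs destroy "$DS@$OLDEST"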

Why ZFS works better than a standard file system: the ability to bolt on
additional storage units is nice, though practically speaking, all the
storage this system will ever have is in the box already (I just add 2TB
chunks as needed), so I really don't count this as a huge win.  The big
win?  "df -h" tells me quickly which backup is chewing the most space,
so if some machine is backing up hundreds of GB every night but should
be mostly static data, I can go after that machine and figure out what
is wrong.  "du -s" is how I'd have to do this on a traditional file
system, and that's very slow when you have tens of thousands of files.

(the ZFS snapshots look like a win, but I've used the rsync --link-dest
option to get the same effect.  A ZFS-knowledgeable coworker of mine
argued that the ZFS snapshotting would be more efficient than the
--link-dest option, but I'm not seeing it -- the size of the snapshot
being pushed between machines is often much bigger than I'd expect,
often much bigger than the incremental rsync transfer.  Ok, the ability
to push the snapshot between machines is cool, but considering how fast
these backups usually are, one could just run full rsync backups twice
very easily, and sending ZFS snapshots between machines has some
seriously odd quirks).
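
For comparison, the --link-dest variant I'm talking about is nothing
more exotic than this (made-up host and paths again):

    # rotating backups on a plain file system: files unchanged since
    # yesterday's run become hard links into yesterday's tree instead of
    # fresh copies, so each "full" backup only costs the changed data
    rsync -a --delete \
        --link-dest=/backups/somehost/2012-12-26 \
        root@somehost:/ \
        /backups/somehost/2012-12-27/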

Downside: well, this backup system seems to wedge its FreeBSD-based ZFS
very often -- it once managed an uptime of almost 50 days, but that was
an extremely successful run.  ZFS is quirky as hell; even if it were
BSD-licensed, I don't think it meets OpenBSD's standard of "just
works"...it's cool, it's nifty, it sets the computer industry back about
20 years in terms of twisting knobs to get basic functionality... and it
generates mysterious errors for which the official fix is "rebuild your
file system and restore from backup"... um...this IS the backup!

At home, my d2d backup system is running OpenBSD, with finite
partitions, and I'm not about to change that.  It runs from
[upgrade|power event] to [upgrade|power event], on old hardware and big
disks.

Really, getting quite off-topic here.  If you want to have a buzzword
compliant system, go for it.  It will probably move you further ahead in
the IT business than my conservative approaches will move me.  It isn't
about what works best, it's about what you can add to your resume for
your next job (and when I learn to live by this rule, I'll be a much
happier person).

Nick.
