I'm looking for some high-level information about the suitability of Ceph
for a particular use case and, assuming it's a good fit, whether the
migration path I have in mind has any particular gotchas that I should be
on the lookout for.

The current situation is that I've inherited responsibility for a set of
large-ish file servers, each with a filesystem between 80 TB and 130 TB
(20 disks each, of varying sizes).  It's probably not important, but for
completeness: the volumes vary from file server to file server; some are
ZFS pools, others are MD/LVM2 software RAIDs.  The file servers use NFS to
share data with a set of other servers that need access.  In addition to
the redundancy provided by the RAID/ZFS volumes, data is also duplicated
between multiple file servers.

I want to replace this setup with Ceph, or something like it, for several
reasons:

- More efficient use of space: the combination of RAID and cross-chassis
  copies means a minimum of 4x duplication (worse for raidz2), which could
  be reduced (a rough comparison is sketched below).
- More complete use of disks: the datasets are large and not easily split,
  so a dataset can't be moved to a file server unless that server has room
  for the whole thing, and available storage goes unused.
- More predictable access: a single CephFS mount with a single copy of
  each dataset, rather than trying to figure out which copy of a dataset
  to use from which NFS mount.

... and the list goes on.
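
To make the space argument concrete, here's the back-of-the-envelope
arithmetic I've been using, as a small Python sketch.  Every number in it
is a placeholder for our environment (mirror-style RAID efficiency, two
cross-chassis copies) or a hypothetical Ceph layout (3x replication, a
4+2 erasure-coded data pool), not a measurement:

# Rough usable-capacity comparison; all numbers are placeholders.
raw_tb = 20 * 6              # hypothetical chassis: 20 disks of ~6 TB

# Current layout: RAID efficiency on the box, times two cross-chassis
# copies of every dataset (~4x overhead overall).
raid_efficiency = 0.5        # assumed mirror/RAID10-style layout
cross_chassis_copies = 2
current_usable = raw_tb * raid_efficiency / cross_chassis_copies

# Ceph with 3x replication: usable is roughly raw / 3 across the cluster.
ceph_rep3_usable = raw_tb / 3

# Ceph with a hypothetical 4+2 erasure-coded data pool: raw * 4/6.
ceph_ec42_usable = raw_tb * 4 / 6

print(f"raw {raw_tb} TB -> current ~{current_usable:.0f} TB usable, "
      f"3x replication ~{ceph_rep3_usable:.0f} TB, "
      f"EC 4+2 ~{ceph_ec42_usable:.0f} TB")

For a 120 TB chassis that works out to roughly 30 TB usable today versus
40 TB with 3x replication or 80 TB with 4+2 erasure coding, which is the
kind of improvement I'm after.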

So that's what I'm working with.. on to the questions..

First, in my tests and reading I haven't encountered anything that suggests
I should expect problems from using a small number of large file servers in
a cluster.  But I recognize that this isn't the preferred configuration,
and I'm wondering if I should be worried about any operational issues.
Obviously I/O won't be as good as it could be, but I don't expect it would
be worse than software RAID served over NFS.  Is there anything there I've
missed?  Eventually the plan will be to swap out the large file servers for
a larger number of smaller servers, but that will take years of regular
hardware refresh cycles.
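
One thing I have tried to model is what a whole-chassis failure looks like
with only a handful of large hosts.  A minimal back-of-the-envelope
sketch, assuming host-level failure domains, 3x replication, and
placeholder numbers for host count, capacity, and utilization:

# Can the cluster re-replicate after losing one host?  All numbers are
# placeholders, not measurements of the real hardware.
num_hosts = 5
host_capacity_tb = 120
utilization = 0.6            # fraction of raw capacity in use
replica_size = 3

data_on_failed_host = host_capacity_tb * utilization
free_on_survivors = (num_hosts - 1) * host_capacity_tb * (1 - utilization)

# With a host failure domain, no two copies share a chassis, so getting
# back to full redundancy needs at least `replica_size` hosts still up...
enough_hosts = (num_hosts - 1) >= replica_size
# ...and enough free space on the survivors to absorb the lost host's data.
enough_space = free_on_survivors >= data_on_failed_host

print(f"hosts ok: {enough_hosts}, space ok: {enough_space} "
      f"(need {data_on_failed_host:.0f} TB, "
      f"have {free_on_survivors:.0f} TB free)")

(The real headroom is presumably tighter than that because of the
nearfull/full ratios, but that's the shape of the concern.)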

Second, to migrate the current setup to Ceph I'd need to vacate one file
server, convert it and create the first OSDs, move data onto the new Ceph
filesystem (vacating the next file server in the process), convert that
one and add it to the cluster, and so on like dominoes until they're all
converted and joined to the cluster.  Is there any problem with this
migration plan?  The one thing I'm not clear on is whether Ceph will
automatically replicate data across chassis and rebalance as I convert
file servers and add new OSDs.  Anything else I need to watch out for?
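
Concretely, the per-chassis conversion I've been testing looks something
like the sketch below (Python just shelling out to the CLI).  The rule and
pool names, PG counts, and device paths are all placeholders, and the
host-failure-domain rule reflects my understanding of how the
cross-chassis placement is supposed to work, which is part of what I'd
like to confirm:

import subprocess

def ceph(*args):
    # thin wrapper around the ceph CLI; raises if a command fails
    subprocess.run(["ceph", *args], check=True)

# One-time setup: a replicated rule with "host" as the failure domain,
# so each copy should land on a different chassis, plus the CephFS pools.
ceph("osd", "crush", "rule", "create-replicated",
     "replicated_host", "default", "host")
ceph("osd", "pool", "create", "cephfs_data", "128")      # placeholder pg_num
ceph("osd", "pool", "create", "cephfs_metadata", "32")   # placeholder pg_num
ceph("osd", "pool", "set", "cephfs_data", "crush_rule", "replicated_host")
ceph("osd", "pool", "set", "cephfs_data", "size", "3")
ceph("fs", "new", "cephfs", "cephfs_metadata", "cephfs_data")

# Per vacated chassis: turn each freed disk into an OSD and let the
# cluster settle before migrating the next batch of data onto CephFS.
for dev in ["/dev/sdb", "/dev/sdc"]:                     # placeholder devices
    subprocess.run(["ceph-volume", "lvm", "create", "--data", dev],
                   check=True)

# Then watch "ceph -s" and "ceph osd df" until recovery/rebalance settles
# before vacating the next file server.

If converting a whole chassis worth of disks into OSDs at once is a bad
idea, that's exactly the sort of gotcha I'm hoping to hear about.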

Thanks for any comments.  I've tried to sort out as much of this as I can
in testing, but I can't build a lab at the scale of the eventual production
deployment because of the capital cost involved, and I recognize that
there's a chance there are nuances I won't spot until it's too late.