On Wed, 19 Aug 2015, Varada Kari wrote:
> Hi all,
>
> This is about generalizing ceph-disk to work with different OSD
> backends such as FileStore, KeyValueStore, NewStore, etc.
> Each of these object store implementations has different needs for the
> disks used to hold its data and metadata.
> In one of the pull requests, Sage suggested requirements that ceph-disk
> should satisfy in order to handle all the backends optimally. Based on
> the current implementations of the supported object store backends,
> these are the requirements ceph-disk is expected to meet.
>
> FileStore:
> 1. Needs a partition/disk for the file system
> 2. Needs a partition/disk for the journal
> 3. Additionally, the omap (LevelDB/RocksDB) could be placed on a separate
> partition, depending on whether the backing medium is an HDD or an SSD.
>
> NewStore:
> 1. Needs a file system on a disk/partition
> 2. Optionally needs a file system for the journal, depending on the
> backend DB used (LevelDB/RocksDB ...)
> 3. Optionally needs a file system on a faster medium to hold the data
> for the warm levels.
>
> KeyValueStore:
> 1. Needs a small partition with a file system to hold the OSD's metadata.
> 2. Needs a partition/disk to hold data. Some backends need a file system;
> some can work off a raw partition/disk.
> 3. Optionally may need a partition to hold the cache or journal.
>
> Please add any details I may have missed.
>
> Ideally, ceph-disk should make these decisions based on input given by
> the user, either through the conf file or via options to ceph-disk in a
> manual deployment. For FileStore, the user's input could be the kind of
> file system to create, the file system size, the device to create it on,
> and so on. Similarly for KeyValueStore, the backend may work on a raw
> partition or disk, or it may need a file system.
>
> Quoting Sage again here.
> Alternatively, we could say that it's the admin's job to express to ceph-disk
> what kind of OSD it should create (backend type, secondary fs's or
> partitions, etc.) instead of inferring that from the environment. In that
> case, we could
> * make a generic way to specify which backend to use in the osd_data dir
> * make sure all secondary devices or file systems are symlinked from the
> osd_data dir, the way the journal is today. This could be in a
> backend-specific way, e.g. FileStore wants the journal link (to a bdev),
> NewStore wants a db_wal link (to a small + fast fs), etc.
> * we could create uuid types for each secondary device type. A raw block dev
> would work just like ceph-disk activate-journal. A new uuid would be for
> secondary fs's, which would mount and then trigger ceph-disk
> activate-slave-fs DEV or similar.
> * ceph-disk activate[-] can ensure that *all* symlinks in the data dir resolve
> to real things (all devices or secondary fs's are mounted) before starting
> ceph-osd.
>
> I will make the changes once we agree on the requirements and
> implementation specifics. Please correct me if I have understood anything wrong.
I think the trick here is to figure out how to describe these
requirements. I think it ought to be some structured thing ceph-osd
can spit out for a given backend that says what it needs. For example,
for filestore,
  {
    "data": {
      "type": "fs",
      "min_size": 10485760,
      "max_size": 1000000000000000,         # whatever
      "preferred_size": 100000000000000000,
      "required": true
    },
    "journal": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 104857600,
      "preferred_size": 40960000,
      "required": false,
      "preferred": true
    }
  }
Then ceph-disk can be fed the devices to use based on those names, e.g.

  ceph-disk prepare objectstore=filestore data=/dev/sda journal=/dev/sdb
Or for your KV backend,
  {
    "data": {
      "type": "fs",
      "min_size": 10485760,
      "max_size": 10485760,
      "preferred_size": 10485760,
      "required": true
    },
    "kvdata": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 1000000000000000,         # whatever
      "preferred_size": 100000000000000000,
      "required": true
    },
    "journal": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 104857600,
      "preferred_size": 40960000,
      "required": false,
      "preferred": false
    }
  }
  ceph-disk prepare objectstore=keyvaluestore data=/dev/sda kvdata=/dev/sda \
      journal=/dev/sdb
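
To make the mapping concrete, here is a rough Python sketch (purely
illustrative; FILESTORE_SPEC, parse_device_args and check_against_spec are
made-up names, not existing ceph-disk code) of how ceph-disk might turn
those name=device arguments into a dict and sanity-check it against the
spec the backend emitted:

  # Rough sketch only -- these helper names are made up for illustration,
  # not existing ceph-disk code.
  import json
  import sys

  FILESTORE_SPEC = json.loads("""
  {
    "data":    {"type": "fs",    "min_size": 10485760, "required": true},
    "journal": {"type": "block", "min_size": 10485760, "required": false,
                "preferred": true}
  }
  """)

  def parse_device_args(args):
      # turn ["objectstore=filestore", "data=/dev/sda", ...] into a dict
      kv = {}
      for arg in args:
          name, sep, value = arg.partition('=')
          if not sep:
              raise ValueError('expected name=value, got %r' % arg)
          kv[name] = value
      return kv

  def check_against_spec(devices, spec):
      # every required piece must be given, and every given name must be known
      for name, req in spec.items():
          if req.get('required') and name not in devices:
              raise ValueError('this backend requires a %r device' % name)
      for name in devices:
          if name not in spec:
              raise ValueError('unknown device name %r for this backend' % name)

  if __name__ == '__main__':
      devices = parse_device_args(sys.argv[1:])
      devices.pop('objectstore', None)      # backend name, not a device
      check_against_spec(devices, FILESTORE_SPEC)
      print('would prepare filestore with %r' % devices)

Running something like that with "objectstore=filestore data=/dev/sda
journal=/dev/sdb" would accept the filestore example above and complain if,
say, the required data device were missing.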
The ceph-disk logic would create partitions on the given devices as
needed, aiming for the preferred size but doing whatever it needs to in
order to make things fit. If something is required or preferred but not
specified (e.g., filestore's journal) it will use the same device as the
other pieces, so that the filestore case could simplify to

  ceph-disk prepare objectstore=filestore data=/dev/sda

or whatever.
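
Along the same lines, a hand-wavy sketch of that sizing and defaulting
behaviour (again made-up helper names, with free space per device passed in
rather than probed): aim for preferred_size, shrink toward min_size when a
device is short on space, and fall back to the data device for anything
required or preferred but not given explicitly, like filestore's journal:

  # Illustrative sketch, not real ceph-disk logic.
  def choose_size(req, free_bytes):
      # aim for preferred_size, but clamp into [min_size, min(max_size, free)]
      want = req.get('preferred_size', req['min_size'])
      size = min(want, req.get('max_size', free_bytes), free_bytes)
      if size < req['min_size']:
          raise ValueError('need at least %d bytes, only %d free'
                           % (req['min_size'], free_bytes))
      return size

  def plan_partitions(devices, spec, free_space):
      # devices:    {'data': '/dev/sda', ...} from the command line
      # spec:       requirement dict as in the examples above
      # free_space: {'/dev/sda': bytes_free, ...}
      # returns a list of (device, name, size) partitions to create
      plan = []
      # carve the small pieces first so a huge "use the rest" piece like the
      # filestore data fs doesn't starve a journal placed on the same device
      for name, req in sorted(spec.items(),
                              key=lambda kv: kv[1].get('preferred_size', 0)):
          dev = devices.get(name)
          if dev is None:
              if not (req.get('required') or req.get('preferred')):
                  continue
              # required/preferred but not specified: share the data device
              dev = devices['data']
          free = free_space.get(dev, 0)
          size = choose_size(req, free)
          free_space[dev] = free - size
          plan.append((dev, name, size))
      return plan

For the one-device filestore case above, this would end up carving both a
journal partition and a data partition out of /dev/sda, and for the
keyvaluestore example it would put the small metadata fs and the big kvdata
partition on the same /dev/sda.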
Would something like this be general enough to capture the possibilities
and still do everything we need it to?
sage