Suppose you had two classes of OSD, one fast (e.g. SSDs or 15K SAS drives) and 
the other slow (e.g. 7200 RPM SATA drives). The fast storage is expensive, so 
you might not have much of it. Rather than trying to map whole volumes to the 
best class of storage (e.g. fast for databases, slow for user files), it would 
be nice if ceph could monitor activity and move busy pgs to the fast OSDs and 
idle pgs to the slower OSDs.

What I had in mind initially was a daemon external to ceph that would monitor 
cluster statistics to work out which pgs are currently being hit hard, make 
placement decisions, and move pgs around to maximise performance. At a minimum 
such a daemon would need access to the following information (a rough sketch 
follows the list):
. read and write counts for each pg (to determine io rate)
. class of each OSD (fast/slow/etc.). Ideally this would be defined as part of 
the OSD definition, but an external config file would suffice for a proof of 
concept.
. an api to manually place pgs and not have ceph make its own decisions and 
move them back (this may be the sticking point...)
. a way to make sure that moving pgs doesn't break the desired redundancy 
(tricky?)
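
To make that a bit more concrete, here is a rough sketch (in python, untested) 
of how such a daemon might gather the first two items. It assumes 
"ceph pg dump pgs --format json" returns per-pg entries with an "up" osd list 
and cumulative counters in "stat_sum" (field names may differ between 
versions), and it fakes the osd class map with a hard-coded dict standing in 
for an external config file:

#!/usr/bin/env python3
# Rough sketch only -- not run against a real cluster.  Assumes the
# json layout of "ceph pg dump pgs"; field names may differ between
# ceph versions.
import json
import subprocess
import time

POLL_INTERVAL = 60  # seconds between samples

# Stand-in for "class of each OSD"; ideally this would come from the
# osd definition itself rather than an external config.
OSD_CLASS = {0: 'fast', 1: 'fast', 2: 'slow', 3: 'slow'}


def dump_pg_stats():
    """Fetch per-pg statistics from the cluster as a list of dicts."""
    out = subprocess.check_output(
        ['ceph', 'pg', 'dump', 'pgs', '--format', 'json'])
    data = json.loads(out)
    # Some releases return the list directly, others nest it.
    return data if isinstance(data, list) else data.get('pg_stats', [])


def sample():
    """One snapshot: pgid -> (cumulative reads, cumulative writes, up set)."""
    snap = {}
    for pg in dump_pg_stats():
        s = pg.get('stat_sum', {})
        snap[pg['pgid']] = (s.get('num_read', 0),
                            s.get('num_write', 0),
                            pg.get('up', []))
    return snap


if __name__ == '__main__':
    prev = sample()
    while True:
        time.sleep(POLL_INTERVAL)
        cur = sample()
        for pgid, (reads, writes, up) in cur.items():
            p_reads, p_writes, _ = prev.get(pgid, (reads, writes, up))
            read_rate = (reads - p_reads) / POLL_INTERVAL
            write_rate = (writes - p_writes) / POLL_INTERVAL
            classes = [OSD_CLASS.get(osd, 'unknown') for osd in up]
            # placement decision would go here (see below)
            print(pgid, read_rate, write_rate, classes)
        prev = cur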

A pg with a high write rate would need the primary and all replicas on fast 
storage, since writes go to every replica. A pg with a low write rate but a 
high read rate could have just the primary on fast storage and the replicas on 
slow storage, since reads are normally served by the primary.
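
The decision itself could be as simple as the following; the thresholds are 
invented purely for illustration:

# Illustration only: map measured per-pg rates onto a desired layout.
# The thresholds are made up; a real daemon would derive them from how
# much fast storage is actually available.
WRITE_HOT = 50.0    # writes/sec above which a pg counts as write-hot
READ_HOT = 200.0    # reads/sec above which a pg counts as read-hot


def desired_placement(read_rate, write_rate):
    """Return (primary_class, replica_class) for a pg."""
    if write_rate > WRITE_HOT:
        # Writes hit every replica, so the whole set needs to be fast.
        return 'fast', 'fast'
    if read_rate > READ_HOT:
        # Reads are served by the primary, so only it needs to be fast.
        return 'fast', 'slow'
    return 'slow', 'slow'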

From reading the docs it seems ceph doesn't do this already. There is a 
reweight-by-utilization command which may give some of the same benefit.

Obviously there is a cost to moving pgs around, but it should be fairly easy 
to balance the cost of a move against the benefit of having the busy pgs on a 
fast OSD. None of the decisions to move pgs would need to be made particularly 
quickly, and the rate at which moves were initiated would be limited to 
minimise impact, roughly as sketched below.
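
Something like the following, where worth_moving() is a deliberately crude 
cost model and place_pg() stands in for the manual placement api that is 
missing above:

# Sketch of the rate-limiting / cost-benefit side.  The constants and
# the cost model are made up for illustration.
MAX_CONCURRENT_MOVES = 2
AVG_IO_BYTES = 4096          # assumed average io size, for the benefit estimate
PAYBACK_SECONDS = 3600       # only move if it pays for itself within an hour


def worth_moving(read_rate, write_rate, pg_bytes):
    """Does the ongoing io saved justify shuffling pg_bytes once?"""
    benefit_per_sec = (read_rate + write_rate) * AVG_IO_BYTES
    return benefit_per_sec * PAYBACK_SECONDS > pg_bytes


def schedule_moves(candidates, in_flight):
    """candidates: (pgid, read_rate, write_rate, pg_bytes), busiest first."""
    for pgid, read_rate, write_rate, pg_bytes in candidates:
        if len(in_flight) >= MAX_CONCURRENT_MOVES:
            break
        if pgid not in in_flight and worth_moving(read_rate, write_rate,
                                                  pg_bytes):
            in_flight.add(pgid)
            # place_pg(pgid, target_osds)  # hypothetical placement api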

Is something like this possible? Or useful? (I think it would be if you want 
to maximise the use of your expensive SSDs.) Is a pg a small enough unit for 
this, or too coarse?

Thanks

James
