Lionel - thanks for the feedback ... inline below ...

On 7/2/15, 9:58 AM, "Lionel Bouton" <[email protected]> wrote:

> Ouch. These spinning disks are probably a bottleneck: the advice regularly
> given on this list is to use one DC SSD per 4 OSDs. You would probably be
> better off with a dedicated journal partition at the beginning of each OSD
> disk, or (worse) a file on the filesystem, but either should still be better
> than a shared spinning disk.

I understand the benefit of journals on SSDs - but if you don't have them, you
don't have them.  With that in mind, I'm completely open to any ideas on the
best way to structure journals and OSDs on 7200 rpm disks, and I'm happy to do
performance testing of various scenarios.  Again - we realize this is "less
than optimal", but I would like to explore tweaking and tuning this setup for
the best possible performance you can get out of it.
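
For concreteness, here is roughly the first layout I plan to test - the journal
as a small partition at the front of each 7200 rpm disk, with the rest of the
disk as the filestore data partition.  This is only a sketch assuming
Hammer-era ceph-disk/filestore, and the journal size is just a placeholder:

    # Journal partition size ceph-disk should pick up from ceph.conf [osd]:
    #   osd journal size = 10240
    #
    # With no separate journal device on the command line, ceph-disk creates
    # both a journal partition and a data partition on the same disk:
    ceph-disk prepare /dev/sdb
    ceph-disk activate /dev/sdb1    # data partition (usually partition 1)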


> Anyway, given that you get to use 720 disks (12 disks on 60 servers), I'd
> still prefer your setup to mine (24 OSDs): even with what I consider a
> bottleneck, your setup probably has far more bandwidth ;-)

My understanding from reading the Ceph docs was that putting the journal on the
OSD disk itself was strongly discouraged as a "very bad idea", because the
journal and data writes contend for IO on the same disk.  Like I said - I'm
open to testing this configuration ... and probably will.  We're finalizing our
build/deployment harness right now so we can modify the OSD architecture with a
fresh build fairly easily.
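
When we get to that testing, my plan was to drive each layout with something
simple like rados bench from a client and compare the runs - the pool name,
PG count and run length below are arbitrary examples:

    # Throwaway pool, then a 60-second write test followed by a sequential
    # read of the same objects
    ceph osd pool create benchpool 128 128
    rados bench -p benchpool 60 write -t 16 --no-cleanup
    rados bench -p benchpool 60 seq -t 16
    ceph osd pool delete benchpool benchpool --yes-i-really-really-mean-it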


> A reaction to one of your earlier mails:
> You said you are going to 8TB drives. The problem isn't so much the time
> needed to create new replicas when an OSD fails, but the time to fill a
> freshly installed one. The rebalancing is much faster when you add 4 x 2TB
> drives than 1 x 8TB drive.

Why should it matter how long it takes a single drive to "fill"?  Please note
that I'm very new to operating Ceph, so I'm still working to understand these
details - and I'm certain my understanding is still a bit ... simplistic ... :-)

If a drive fails, wouldn't the replica copies that were on that drive be
re-replicated across "other OSD" devices once the appropriate timers/triggers
cause that data migration/re-replication to kick off?
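
(The main "timer" I've found so far in the docs appears to be
mon osd down out interval - once an OSD has been down for that long it is
marked out and its PGs start re-replicating elsewhere - plus the noout flag to
suppress that during planned maintenance.  A sketch of the knobs, if I'm
reading that right; the 900 seconds is just an arbitrary example:

    # Suppress automatic re-replication while doing planned work on an OSD host
    ceph osd set noout
    # ... maintenance, OSDs come back up ...
    ceph osd unset noout

    # Or lengthen the down -> out grace period (seconds) cluster-wide
    ceph tell mon.* injectargs '--mon-osd-down-out-interval 900'
)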

Subsequently, you add a new OSD and bring it online.  It's now ready to be used
and, depending on your CRUSH map policies, will "start to fill".  Yes, filling
an entire 8TB drive certainly would take a while, but that shouldn't block or
degrade the entire cluster - since we have a replication size of 3, there are
"two other replica copies" to service read requests.  If a replica copy is
updated while it is still in flight with the rebalancing to that new OSD, yes,
I can see where there would be latency/delays/issues.

As the drive is being rebalanced, is it marked "available" for new writes?
That would certainly cause significant latency on a new write request - I'd
hope that during a "rebalance" operation, the OSD disk is not marked available
for new writes.
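
Related to that: from what I've read so far, these look like the knobs that
control how hard backfill/recovery competes with client IO, and how you'd watch
it happening - the values are just examples I'd start experimenting with, not
recommendations:

    # Watch recovery/backfill progress
    ceph -s
    ceph health detail

    # Throttle backfill/recovery so client IO keeps priority (example values)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'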

Which brings me to a question ...

Are there any good documents out there that detail (preferably via a flow
chart/diagram or similar) how the various failure/recovery scenarios cause
"change" or "impact" to the cluster?  I've seen very little regarding this, but
I may be digging in the wrong places.

Thank you for any follow-up information that helps illuminate my understanding
(or lack thereof) of how Ceph and failure/recovery situations should impact a
cluster...

~~shane



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
