Thanks Sage!

Sorry, a few more questions :-)

1. When the PG map is 3.5 -> [2,3] (osd.1 is down), will I/O on this PG be
blocked until the map becomes 3.5 -> [2,3,4]?

2. What if OSD 1 comes up after OSD 4's backfill is complete and the PG map is
3.5 -> [2,3,4]? All recovery is done and the PGs are active+clean. Will the
map change back to 3.5 -> [1,2,3]? IMO it shouldn't, since that would generate
unnecessary traffic, wouldn't it?

3. Will the flow be similar if one of the replica OSDs goes down instead of the
primary in step 2 that I mentioned earlier? Say, osd.2 goes down instead of
osd.1?

Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:[email protected]]
Sent: Monday, February 23, 2015 1:03 PM
To: Somnath Roy
Cc: Samuel Just ([email protected]); Ceph Development
Subject: Re: Recovery question

On Mon, 23 Feb 2015, Somnath Roy wrote:
> Hi,
> Can anyone help me understand what will happen in the following scenarios?
>
> 1. Current PG map : 3.5 -> OSD[1,2,3]
>
> 2. OSD 1 is down and the new map is: 3.5 -> OSD[2,3,4]

More likely it's:

 1: 3.5 -> [1,2,3]
 2: 3.5 -> [2,3]   (osd.1 is down)
 3: 3.5 -> [2,3,4] (osd.1 is marked out)
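For anyone following along, the up-set changes in steps 1 to 3 can be sketched with a toy model. This is a minimal sketch, not Ceph code: it assumes a fixed per-PG preference order (the hypothetical `preference` list) in place of CRUSH, and the class name `ToyPlacement` is made up. What it illustrates is the difference between down and out: a down OSD is still chosen by placement but dropped from the up set, while an out OSD is skipped so placement picks a replacement.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPlacement:
    """Toy model of one PG's placement; NOT Ceph's CRUSH implementation.

    Assumption: CRUSH is replaced by a fixed, per-PG preference order.
    Marking an OSD out removes it from placement (a replacement is chosen);
    marking it down only drops it from the up set.
    """
    preference: list[int]          # hypothetical stand-in for CRUSH's output order
    size: int = 3                  # replica count of the pool
    down: set[int] = field(default_factory=set)
    out: set[int] = field(default_factory=set)

    def raw(self) -> list[int]:
        # Out OSDs carry zero weight, so placement skips them and takes the next one.
        return [o for o in self.preference if o not in self.out][: self.size]

    def up(self) -> list[int]:
        # Down-but-in OSDs are still selected by placement but filtered from the up set.
        return [o for o in self.raw() if o not in self.down]


pg = ToyPlacement(preference=[1, 2, 3, 4, 5])
print(pg.up())   # step 1: [1, 2, 3]
pg.down.add(1)
print(pg.up())   # step 2: [2, 3]     (osd.1 down, still in)
pg.out.add(1)
print(pg.up())   # step 3: [2, 3, 4]  (osd.1 marked out)
```

Clearing osd.1 from both `down` and `out` puts it back at the front of the up set ([1, 2, 3]), which lines up with step 5 in Sage's reply.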

> 3. OSD 4 needs backfill recovery, and it has started

If log recovery will work, we'll do that and it's nice and quick.  If backfill 
is needed, we will do

 4: 3.5 -> [2,3]  (up=[2,3,4]) (pg_temp record added to the map, pointing at the log-recoverable OSDs)
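The log-versus-backfill decision and the resulting pg_temp can be sketched roughly as below. This is a simplified, hypothetical model: versions are plain integers rather than Ceph's (epoch, version) pairs, and `PgLog`, `recovery_mode` and `plan_pg_temp` are illustrative names, not Ceph functions. The assumed rule is that a peer whose last update still falls inside the retained log can be log-recovered while anything older needs backfill, and that if any up OSD needs backfill, I/O stays on the current or log-recoverable OSDs via pg_temp.

```python
from dataclasses import dataclass
from typing import Optional

NEVER = -1  # last_update for an OSD that holds no copy of the PG at all


@dataclass
class PgLog:
    # Simplified: Ceph versions are (epoch, version) pairs; plain ints here.
    tail: int  # oldest entry still retained in the authoritative log
    head: int  # newest entry


def recovery_mode(log: PgLog, last_update: int) -> str:
    """Classify a peer by how far behind the authoritative log it is."""
    if last_update >= log.head:
        return "current"
    if last_update >= log.tail:
        return "log"       # missing entries are still in the log: replay them
    return "backfill"      # fell off the end of the log: copy objects wholesale


def plan_pg_temp(up: list[int], prior_acting: list[int],
                 last_update: dict[int, int], log: PgLog) -> Optional[list[int]]:
    """Hypothetical policy: return the pg_temp to publish, or None.

    If every up OSD can be brought current from the log, no pg_temp is
    needed and acting == up. Otherwise keep I/O on the OSDs that are
    current or log-recoverable while the others backfill.
    """
    modes = {o: recovery_mode(log, last_update.get(o, NEVER))
             for o in set(up) | set(prior_acting)}
    if all(modes[o] != "backfill" for o in up):
        return None
    # Assumed ordering: previous acting set first, then any new up OSDs that qualify.
    candidates = prior_acting + [o for o in up if o not in prior_acting]
    return [o for o in candidates if modes[o] != "backfill"]
```

For step 4 with an empty osd.4, `plan_pg_temp([2, 3, 4], [2, 3], {2: 150, 3: 150}, PgLog(100, 150))` gives `[2, 3]`. If osd.4 already had entries back to version 120, adding `4: 120` makes the call return `None`, i.e. plain log recovery with no pg_temp. The "previous acting set first" ordering is an assumption; it happens to give `[2, 3, 4]` for step 6 when osd.1 has fallen off the log.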

> 4. Meanwhile OSD 1 came up; it was down for only a short time

 5: 3.5 -> [1,2,3] (osd.1 is back up and in)

> 5. Will the pg 3.5 mapping change, given that OSD 1's recovery could be
> log based?

It will change immediately when osd.1 is back up, regardless of what data is 
where.  If it's log recoverable, then no mapping changes will be needed.  If 
it's not, then

 6: 3.5 -> [2,3,4]  (up=[1,2,3]) (add pg_temp mapping while we backfill osd.1)
 7: 3.5 -> [1,2,3]  (pg_temp entry removed when backfill completes)
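Reading Sage's notation as `acting (up=...)`, the pg_temp lifecycle in steps 5 to 7 can be sketched as a small state holder. This is an assumption-laden sketch, not Ceph's OSDMap code: `PgMapping`, `osd_returned` and `backfill_complete` are hypothetical names, and the rule "pin acting to the old acting set while any up member backfills" is a simplification. It shows the up set changing as soon as osd.1 is back, while pg_temp decides which OSDs actually serve I/O until backfill finishes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PgMapping:
    """Toy view of one PG's OSD-map entries (not Ceph's data structures).

    `up` is what placement computes; `pg_temp`, when set, overrides it.
    The acting set (the OSDs that actually serve I/O) is whichever applies.
    """
    up: list[int]
    pg_temp: Optional[list[int]] = None

    @property
    def acting(self) -> list[int]:
        return list(self.pg_temp) if self.pg_temp else list(self.up)

    def osd_returned(self, new_up: list[int], backfill_targets: set[int]) -> None:
        # Assumption: while any new up member must backfill, pin acting to
        # the OSDs that were already serving I/O (the old acting set).
        old_acting = self.acting
        self.up = new_up
        self.pg_temp = old_acting if backfill_targets & set(new_up) else None

    def backfill_complete(self) -> None:
        # Dropping pg_temp lets acting fall back to up.
        self.pg_temp = None


pg = PgMapping(up=[2, 3, 4])                  # osd.4 has finished backfilling
pg.osd_returned([1, 2, 3], backfill_targets={1})
print(pg.up, pg.acting)                        # step 6: [1, 2, 3] [2, 3, 4]
pg.backfill_complete()
print(pg.up, pg.acting)                        # step 7: [1, 2, 3] [1, 2, 3]
```

If osd.1 is log-recoverable, `backfill_targets` is empty, no pg_temp is set and acting equals up right away, which is the "no mapping changes will be needed" case.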

> 6. Also, if OSD 4's recovery could be log based, will there be any
> change in the pg map if OSD 1 comes up during the recovery?

See above

Hope that helps!
sage
