On Mon, Jan 4, 2016 at 7:21 PM, Sage Weil <s...@newdream.net> wrote:
> On Mon, 4 Jan 2016, Guang Yang wrote:
>> Hi Cephers,
>> Happy New Year! I got question regards to the long PG peering..
>>
>> Over the last several days I have been looking into the *long peering*
>> problem when we start a OSD / OSD host, what I observed was that the
>> two peering working threads were throttled (stuck) when trying to
>> queue new transactions (writing pg log), thus the peering process are
>> dramatically slow down.
>>
>> The first question came to me was, what were the transactions in the
>> queue? The major ones, as I saw, included:
>>
>> - The osd_map and incremental osd_map, this happens if the OSD had
>> been down for a while (in a large cluster), or when the cluster got
>> upgrade, which made the osd_map epoch the down OSD had, was far behind
>> the latest osd_map epoch. During the OSD booting, it would need to
>> persist all those osd_maps and generate lots of filestore transactions
>> (linear with the epoch gap).
>> > As the PG was not involved in most of those epochs, could we only take and 
>> > persist those osd_maps which matter to the PGs on the OSD?
>
> This part should happen before the OSD sends the MOSDBoot message, before
> anyone knows it exists.  There is a tunable threshold that controls how
> recent the map has to be before the OSD tries to boot.  If you're
> seeing this in the real world, be probably just need to adjust that value
> way down to something small(er).
It would queue the transactions and then sends out the MOSDBoot, thus
there is still a chance that it could have contention with the peering
OPs (especially on large clusters where there are lots of activities
which generates many osdmap epoch). Any chance we can change the
*queue_transactions* to "apply_transactions*, thus we block there
waiting for the persistent of the osdmap. At least we may be able to
do that during OSD booting? The concern is, if the OSD is active, the
apply_transaction would take longer with holding the osd_lock..
I don't find such tuning, could you elaborate? Thanks!
>
> sage
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to