Dear John,
Thanks for your reply.
Fei Xia
> On Nov 19, 2015, at 18:07, John Spray <[email protected]> wrote:
>
> On Thu, Nov 19, 2015 at 9:43 AM, xiafei <[email protected]> wrote:
>> Hi, all:
>> I have two questions about MDLog:
>>
>> 1. The max number of logsegments per MDLog (mds_log_max_segments) is
>> configured to be 30 in the config_opts.h file.
>> However, the MDLog doesn’t check the number of logsegments when it starts a
>> new segment.
>> The configuration is only used when the number of segments in a MDLog is
>> larger than 2*mds_log_max_segments.
>> The MDS notifies the monitor, but the monitor does nothing.
>> My question is: Is the logsegments size limited to a max size? If so, what’s
>> the size?
>
> mds_log_max_segments is used in MDLog::trim (where it is aliased to
> the local max_segments variable). The MDS will trim some segments if
> there are currently more than mds_log_max_segments: this is the
> typical way to limit how long the journal is. It's not enforced
> rigidly: if you set max segments to 2, and do lots of metadata IO,
> you'll see it bounce between 2 and 3 most of the time.
>
> You have already noticed that this setting is also used in Beacon.cc
> to generate a health warning if the journal has grown to 2x the size
> limit: this is to alert the user if the MDS is failing to trim its
> journal (can be caused by a certain class of bugs or potentially just
> by a pathologically slow OSD cluster).
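
To make the soft limit and the 2x warning concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not Ceph's MDLog code: the class ToyMDLog, its methods, and the can_expire callback are hypothetical names invented for illustration.

```python
from collections import deque


class ToyMDLog:
    """Toy model of soft segment trimming; not Ceph's actual MDLog."""

    def __init__(self, max_segments):
        self.max_segments = max_segments  # stands in for mds_log_max_segments
        self.segments = deque()           # oldest segment on the left
        self._next_id = 0

    def start_new_segment(self):
        # No limit check here: the log is allowed to overshoot the limit.
        self.segments.append(self._next_id)
        self._next_id += 1

    def trim(self, can_expire=lambda seg: True):
        # Drop the oldest segments while over the limit, but only if the
        # caller-supplied callback says the oldest one can be expired.
        while (len(self.segments) > self.max_segments
               and can_expire(self.segments[0])):
            self.segments.popleft()

    def trim_warning(self):
        # Models the "grown to 2x the limit" health check described above.
        return len(self.segments) > 2 * self.max_segments
```

With max_segments set to 2 and a trim after every new segment, the length in this toy moves between 2 and 3. If trimming never succeeds (can_expire always returns False), the length keeps growing and trim_warning() becomes true once it passes 4.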
>
>> 2. The MDLog prezeros two periods ahead of the write_pos of Journaler.
>> The comment in the _issue_prezero function is “we need to zero at least two
>> periods, minimum, to ensure that we have a full empty object/period in front
>> of us”.
>> Does it mean that the OSD will preallocate objects for the Journaler?
>> The function is actually implemented by Objecter::remove. However,
>> Objecter::remove only removes an object through FileStore/NewStore.
>> It seems that the OSD doesn’t preallocate objects. If so, then what’s the
>> purpose of prezero? Or, do I misunderstand anything?
>
> Journaler uses the Filer abstraction, and when going through Filer
> there is no distinction between zeros in an object and the object
> being missing. Either way when you read that range you get zeros.
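
The point that a removed object and a zero-filled object read back the same can be sketched with a toy byte-range store. This is an assumption-laden illustration, not Ceph's Filer: ToyFiler and its methods are hypothetical, and real striping is more involved than the fixed-size split used here.

```python
class ToyFiler:
    """Toy byte-range view over fixed-size objects; not Ceph's Filer."""

    def __init__(self, object_size):
        self.object_size = object_size
        self.objects = {}  # object index -> bytearray of object_size bytes

    def write(self, offset, data):
        # Map each byte of the range onto (object index, offset in object).
        for i, byte in enumerate(data):
            index, within = divmod(offset + i, self.object_size)
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[within] = byte

    def remove(self, index):
        # Deleting a whole object is a cheap way to "zero" its range.
        self.objects.pop(index, None)

    def read(self, offset, length):
        # A missing object reads back as zeros, same as a zero-filled one.
        out = bytearray()
        for pos in range(offset, offset + length):
            index, within = divmod(pos, self.object_size)
            obj = self.objects.get(index)
            out.append(obj[within] if obj is not None else 0)
        return bytes(out)
```

In this toy, writing eight bytes across two 4-byte objects and then removing the second object gives the same read result as explicitly writing four zero bytes into it, which is why a remove can serve as a prezero.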
>
> Prezeroing is a bit subtle. It is necessary because the journal
> writes don't necessarily persist in a monotonic forward order. In a
> crash, we might sometimes leave a gap at the front of the journal,
> then some data. We'll reprobe (Filer::probe) to the start of the gap,
> leaving data after the gap as junk (this is OK because journal data
> isn't considered safe until everything up to its position is safe
> (i.e. Journaler::safe_pos advances)). After that recovery, we need
> to do prezeroing because otherwise, if we crashed again, on the
> subsequent recovery we might confuse the junk with valid data.
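
The double-crash scenario above can be walked through with a small toy journal. This is a sketch under stated assumptions rather than how Journaler works internally: the journal is modelled as a dict from slot number to entry, and probe, prezero and crash_twice are hypothetical helpers (this probe is only a stand-in for the Filer::probe step mentioned above).

```python
def probe(journal, start=0):
    """Return the first position at or after `start` that holds no entry."""
    pos = start
    while pos in journal:
        pos += 1
    return pos


def prezero(journal, write_pos, ahead):
    """Remove anything stored in [write_pos, write_pos + ahead)."""
    for pos in range(write_pos, write_pos + ahead):
        journal.pop(pos, None)


def crash_twice(use_prezero):
    # First crash: entries 0-2 are safe, 3 never persisted, 4-5 did.
    journal = {0: "a", 1: "b", 2: "c", 4: "stale-e", 5: "stale-f"}
    write_pos = probe(journal)        # 3: the start of the gap
    if use_prezero:
        prezero(journal, write_pos, ahead=4)
    journal[write_pos] = "new-d"      # one new entry persists...
    # ...then the second crash happens before anything else is written.
    end = probe(journal)
    return [journal[p] for p in range(end)]
```

Without prezeroing, the second recovery in this toy walks straight from the new entry into the stale ones and treats them as valid; with prezeroing, it stops right after the new entry.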
>
> John
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com