Yep, thanks for all the help tracking down the root cause!
-Sam

On Thu, Mar 17, 2016 at 10:50 AM, Jeffrey McDonald <jmcdo...@umn.edu> wrote:
> Great, I just recovered the first placement group from this error. To be
> sure, I ran a deep-scrub and it came back clean.
>
> Thanks for all your help.
> Regards,
> Jeff
>
> On Thu, Mar 17, 2016 at 11:58 AM, Samuel Just <sj...@redhat.com> wrote:
>>
>> Oh, it's getting a stat mismatch.  I think what happened is that on
>> one of the earlier repairs it reset the stats to the wrong value (the
>> orphan was causing the primary to scan two objects twice, which
>> matches the stat mismatch I see here).  A pg repair will clear
>> that up.
>> -Sam
>>
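The "two objects" reading can be checked against the deep-scrub line quoted further down the thread. The sketch below is only an illustration: it copies that one log line verbatim and subtracts each first/second pair. The helper name `stat_deltas` is made up for this example and is not part of Ceph, and the code makes no claim about which side of each pair is the scrubbed count and which is the recorded stat.

```python
import re

# Log line copied from the thread (timestamp and channel prefix dropped).
LINE = (
    "70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, "
    "22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "68440088914/68445454633 bytes,0/0 hit_set_archive bytes."
)

# Matches "<first>/<second> <counter name>", e.g. "22319/22321 objects".
PAIR_RE = re.compile(r"(\d+)/(\d+) ([a-z_]+(?: bytes)?)")


def stat_deltas(line):
    """Return {counter: second - first} for every counter whose pair differs.

    Hypothetical helper for illustration only.
    """
    deltas = {}
    for first, second, name in PAIR_RE.findall(line):
        diff = int(second) - int(first)
        if diff != 0:
            deltas[name] = diff
    return deltas


if __name__ == "__main__":
    for name, diff in stat_deltas(LINE).items():
        print(f"{name}: off by {diff}")
```

Only the objects, dirty, and bytes counters differ, and the first two are each off by exactly two, which is consistent with the explanation above.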
>> On Thu, Mar 17, 2016 at 9:22 AM, Jeffrey McDonald <jmcdo...@umn.edu> wrote:
>> > Thanks Sam.....
>> >
>> > Since I have prepared a script for this, I decided to go ahead with the
>> > checks.....(patience isn't one of my extended attributes....)
>> >
>> > I've got a script that searches the full erasure-encoded spaces and runs
>> > your checklist below. I have operated on only one PG so far, the 70.459
>> > one that we've been discussing. I found only one file out of place, the
>> > one we already discussed, and it has been removed.
>> >
>> > The pg is still marked as inconsistent. I've scrubbed it a couple of
>> > times now and what I've seen is:
>> >
>> > 2016-03-17 09:29:53.202818 7f2e816f8700  0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
>> > 2016-03-17 09:36:38.436821 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
>> > 2016-03-17 09:36:38.436844 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
>> > 2016-03-17 09:44:23.592302 7f2e816f8700  0 log_channel(cluster) log [INF] : 70.459 deep-scrub starts
>> > 2016-03-17 09:47:01.237846 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459s0 deep-scrub stat mismatch, got 22319/22321 objects, 0/0 clones, 22319/22321 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 68440088914/68445454633 bytes,0/0 hit_set_archive bytes.
>> > 2016-03-17 09:47:01.237880 7f2e816f8700 -1 log_channel(cluster) log [ERR] : 70.459 deep-scrub 1 errors
>> >
>> >
>> > Should the scrub be sufficient to remove the inconsistent flag? I took
>> > the osd offline during the repairs. I've looked at files in all of the
>> > osds in the placement group and I'm not finding any more problem files.
>> > The vast majority of files do not have the user.cephos.lfn3 attribute.
>> > Of the 22321 objects that I have seen, only about 230 have the
>> > user.cephos.lfn3 file attribute. The files have other attributes, just
>> > not user.cephos.lfn3.
>> >
>> > Regards,
>> > Jeff
>> >
>> >
>> > On Wed, Mar 16, 2016 at 3:53 PM, Samuel Just <sj...@redhat.com> wrote:
>> >>
>> >> Ok, like I said, most files with _long at the end are *not orphaned*.
>> >> The generation number also is *not* an indication of whether the file
>> >> is orphaned -- some of the orphaned files will have ffffffffffffffff
>> >> as the generation number and others won't.  For each long filename
>> >> object in a pg you would have to:
>> >> 1) Pull the long name out of the attr
>> >> 2) Parse the hash out of the long name
>> >> 3) Turn that into a directory path
>> >> 4) Determine whether the file is at the right place in the path
>> >> 5) If not, remove it (or echo it to be checked)
>> >>
>> >> You probably want to wait for someone to get around to writing a
>> >> branch for ceph-objectstore-tool.  Should happen in the next week or
>> >> two.
>> >> -Sam
>> >>
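The five-step checklist above could be sketched roughly as follows. This is not the ceph-objectstore-tool branch mentioned in the message and not Jeff's script. It rests on two loudly stated assumptions that should be verified against the Ceph source for the release in use: (a) the long name contains `_head_` followed by an 8-digit hex hash, and (b) each `DIR_<nibble>` level in the on-disk path corresponds to one hash nibble, taken from the last nibble backwards. Only the attribute name `user.cephos.lfn3` comes from the thread; every function name here is hypothetical.

```python
import os
import re

LFN_ATTR = "user.cephos.lfn3"  # attribute name as mentioned in the thread

# ASSUMPTION (a): the long name embeds "_head_<8 hex digits>_".
# Snapshot objects and other name forms are not handled by this sketch.
HASH_RE = re.compile(r"_head_([0-9A-Fa-f]{8})_")


def read_long_name(path):
    """Step 1: pull the long name out of the attr (Linux-only os.getxattr)."""
    raw = os.getxattr(path, LFN_ATTR)
    return raw.decode("utf-8", "replace").rstrip("\0")


def hash_from_long_name(long_name):
    """Step 2: parse the hash out of the long name, or None if not found."""
    matches = HASH_RE.findall(long_name)
    return matches[-1].upper() if matches else None


def expected_dirs(hash_hex, depth):
    """Step 3: turn the hash into directory components.

    ASSUMPTION (b): one DIR_<nibble> level per nibble, last nibble first.
    """
    return ["DIR_" + nibble for nibble in reversed(hash_hex)][:depth]


def is_misplaced(dir_components, long_name):
    """Step 4: True if the DIR_* components disagree with the hash.

    Returns None when no hash could be parsed, so the caller can flag the
    file for a manual look instead of guessing.
    """
    hash_hex = hash_from_long_name(long_name)
    if hash_hex is None:
        return None
    actual = [c for c in dir_components if c.startswith("DIR_")]
    return actual != expected_dirs(hash_hex, len(actual))


def check_file(path):
    """Step 5 (echo only): report a suspect file; never deletes anything."""
    verdict = is_misplaced(path.split(os.sep), read_long_name(path))
    if verdict is None:
        print("UNPARSED ", path)
    elif verdict:
        print("MISPLACED", path)
```

The sketch only prints candidates, in line with the "echo it to be checked" option in step 5, so nothing is removed without a manual review.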
>> >
>> > --
>> >
>> > Jeffrey McDonald, PhD
>> > Assistant Director for HPC Operations
>> > Minnesota Supercomputing Institute
>> > University of Minnesota Twin Cities
>> > 599 Walter Library           email: jeffrey.mcdon...@msi.umn.edu
>> > 117 Pleasant St SE           phone: +1 612 625-6905
>> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >
>> >
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
