I saw this go by in the commit log:

commit cc2200c5e60caecf7931e546f6522b2ba364227f
Merge: f8d5807 12c083e
Author: Sage Weil <[email protected]>
Date:   Thu Feb 11 08:44:35 2016 -0500

    Merge pull request #7537 from ifed01/wip-no-promote-for-delete-fix

    osd: fix unnecessary object promotion when deleting from cache pool

    Reviewed-by: Sage Weil <[email protected]>

Is there any chance that I was basically seeing the same thing from the
filesystem standpoint?
Thanks
Steve
> On Feb 5, 2016, at 8:42 AM, Gregory Farnum <[email protected]> wrote:
>
> On Fri, Feb 5, 2016 at 6:39 AM, Stephen Lord <[email protected]> wrote:
>>
>> I looked at this system this morning, and it actually finished what it was
>> doing. The erasure coded pool still contains all the data and the cache
>> pool has about a million zero sized objects:
>>
>>
>> GLOBAL:
>>     SIZE       AVAIL     RAW USED     %RAW USED     OBJECTS
>>     15090G     9001G        6080G         40.29       2127k
>> POOLS:
>>     NAME            ID     CATEGORY     USED      %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE
>>     cache-data      21     -               0          0         7962G     1162258     1057k     22969     3220k
>>     cephfs-data     22     -           3964G      26.27         5308G     1014840      991k      891k     1143k
>>
>> Definitely seems like a bug since I removed all references to these from the
>> filesystem which created them.
>>
>> I originally wrote 4.5 Tbytes of data into the file system. The erasure coded
>> pool is set up as 4+2, and the cache has a size limit of 1 Tbyte. It looks like
>> not all the data made it out of the cache tier before I removed content: the
>> delete removed the content which was only present in the cache tier and
>> created a zero sized object in the cache for all the content. The used
>> capacity is somewhat consistent with this.
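
A quick sanity check of the capacity figures quoted above, as a rough sketch only. It assumes the pool USED column reports data before erasure coding overhead, and that "4.5 Tbytes" and "1 Tbyte" mean 4.5 * 1024G and 1024G. Neither assumption is confirmed anywhere in this thread.

```python
# Figures copied from the `ceph df detail` output quoted above.
EC_K, EC_M = 4, 2            # erasure code profile: 4 data + 2 coding chunks
ec_pool_used_g = 3964        # cephfs-data USED
raw_used_g = 6080            # GLOBAL RAW USED

# Assumptions (not stated in the thread): binary units for "Tbytes".
written_g = 4.5 * 1024       # data originally written to the file system
cache_limit_g = 1024         # cache tier size limit

# Raw space a 4+2 pool needs for its logical data: (k + m) / k = 1.5x.
ec_overhead = (EC_K + EC_M) / EC_K
expected_raw_g = ec_pool_used_g * ec_overhead

# Raw usage not explained by the EC pool, and data that never reached it.
unexplained_raw_g = raw_used_g - expected_raw_g
never_flushed_g = written_g - ec_pool_used_g

if __name__ == "__main__":
    print("expected raw for EC pool:", expected_raw_g, "G")
    print("raw used not explained by EC pool:", unexplained_raw_g, "G")
    print("written but not in EC pool:", never_flushed_g, "G")
    print("fits under cache limit:", never_flushed_g < cache_limit_g)
```

Under those assumptions the EC pool accounts for about 5946G of the 6080G raw used, and roughly 644G of what was written never reached it, which fits under the 1 Tbyte cache limit.
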
>>
>> I tried to look at the extended attributes on one of the zero size objects
>> with ceph-dencoder, but it failed:
>>
>> error: buffer::malformed_input: void object_info_t::decode(ceph::buffer::list::iterator&) unknown encoding version > 15
>>
>> Same error on one of the objects in the erasure coded pool.
>>
>> Looks like I am a little too bleeding edge for this, or the contents of the
>> .ceph_ attribute are not an object_info_t
>
> ghobject_info_t
>
> You can get the EC stuff actually deleted by getting the cache pool to
> flush everything. That's discussed in the docs and in various mailing
> list archives.
> -Greg
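
For the archives, a minimal sketch of what the flush suggested above might look like when scripted. It wraps the `rados ... cache-flush-evict-all` command from the cache tiering docs. The helper names `flush_evict_command` and `flush_cache_pool` are made up for illustration, and the default is a dry run that only prints the command, so nothing is touched unless you opt in.

```python
import shutil
import subprocess


def flush_evict_command(cache_pool):
    """Build the argv that flushes and evicts every object in a cache pool."""
    return ["rados", "-p", cache_pool, "cache-flush-evict-all"]


def flush_cache_pool(cache_pool, dry_run=True):
    """Print the command (dry run) or run it against the cluster.

    Hypothetical helper: falls back to a dry run if `rados` is not on PATH.
    """
    cmd = flush_evict_command(cache_pool)
    if dry_run or shutil.which("rados") is None:
        print("would run:", " ".join(cmd))
        return None
    # check=True raises CalledProcessError if rados exits non-zero.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "cache-data" is the cache pool name from the output quoted in this thread.
    flush_cache_pool("cache-data")
```

Pass `dry_run=False` only on a node with a working client keyring. On a cache holding about a million objects this could take a long time.
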
>
>>
>>
>>
>> Steve
>>
>>> On Feb 4, 2016, at 7:10 PM, Gregory Farnum <[email protected]> wrote:
>>>
>>> On Thu, Feb 4, 2016 at 5:07 PM, Stephen Lord <[email protected]> wrote:
>>>>
>>>>> On Feb 4, 2016, at 6:51 PM, Gregory Farnum <[email protected]> wrote:
>>>>>
>>>>> I presume we're doing reads in order to gather some object metadata
>>>>> from the cephfs-data pool; and the (small) newly-created objects in
>>>>> cache-data are definitely whiteout objects indicating the object no
>>>>> longer exists logically.
>>>>>
>>>>> What kinds of reads are you actually seeing? Does it appear to be
>>>>> transferring data, or merely doing a bunch of seeks? I thought we were
>>>>> trying to avoid doing reads-to-delete, but perhaps the way we're
>>>>> handling snapshots or something is invoking behavior that isn't
>>>>> amenable to a full-FS delete.
>>>>>
>>>>> I presume you're trying to characterize the system's behavior, but of
>>>>> course if you just want to empty it out entirely you're better off
>>>>> deleting the pools and the CephFS instance entirely and then starting
>>>>> it over again from scratch.
>>>>> -Greg
>>>>
>>>> I believe it is reading all the data, judging from the volume of traffic,
>>>> and the CPU load on the OSDs suggests it may be doing more than just that.
>>>>
>>>> iostat is showing a lot of data moving; I am seeing about the same volume
>>>> of read and write activity here. Because the OSDs underneath both pools
>>>> are the same ones (I know that’s not exactly optimal), it is hard to tell
>>>> which pool is responsible for which I/O. Large reads and small writes
>>>> suggest it is reading up all the data from the objects. The write traffic
>>>> is, I presume, all journal activity relating to deleting objects and
>>>> creating the empty ones.
>>>>
>>>> The 9:1 ratio between things being deleted and created seems odd though.
>>>>
>>>> A previous version of this exercise with just a regular replicated data
>>>> pool did not read anything, just a lot of write activity and eventually
>>>> the content disappeared. So definitely related to the pool configuration
>>>> here and probably not to the filesystem layer.
>>>
>>> Sam, does this make any sense to you in terms of how RADOS handles deletes?
>>> -Greg
>>
>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com