Thorne,

That's why I asked you to create a separate pool. All other writes will
keep going to the original pool, and object counts can be viewed per pool.

On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler <tho...@ddns.com.au> wrote:

> Alexander,
>
> Thank you, but as I said to Igor: The 5.5TB of files on this filesystem
> are virtual machine disks. They are under constant, heavy write load. There
> is no way to turn this off.
>
> On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
>
> Hello Thorne,
>
> Here is one more suggestion on how to debug this. Right now, it is
> unclear whether there is really a disk space leak or whether
> something simply wrote new data during the test.
>
> If you have at least three OSDs you can reassign, please set their
> CRUSH device class to something different than before. E.g., "test".
> Then, create a new pool that targets this device class and add it to
> CephFS. Then, create an empty directory on CephFS and assign this pool
> to it using setfattr. Finally, try reproducing the issue using only
> files in this directory. This way, you will be sure that nobody else
> is writing any data to the new pool.
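
The steps above could look roughly like the following sketch. The OSD IDs,
pool name, rule name, filesystem name, mount path and PG count are all
placeholders, and the "osd" failure domain is an assumption that only suits a
throwaway test pool. The script only prints the commands unless DRY_RUN=0 is
set.

```shell
# Sketch only: every value below is a placeholder, not taken from the cluster.
FS_NAME="${FS_NAME:-cephfs}"                    # name of the CephFS filesystem
TEST_OSDS="${TEST_OSDS:-osd.10 osd.11 osd.12}"  # OSDs you can reassign
DEVICE_CLASS="${DEVICE_CLASS:-test}"
RULE="${RULE:-leaktest_rule}"
POOL="${POOL:-cephfs_leaktest}"
TEST_DIR="${TEST_DIR:-/mnt/cephfs/leaktest}"    # directory on the mounted CephFS
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

setup_test_pool() {
    # An existing device class has to be removed before a new one can be set.
    for osd in $TEST_OSDS; do
        run ceph osd crush rm-device-class "$osd"
        run ceph osd crush set-device-class "$DEVICE_CLASS" "$osd"
    done

    # Replicated rule restricted to the new device class
    # (root "default" and failure domain "osd" are assumptions).
    run ceph osd crush rule create-replicated "$RULE" default osd "$DEVICE_CLASS"

    # New pool that uses this rule, added to CephFS as an extra data pool.
    run ceph osd pool create "$POOL" 32
    run ceph osd pool set "$POOL" crush_rule "$RULE"
    run ceph fs add_data_pool "$FS_NAME" "$POOL"

    # Empty directory whose new files are stored in the test pool.
    run mkdir -p "$TEST_DIR"
    run setfattr -n ceph.dir.layout.pool -v "$POOL" "$TEST_DIR"
}
```

Calling setup_test_pool with the defaults prints the planned commands, so
they can be reviewed before rerunning with DRY_RUN=0. Only files created
under the test directory after the setfattr call are stored in the new pool.
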
>
> On Tue, Mar 19, 2024 at 5:40 PM Igor Fedotov <igor.fedo...@croit.io> wrote:
>
> Hi Thorne,
>
> Given the number of files on the CephFS volume, I presume you don't have
> a severe write load against it. Is that correct?
>
> If so, we can assume that the numbers you're sharing mostly reflect
> your experiment. At peak I can see a bytes_used increase of 629,461,893,120
> bytes (45978612027392 - 45349150134272). With a replica factor of 3, this
> roughly matches your written data (200GB, I presume?).
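
As a quick check of that arithmetic, here is a small sketch using only the
bytes_used figures quoted in this thread. The replica factor of 3 is the one
assumed in the message, not something read from the cluster.

```shell
# bytes_used values as reported in the thread
before=45349150134272   # before creating the file
peak=45978612027392     # immediately after deleting the file
replicas=3              # replica factor assumed in the message

raw_delta=$(( peak - before ))              # raw space consumed at peak
logical_delta=$(( raw_delta / replicas ))   # data written before replication

echo "raw delta:     $raw_delta bytes"      # 629461893120
echo "logical delta: $logical_delta bytes"  # 209820631040
```

The logical delta of about 209.8 GB (roughly 195 GiB) is in line with a test
file of around 200GB.
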
>
>
> More interesting is that after the file's removal we can see a delta of
> 419,450,880 bytes (= 45349569585152 - 45349150134272). I can see two
> explanations (apart from someone else writing additional data to CephFS
> during the experiment):
>
> 1. File removal hadn't completed at the last probe, half an hour after the
> file's removal. Did you see a stale object counter when making that probe?
>
> 2. Some space is leaking. If that's the case, this could be the cause of
> your issue if huge(?) files on CephFS are created and removed periodically.
> So if we're certain that the leak really occurred (and option 1 above
> isn't the case), it makes sense to run more experiments, writing and
> removing a bunch of huge files on the volume, to confirm the space
> leakage.
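
To put a number on what is still unaccounted for, here is a small sketch
using only the figures reported in this thread (the "before" probe and the
probe half an hour after deletion). It cannot tell a leak apart from other
writers on a production pool; it only quantifies the residue.

```shell
# Values as reported in the thread
before_bytes=45349150134272    # bytes_used before creating the file
after_bytes=45349569585152     # bytes_used half an hour after deleting it
before_objects=3885835         # objects before creating the file
after_objects=3886013          # objects half an hour after deleting it

residual_bytes=$(( after_bytes - before_bytes ))
residual_objects=$(( after_objects - before_objects ))

echo "residual bytes_used: $residual_bytes"    # 419450880
echo "residual objects:    $residual_objects"  # 178
```

Repeating the same two probes around each write/remove cycle would show
whether this residue grows with every cycle or stays roughly constant.
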
>
> On 3/18/2024 3:12 AM, Thorne Lawler wrote:
>
> Thanks Igor,
>
> I have tried that, and the number of objects and bytes_used took a
> long time to drop, but they seem to have dropped back to almost the
> original level:
>
>   * Before creating the file:
>       o 3885835 objects
>       o 45349150134272 bytes_used
>   * After creating the file:
>       o 3931663 objects
>       o 45924147249152 bytes_used
>   * Immediately after deleting the file:
>       o 3935995 objects
>       o 45978612027392 bytes_used
>   * Half an hour after deleting the file:
>       o 3886013 objects
>       o 45349569585152 bytes_used
>
> Unfortunately, this is all production infrastructure, so there is
> always other activity taking place.
>
> What tools are there to visually inspect the object map and see how it
> relates to the filesystem?
>
>
> Not sure if there is anything like that at the CephFS level, but you can
> use the rados tool to view objects in the CephFS data pool and try to build
> a mapping between them and the CephFS file list. Could be a bit tricky though.
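
One way such a mapping could be sketched: CephFS names its data objects
after the file's inode number in hexadecimal, followed by a dot and the
object index within the file. The function names, pool name and path below
are placeholders, and listing a pool with millions of objects will be slow.

```shell
# Convert a decimal inode number into the hex prefix of its data objects.
inode_to_prefix() {
    printf '%x\n' "$1"
}

# List the RADOS objects that back one file.
# $1 = data pool name, $2 = path of the file on the mounted CephFS.
objects_for_file() {
    pool="$1"
    path="$2"
    prefix="$(inode_to_prefix "$(stat -c %i "$path")")"   # GNU stat
    rados -p "$pool" ls | grep "^${prefix}\."
}

# Example with placeholder names:
#   objects_for_file cephfs_data /mnt/cephfs/vm/disk0.img
```

Going the other way, the prefix of an object name converted back from hex to
decimal gives an inode number that can be searched for with find -inum on the
mounted filesystem.
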
>
> On 15/03/2024 7:18 pm, Igor Fedotov wrote:
>
> ceph df detail --format json-pretty
>
> --
>
> Regards,
>
> Thorne Lawler - Senior System Administrator
> *DDNS* | ABN 76 088 607 265
> First registrar certified ISO 27001-2013 Data Security Standard ITGOV40172
> P +61 499 449 170
>
> DDNS
>
> *Please note:* The information contained in this email message and
> any attached files may be confidential information, and may also be
> the subject of legal professional privilege. If you are not the
> intended recipient any use, disclosure or copying of this email is
> unauthorised. If you received this email in error, please notify
> Discount Domain Name Services Pty Ltd on 03 9815 6868 to report this
> matter and delete all copies of this transmission together with any
> attachments.
>
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Alexander E. Patrakov