On Fri, Apr 21, 2017 at 4:26 AM, Hans van Kranenburg
<hans.van.kranenb...@mendix.com> wrote:

>
> == Thinking out of the box ==
>
> Technically, converting from DUP to single could also mean:
> * Flipping one bit in the block group type flags to 0 for each block
> group item
> * Flipping one bit in the chunk type flags and removing 1 stripe struct
> for each metadata chunk item
> * Removing the
> * Anything else?

This is in the realm of efficient file system pruning as a means of
fixing it. And the existing code is not pruning. It's clearly doing a
lot of complex balance computations first, second, and third, before
it even gets to the convert-to-single-chunk task. Such a prune would
need to write out new chunk and dev trees, and then whatever nodes end
up pointing to those; maybe it's just the super blocks.

> How feasible would it be to write btrfs-progs style conversion to do this?

I can pretty much say it's not just a bit flip change because at the
very least you've got new CRCs to write for any changed node.

But looking at the chunk tree with btrfs-debug-tree -t 3 between
single and dup file systems:


    item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15945 itemsize 80
        length 1073741824 owner 2 stripe_len 65536 type METADATA
        io_align 65536 io_width 65536 sector_size 4096
        num_stripes 1 sub_stripes 1
            stripe 0 devid 1 offset 20971520
            dev_uuid 6cd61505-9d47-4521-b980-95e9f20de920

    item 69 key (FIRST_CHUNK_TREE CHUNK_ITEM 298730913792) itemoff 10569 itemsize 112
        length 536870912 owner 2 stripe_len 65536 type METADATA|DUP
        io_align 65536 io_width 65536 sector_size 4096
        num_stripes 2 sub_stripes 1
            stripe 0 devid 1 offset 32250003456
            dev_uuid 1ee7f7aa-701d-42b7-b37f-b3356c277e7d
            stripe 1 devid 1 offset 32786874368
            dev_uuid 1ee7f7aa-701d-42b7-b37f-b3356c277e7d

Whatever on-disk item makes this DUP needs to be removed or changed, and
then it's an open question whether it's sufficient to leave the stripe
1 metadata alone and expect that it'll just be ignored, or if it has
to be zeroed, or if the itemsize has to change to literally end it
after the stripe 0 dev_uuid; i.e. in the above example, whether item 70
needs to be moved up two lines (of course on disk the encoding of this
information is just a few dozen bytes, not lines).
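
To make the itemsize question concrete, here is a rough Python sketch of
what the byte-level change to one chunk item might look like. The field
layout and flag values are my assumption of the on-disk format (they do
reproduce the itemsize 80 and 112 seen in the dumps above), and
pack_chunk and dup_to_single are hypothetical helpers, not anything in
btrfs-progs. It does not answer whether the kernel would accept the
result.

```python
import struct
import uuid

# Assumed little-endian, packed layout of a chunk item header:
# length, owner, stripe_len, type (u64 each), io_align, io_width,
# sector_size (u32 each), num_stripes, sub_stripes (u16 each).
CHUNK_HEAD = struct.Struct("<QQQQIIIHH")   # 48 bytes
# Assumed layout of one stripe: devid, offset (u64 each), dev_uuid.
STRIPE = struct.Struct("<QQ16s")           # 32 bytes

# Assumed block group flag bits.
BLOCK_GROUP_METADATA = 1 << 2
BLOCK_GROUP_DUP = 1 << 5


def pack_chunk(length, chunk_type, stripes):
    """Build a chunk item from (devid, offset, dev_uuid) stripe tuples."""
    head = CHUNK_HEAD.pack(length, 2, 65536, chunk_type,
                           65536, 65536, 4096, len(stripes), 1)
    return head + b"".join(STRIPE.pack(*s) for s in stripes)


def dup_to_single(item):
    """Hypothetical prune: clear DUP, keep stripe 0, shrink the item."""
    fields = list(CHUNK_HEAD.unpack_from(item))
    fields[3] &= ~BLOCK_GROUP_DUP   # type flags
    fields[7] = 1                   # num_stripes
    first_stripe = item[CHUNK_HEAD.size:CHUNK_HEAD.size + STRIPE.size]
    return CHUNK_HEAD.pack(*fields) + first_stripe


# Values copied from item 69 in the dump above.
dev_uuid = uuid.UUID("1ee7f7aa-701d-42b7-b37f-b3356c277e7d").bytes
dup_item = pack_chunk(536870912, BLOCK_GROUP_METADATA | BLOCK_GROUP_DUP,
                      [(1, 32250003456, dev_uuid),
                       (1, 32786874368, dev_uuid)])
single_item = dup_to_single(dup_item)
print(len(dup_item), len(single_item))   # 112 80
```

If that layout is right, dropping stripe 1 shortens the item by 32
bytes, which is why everything packed after it in the leaf would have
to move.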

And then for the dev tree, you can see the above item 69 is pointing
to these two items, and they point to item 69. So I'd expect their
nodes need to be rewritten.

    item 32 key (1 DEV_EXTENT 32250003456) itemoff 14707 itemsize 48
        dev extent chunk_tree 3
        chunk_objectid 256 chunk_offset 298730913792 length 536870912
        chunk_tree_uuid 2928d93e-c031-464a-b475-e200cf61abac
    item 33 key (1 DEV_EXTENT 32786874368) itemoff 14659 itemsize 48
        dev extent chunk_tree 3
        chunk_objectid 256 chunk_offset 298730913792 length 536870912
        chunk_tree_uuid 2928d93e-c031-464a-b475-e200cf61abac
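
The cross-referencing between the two trees can be sketched in Python
as below. The dictionaries are just hand-copied values from the dumps
above, and extents_to_drop is a hypothetical helper for illustration;
it only shows which dev extent a prune would stop referencing if
stripe 0 is the copy that is kept.

```python
# Chunk item 69 from the chunk tree dump: logical offset, length,
# and its stripes as (devid, physical offset).
chunk = {
    "offset": 298730913792,
    "length": 536870912,
    "stripes": [(1, 32250003456), (1, 32786874368)],
}

# Dev tree items 32 and 33, keyed by (devid, physical offset),
# each pointing back at (chunk_offset, length).
dev_extents = {
    (1, 32250003456): (298730913792, 536870912),
    (1, 32786874368): (298730913792, 536870912),
}


def extents_to_drop(chunk, dev_extents, keep_stripe=0):
    """Return dev extent keys that would become unreferenced."""
    drop = []
    for index, key in enumerate(chunk["stripes"]):
        # Every stripe must have a dev extent pointing back at the chunk.
        if dev_extents[key] != (chunk["offset"], chunk["length"]):
            raise ValueError(f"dev extent {key} does not match the chunk")
        if index != keep_stripe:
            drop.append(key)
    return drop


print(extents_to_drop(chunk, dev_extents))   # [(1, 32786874368)]
```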


Anyway, after moving all of this stuff around, you still have to
compute a node CRC. So the whole 16KiB node has to be rewritten.
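
As a rough illustration of the CRC step, here is a Python sketch. It
assumes the node checksum is CRC32C computed over everything after a
32-byte checksum field at the start of the node, stored little-endian
in the first 4 bytes of that field; that is my understanding of the
format, not something confirmed in this thread. reseal_node is a
hypothetical name.

```python
def _make_table():
    """Build the lookup table for CRC32C (reflected poly 0x82F63B78)."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Plain table-driven CRC32C over a bytes-like object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


CSUM_SIZE = 32      # assumed size of the checksum field
NODE_SIZE = 16384   # 16KiB node, as mentioned above


def reseal_node(node):
    """Return the node with its checksum field recomputed (assumed layout)."""
    body = node[CSUM_SIZE:]
    csum = crc32c(body).to_bytes(4, "little")
    return csum + bytes(CSUM_SIZE - 4) + body


# Changing any byte in the body changes the checksum, so the whole
# node has to be written out again.
node = reseal_node(bytes(NODE_SIZE))
edited = bytearray(node)
edited[100] ^= 0x20
print(reseal_node(bytes(edited))[:4] != node[:4])   # True
```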

The proper way to do this in Btrfs terms would be to COW all of the
changed chunk tree nodes elsewhere, with all the unneeded items
removed and new CRCs computed. Then, once that succeeds and is
committed to stable media, new supers are written to point to the new
chunk and dev trees, which in turn now only point to one of the
already written copies of the metadata chunks, without writing out new
chunks. Also, if I'm not mistaken, the chunk tree is actually in a
system chunk. So there's this neat thing where you want the metadata
chunk profile to be single, described by a tree that itself could be
single or dup. The user space tools today consider "metadata" to
include metadata and system chunks, so converting one converts the
other. But in ancient times the user space code made a distinction,
and it's probably still lurking in today's kernel code.

Anyway, yeah, it'd be a ton faster. On your file system this is 10MiB
of writes to write out new dev and chunk trees that prune out the
unneeded extra copy. Just stop referencing the extra copy.

Basically right now it's doing a balance first, then convert. There's
no efficiency option to just convert via a prune only.

-- 
Chris Murphy