Overall this seems like a fine idea. Here are a few things we'll need to think about:
Allocating the dnodes becomes trickier now; we will need to think about how slow it could be. Depending on the use case, it could make sense to have "slabs" of dnodes of different sizes. E.g. each 16k block could contain dnodes of all the same size. Although then you have the problem of finding a slab of the right dnode-size...

I'd like to see a use case before we actually integrate this. Presumably the extra space is used for the bonus buffer and thus the SA. I think the use case will inform the user interface. Ideally the specific use case could handle setting the dnode size transparently to the admin, without the need for a "dnodesize" property.

The dnode_phys_t is 512 bytes, which happens to be the same as the prevalent disk sector size (512). But this is a coincidence, and newer disks have 4K sectors. So I'd suggest you not name things in terms of sectors, but rather as a "count" (e.g. this dnode uses the space of 3 minimum-size dnodes) or just "bytes" (which we assert is always a multiple of DNODE_SIZE).

--matt

On Mon, Nov 25, 2013 at 11:09 AM, Prakash Surya <[email protected]> wrote:
> Hi,
>
> I have a compelling use case for larger dnodes and have spent a small
> amount of time looking into a potential way of implementing this. But,
> before I spend any more time on it, I want to get some buy-in from the
> community and upstream on the implementation details.
>
> My current plan for adding support for larger dnodes is actually pretty
> simple. I want to "steal" 8 bits from the unused padding currently in
> the dnode_phys_t structure, use these bits to store a size for the dnode
> (in number of 512-byte sectors), and then consult this size whenever
> traversing a dnode block (in order to properly skip over the necessary
> amount of bonus space and reach the "next" dnode in the block). This
> would not only allow us to have a larger fixed size for dnodes, but it
> would also allow us to have variable-sized dnodes.
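[Editor's note: a rough sketch of the traversal described in the quoted proposal, under the assumption that one byte of the dnode_phys_t padding records the dnode's size in 512-byte units. All identifiers here (dn_sectors, visit_dnodes) are hypothetical, not actual ZFS code; only the skip-by-size logic is the point.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNODE_SHIFT	9			/* 512-byte minimum dnode */
#define DNODE_SIZE	(1 << DNODE_SHIFT)

/*
 * Simplified stand-in for dnode_phys_t: one byte of the existing
 * padding is repurposed to record this dnode's size in 512-byte
 * units (dn_sectors == 1 for a classic 512-byte dnode).
 */
typedef struct dnode_phys {
	uint8_t	dn_type;
	uint8_t	dn_sectors;	/* hypothetical: size in 512B units */
	uint8_t	dn_pad[510];	/* rest of the 512-byte structure */
} dnode_phys_t;

/*
 * Walk every dnode in a block, consulting each dnode's recorded size
 * to skip over its extra bonus space and reach the "next" dnode.
 */
static void
visit_dnodes(void *buf, size_t blksz, void (*cb)(dnode_phys_t *))
{
	size_t off = 0;

	while (off < blksz) {
		dnode_phys_t *dnp = (dnode_phys_t *)((char *)buf + off);
		uint8_t sectors = dnp->dn_sectors ? dnp->dn_sectors : 1;

		cb(dnp);
		off += (size_t)sectors << DNODE_SHIFT;
	}
	assert(off == blksz);	/* dnode sizes must tile the block exactly */
}
```

Note that per Matt's suggestion above, the stored value is better thought of as a count of minimum-size dnode slots than as disk sectors, since 512 bytes matching the sector size is coincidental.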
> I have a *VERY* experimental patch to update the object allocation code
> to test this idea out, and it seems to work pretty well. I can create a
> dataset, create dnodes with varying sizes (0.5K - 16K), and then use zdb
> to ensure the object number allocation worked as planned. If you are
> interested in checking out the source, it's up on github:
>
> https://github.com/prakashsurya/zfs/commit/15c110722a50bd7196a71b9a360e5217445c7d13
>
> Does anybody see any major roadblocks to implementing larger dnodes in
> this way? I'm still pretty unfamiliar with this section of the code, so
> I'd love some input from the people more familiar with it.
>
> There is still a lot of work to go from this experimental patch to
> something ready to land. The send/receive bits, diff bits, and code
> to actually use the extra bonus area all remain to be implemented; I was
> just interested in testing out the object allocation code with that
> patch.
>
> To properly land upstream, this type of change also depends on adding
> dataset-level feature flags. So I'll soon need to bring up that
> discussion and decide on the best way to implement that.
>
> I welcome any and all feedback on this idea. So far, through testing and
> off-list discussions, I don't foresee any major issues. It's just a
> matter of careful implementation and extensive testing.
>
> --
> Cheers, Prakash
>
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
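[Editor's note: to illustrate why allocation "becomes trickier" with variable-sized dnodes, here is a minimal sketch of the kind of search the object allocator would need: a multi-slot dnode requires a run of contiguous free slots, so a single free slot is no longer automatically usable. The helper name and bitmap representation are hypothetical, not taken from the experimental patch.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Find the first run of 'nslots' contiguous free 512-byte dnode slots
 * in a block, given a per-slot used/free map.  Returns the starting
 * slot index, or -1 if no run fits.  With fixed-size dnodes this
 * degenerates to finding any single free slot; with variable sizes,
 * fragmentation can leave free space that no large dnode can use.
 */
static int
find_free_run(const uint8_t *used, int total_slots, int nslots)
{
	int run = 0;

	for (int i = 0; i < total_slots; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == nslots)
			return (i - nslots + 1);
	}
	return (-1);
}
```

A slab scheme like the one Matt suggests (each block holding dnodes of a single size) sidesteps this search at the cost of needing to locate a slab of the right size.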
