Hi Jason,

On Wed, Sep 17, 2025 at 9:32 AM Jason Gunthorpe <[email protected]> wrote:
>
> On Wed, Sep 17, 2025 at 12:18:39PM -0400, Pasha Tatashin wrote:
> > On Wed, Sep 17, 2025 at 8:22 AM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Tue, Sep 16, 2025 at 07:50:16PM -0700, Jason Miu wrote:
> > > > + * kho_order_table
> > > > + * +-------------------------------+--------------------+
> > > > + * | 0 order| 1 order| 2 order ... | HUGETLB_PAGE_ORDER |
> > > > + * ++------------------------------+--------------------+
> > > > + * |
> > > > + * |
> > > > + * v
> > > > + * ++------+
> > > > + * | Lv6 | kho_page_table
> > > > + * ++------+
> > >
> > > I seem to remember suggesting this could be simplified without the
> > > special case 7h level table table for order.
> > >
> > > Encode the phys address as:
> > >
> > > (order << 51) | (phys >> (PAGE_SHIFT + order))
> >
> > Why 51 and not 52, this limits to 63bit address space, is it not?
>
> Yeah, might have got the math off
>
> > I like the idea, but I'm trying to find the benefits compared to the
> > current per-order tree approach.
>
> It is probably about half the code compared to what I see here because
> everything is agressively simplified.
Thank you very much for the feedback; I think this is a very smart idea.

> > 3. It slightly complicates the logic in the new kernel. Instead of
> > simply iterating a known tree for a specific order, the boot-time
> > walker would need to reconstruct the per-order subtrees, and walk
> > them.
>
> The core walker just runs over a range, it is easy to compute the
> range.

I believe the "range" here refers to the portion of the tree relevant
to the `target_order` being restored, where `target_order` goes from 0
to MAX_PAGE_ORDER as the new kernel walks the tree.

My current understanding of the walker for a given `target_order` is:

1. Find the `start_level` from `target_order` (for example,
   target_order = 10 gives start_level = 4).
2. The path from the root down to the level above `start_level` is
   fixed (index 0 at each of these levels).
3. At `start_level`, the index is also fixed: it is determined by the
   bit (1 << (63 - PAGE_SHIFT - target_order)) within that level's
   9-bit slice.
4. For every level below `start_level`, the walker iterates through all
   512 table entries, down to the bitmap level.

So the "range" is the set of subtrees under `start_level`. Is my
understanding correct?

--
Jason Miu
