[thanks for the extensive answers] On Mon, Dec 6, 2021 at 10:13 PM Kent Mcleod <[email protected]> wrote:
> On Wed, Dec 1, 2021 at 9:59 AM Sam Leffler via Devel <[email protected]> > wrote: > > > > Some updates inline. Especially interested in answers for #5 and #8. > > > > On Wed, Nov 24, 2021 at 9:12 PM Sam Leffler <[email protected]> wrote: > > > > > I've been investigating how capd-loader works and have some questions. > My > > > apologies if these were previously discussed. > > > > > > 1. CDL_Model is mutable. The capDL specification looks like a perfect > > > candidate for being in .rodata but it's stored+treated as mutable. > Why? I > > > see capdl-loader uses the mutability to store mapped page frame caps > and, > > > for risc-v, VSpace roots, but both can be easily handled w/o modifying > the > > > spec. Having this immutable has many benefits. > > > > > It's currently mutable because of the updates to it as it is > processed. If these updates were removed, then it would be a good idea > to make it immutable. How would you handle the mapped_frame_cap > bookkeeping without storing it in the spec as a single frame could > have multiple shared mappings which each require a cap? > I handled the 2 cases on risc-v by: 1. Instead of marking vspace roots in the model, keeping a separate set of obj id's. I collect the set on the stack while creating objects and then record it "elsewhere" for later use. I actually use a data structure that has a fixed amount of storage & automatically spills to the heap on overflow. 32 entries works for me w/o spilling but obviously it's spec-dependent. 2. For the shared mappings I re-use the equivalent of the capdl_to_sel4_copy array. Since that array is only used for TCB & CNode dup'd caps there appear to be no conflicts/overlaps. I think collecting vspace roots by checking for CDL_TCB_VTable_Slot in the TCB works for all platforms. The benefit of tracking them separately is you can avoid walking the object list in init_pd_asids & init_vspace. > > > 2. All page frames are treated as shared. CAPDL_SHARED_FRAMES is > #define'd > > > in the code so every frame is treated as shared. This inflates the > size of > > > the orig_caps array and the kernel capabilities table but seems > > > unnecessary. Why? I've verified that instead of doing the Copy op on > each > > > page you can just check the return of the seL4_Page_Map call and if > > > necessary clone the cap. Better would be for camkes to identify shared > page > > > frames so this doesn't require 2x syscalls (which spam's the console w/ > > > CONFIG_PRINTING enabled). > > > > > What you suggest would be possible. For some object types (like an > endpoint) the original cap is special and can be used to revoke all > children caps that have been derived from it. All frame types don't > have this property which means that the original cap is no different > from any copies of it. Embedding whether the frame is shared into the > object as an attribute would make this easier to optimize as I agree > that it becomes pretty inefficient in typical systems with many more > private frames than shared. > Thanks for pointing out the issue with endpoints. I was only suggesting doing this copy-on-write style handling for CDL_Frame objects. This approach lets me halve the size of capdl_to_sel4_copy & capdl_to_sel4_orig. > > > > 3. Why does copy_addr_with_pt exist? There are two memory regions > created > > > for mapping page frames into the rootserver address space for doing > fill > > > ops. One is 64KB and the other is 1 page (nominally 4KB). init_frame > tries > > > the 4KB region first then falls back to the 64K region if mapping into > the > > > 4KB region fails. Why (i.e. why would the 1st mapping call fail)? On my > > > risc-v system I never use the 64K region and I'm not sure when it > might be > > > used? Even if it were used, why is 64K always allocated; seems like the > > > region should be sized according to the target platform. Having this > extra > > > region has a cost that can be significant on small systems. > > > > > > > I got this backwards; on risc-v at least the 64KB region is used but not > > the page immediately after the ipc buffer. On my system I eliminated the > > latter and shrank the bss region to 1 page. Looking for feedback on > whether > > this will fail on other platforms. > > > > Short answer: For copy_addr_with_pt, yes we could make the 64kiB > definition aarch32 specific and 4kiB on other platforms. > Long answer: It looks like copy_addr_with_pt is defined to be an > address range that is guaranteed to have mapped page tables down to > the last level of the page table hierarchy and is therefore suitable > for mapping 4kiB pages into (or 64k pages only on aarch32 which is the > only supported arch which allows larger pages to also be mapped in at > the lowest level page table entries. copy_addr on the other hand is > guaranteed to not have a mapped page table at the lowest level of the > page table hierarchy which allows for 1MiB, 2MiB, 4MiB and 16MiB page > sizes to be mapped here and achieves this by picking a virtual address > with alignment of 16MiB above the end of the image but still expecting > this to be within the page table that is one level up from the lowest > level page table in the hierarchy. > Then init_frame tries either region when trying to initialize a frame > of varying size between 4kiB and 16MiB. Frames the next size up, > 1GiB, would not be able to be mapped by the loader without adding a > new copy_addr variable that has larger alignment to guarantee it > doesn't collide with a page table at the next level up. > Thank you. Still not sure whether this means it's possible to eliminate the copy_addr support (sounds like it might be arch-dependent?). Is the multiple page sizes for an x86 target? (so I can test my approaches). > > > > > > 4. console output. Why is sel4_DebugPutChar (or similar) not a standard > > > part of the kernel? This forces capdl-loader to depend on > > > libsel4platsupport for console output which in turn affects cap space. > I > > > can't imagine an seL4 system that cannot write to the console and I > think > > > it's entirely reasonable to have a critical system service like the > > > rootserver use that. > > > > > > > The verified kernel configurations don't have a uart driver in the > kernel. This decision follows from an L4 design principle that the > kernel doesn't provide an implementation for anything that could be > provided with an implementation outside of the kernel. In practice, > if you know that the loader is only going to be used on a specific > platform, then you could write a simple specialized uart driver that > you link in instead of the entirety of libplatsupport. > > > > 5. duplicate_caps. The scheme for constructing a running system from a > > > spec is straightforward except for duplicating caps. Why are all TCB's > and > > > CNode caps dup'd? Why does init_tcbs call init_tcb on the original > object > > > but immediately configure_tcb on the dup? There are other places where > the > > > dup cap is used instead of the original cap (e.g. init_cnode_slot). > None of > > > this seems to matter on risc-v; maybe this matters on some other > > > architectures? > > > > > I don't know the reason for this and it's older than the git history I > have access to. It looks like its unnecessary for TCBs, and is likely > still necessary for CNodes when loading specs that have CNodes that > are moved into slots in other CNodes when constructing a multi-cnode > CSpace. > Is there an example in the tree of a nested CNode spec? I'm still not clear on why the dup is required so being able to repro this would be helpful. > > > 6. On risc-v all pages appear to be marked executable--including ipc > > > buffers! This seems just wrong and a potential security hole. > > > > > > > This looks to be a bug. Marking the NX bit on risc-v when the grant right > > is not set gives me what I want. arm already did this. > > > > It's a bug because it doesn't do it the same way as Arm, but the way > arm does it is a bit misleading. The kernel itself only uses the read > and write bits from cap rights for frame mappings, and a cap with > write-only rights gets downgraded to kernel-only mapping, so really > the kernel only lets you describe read-write, read-only or none. For > both read-write and read-only cap abilities, the frame could be mapped > exec or non-exec. The read right implies executable rights from the > kernel's perspective. The capdl loader does add a little bit of extra > security with using the grant right in the spec to imply whether the > frame has executable permissions and tries to create the mapping > accordingly (except on riscv which is a bug), but if the frame cap for > this mapping is also given to the component in order to do things like > cache maintenance invocations, then the component could use that > capability to convert it's non-exec mappings to executable mappings. > > There's probably a good argument for overhauling how the kernel > handles frame rights so that you can express executeNever as a > capability right and associate it with the grant or grantreply bit in > the existing capRights. I think this wasn't done originally because > either there were no spare bits in the cap structure at the time, or > armv6 didn't support executeNever mappings (but you'd have to > fact-check me on that). > > > > > > > > 7. Why does seL4_TCB_WriteRegisters take a _mutable_ seL4_UserContext? > I'm > > > guessing this is a mistake in the xml spec (but haven't looked). > > > > > > > This looks to be a shortcoming in the python script that generates the > > syscall stubs. > > > Correct. > > > > > > > 8. init_cspaces mystifies me. There are two loops over all CNodes; one > > > that does "COPY" operations and one that does "MOVE" operations. I see > > > nothing explaining what's happening. I've read this code many times and > > > still don't get it; please explain. > > > > > As I mentioned higher up, some caps are special as originals and allow > additional operations to be performed on them compared to their copies > even if the cap rights are the same for both. Initializing all of the > cnodes in the system involves moving capabilities (seL4_CNode_Move) or > deriving capabilities from source capabilities either with the same > rights (seL4_CNode_Copy) or lesser rights (seL4_CNode_Mint). Its > split into two passes because the derivation of capabilities need to > happen before the source capabilities are moved into the destination > CNode. > > > > > 9. According to comments, the use of a global static to setup the > > > UserContext for a new TCB is done because one cannot take the address > of a > > > local variable. On what architecture is this true? On risc-v the > context is > > > copied into "registers" before calling the kernel so this doesn't > appear to > > > be an issue with crossing the user-kernel protection boundary. > > > > > This is true for the kernel with the proofs. The very first > implementation of this loader was as part of a project that wanted to > verify it using the same tooling as the kernel proofs. So that > comment is probably documenting a requirement that is no longer > present. > > > > 10. Memory reclamation. I'm unclear what happens to the memory used by > > > capdl-loader. The process does a TCB_Suspend call when it's done. Does > this > > > cause the CSpace & VSpace to be reclaimed? For CAmkES this doesn't > matter > > > as there's no support for allocating memory but I'm building a system > where > > > this isn't true. How do I release memory (in particular) back to the > > > untyped pool? > > > > > > > This was discussed on the developers zoom call so no need to belabor it. > > Sounds like I've got my work cut out for me... > > > > Yea, this one is a bit more involved. > It appears I need this so may be back w/ more questions :). > > > > > > I have a bunch of niggly code complaints/questions but let's start with > > > the above. > > > > > > -Sam > > > > > _______________________________________________ > > Devel mailing list -- [email protected] > > To unsubscribe send an email to [email protected] > _______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected]
