Hi,

This week has not been very productive. I first tried to implement capability mode using a separate syscall table for processes in capability mode. It turned out to be overkill, and it required changes in several places in the imgact framework: I had to modify the sysent structure to add a per-process private part holding the syscall table pointer, and I needed to store a capability mode syscall table for each executable type. In the end I dropped this approach and fell back to the FreeBSD way. I'm done with the modifications to makesyscalls.sh that add flags to the syscall table.
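
To illustrate the flag-based approach, here is a rough userland sketch, not the actual kernel code. Every name in it (sketch_sysent, SKETCH_SYF_CAPENABLED, sketch_syscall_check) and the three-entry table are made up for the example. FreeBSD's real flag is SYF_CAPENABLED and its real error is ECAPMODE; EPERM stands in for it here so the sketch compiles anywhere.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag: the syscall may be used in capability mode. */
#define SKETCH_SYF_CAPENABLED 0x1u

/* Hypothetical, trimmed-down syscall table entry. */
struct sketch_sysent {
    const char  *name;
    unsigned int flags;
};

/* Illustrative table only; the real one is generated by makesyscalls.sh. */
static const struct sketch_sysent sketch_table[] = {
    { "read",   SKETCH_SYF_CAPENABLED },
    { "open",   0 },
    { "openat", SKETCH_SYF_CAPENABLED },
};

#define SKETCH_NSYSCALLS (sizeof(sketch_table) / sizeof(sketch_table[0]))

/*
 * Return 0 if syscall number `code` may run, or an errno value otherwise.
 * A single shared table is used; the per-entry flag decides whether the
 * call is allowed once the process is in capability mode.
 */
int sketch_syscall_check(size_t code, int in_capability_mode)
{
    if (code >= SKETCH_NSYSCALLS)
        return ENOSYS;
    if (in_capability_mode &&
        (sketch_table[code].flags & SKETCH_SYF_CAPENABLED) == 0)
        return EPERM;   /* stand-in for ECAPMODE */
    return 0;
}
```

The point of this layout is that one table serves every process, so nothing in the imgact framework needs a second table per executable type.
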
During the second part of the week, I converted various calls to holdvnode/holdsock in the kernel to the Capsicum API. This took some time because I had to check the semantics of each calling function to make sure I chose the right set of rights.

I then spent some time understanding the namecache code in order to implement Capsicum for nlookup. In capability mode, absolute lookups are not allowed; only relative lookups through the *at syscall family are permitted. For that, we need a strictly relative lookup. FreeBSD chose to forbid .. in paths, so lookups are always strictly relative, but this breaks compatibility. Matt suggested another solution: maintain a counter that is incremented each time the lookup descends one path component and decremented each time .. is encountered. As long as the counter is positive, we are supposed to be in the sandbox. Using the parent pointer, we can verify that we are actually still in the sandbox. If not, a race has occurred and we must redo the lookup. The first idea was to allow absolute symlinks too, and to assume that whatever is under the symlink target is also in the sandbox. But I don't see an easy way to prevent races there, and FreeBSD does not allow it. For now, I'm not allowing it either.

Thanks,
Joris
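
P.S. A rough userland sketch of the counter idea, operating on a path string rather than on the namecache. The function name sketch_relative_check is made up, EPERM is a placeholder error, and the sketch assumes that a depth of zero (the starting directory itself) still counts as inside the sandbox. It models neither symlinks nor the parent-pointer re-check against races.

```c
#include <errno.h>
#include <string.h>

/*
 * Walk a '/'-separated path and track the depth relative to the
 * starting directory. Return 0 if the path never climbs above the
 * start, or EPERM (placeholder error) if it does or if it is absolute.
 */
int sketch_relative_check(const char *path)
{
    long depth = 0;
    const char *p = path;

    if (*p == '/')
        return EPERM;               /* absolute lookups are refused */

    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return EPERM;       /* ".." escaped the start directory */
        } else if (len == 1 && p[0] == '.') {
            /* "." leaves the depth unchanged */
        } else if (len > 0) {
            depth++;                /* descended one component */
        }

        p += len;
        while (*p == '/')           /* skip separators, including repeats */
            p++;
    }
    return 0;
}
```

With this check, "a/b/../c" is accepted while "a/../.." and "../a" are refused.
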
