> On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote: > > > [..] > > >> + } > >> + > >> + local_irq_save(flags); > >> + in = *this_cpu_ptr(hyperv_pcpu_input_arg); > >> + out = *this_cpu_ptr(hyperv_pcpu_output_arg); > >> + > >> + if (copy_from_user(in, (void __user *)hvcall.input_ptr, > >> hvcall.input_size)) { > > > > Here is an issue related to usage of user copy functions when interrupt are > disabled. > > It was reported by Michael K here: > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith > > ub.com%2Fmicrosoft%2FOHCL-Linux- > Kernel%2Fissues%2F33&data=05%7C02%7Css > > > engar%40microsoft.com%7C7399d799e9ac47f03bf008dd8d9c3f6a%7C72f988 > bf86f > > > 141af91ab2d7cd011db47%7C1%7C0%7C638822424480098010%7CUnknown > %7CTWFpbGZ > > > sb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIs > IkFOI > > > joiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=Q0FcQVb21btq1Oa > pihyvMaX > > JLtqQNOp7qUFT7BzgHI0%3D&reserved=0 > > From the practical point of view, that memory will be touched by the user > mode by virtue of Rust requiring initialization so the a possible page fault > would be resolved before the IOCTL. OpenHCL runs without swap so the the > memory will not be paged out to require page faults to be brought in back. > > I do agree that might be turned into a footgun by the user land if they malloc > a page w/o prefaulting (so it's just a VA range, not backed with the physical > page), and then send its address straight over here right after w/o writing > any > data to it. Perhaps likelier with the output data. Anyway, yes, relying on the > user land doing sane things isn't the best approach to the kernel > programming.
Right, we should fix it. > > If we're inclined to fix this, I'd encourage to take an approach that works > for > the confidential VMs as well so we don't have to fix that again when start > upstreaming what we have for SNP and TDX. The allocation *must* be visible > to the hypervisor in the confidential scenarios. Fine with me. I am letting you and Naman to decide this. > > Or, maybe we could avoid the allocations by reading the first byte of the user > land buffer to "pre-fault" the page outside of the scope that disables > interrupts. Why allocate if we can avoid that? > Could set up also the SMP remote calls to run this on the desired CPU. > > Summarizing for the case you want to change this: > > 1. Keep interrupts disabled when reading/writing to/from the Hyper-V > driver allocated input and output pages. > 2. If you decide to allocate separate pages, make sure they are > visible to the hypervisor in the confidential scenarios. I know > we're not talking SNP and TDX here just yet but it would be > a waste of time imho to build something here and scrape that > later. The issues with allocations are: > a) If allocating on-demand, we might fail the hypercall > because of OOM. That's certainly bad as the whole VM > will break down. > b) If allocating for the whole lifetime of the VM, > let us remember that we avoid using hypercalls > due to their runtime cost. We'll be keeping around > 2 pages per CPU for the few times we need them. > 3. Consider reading a byte from the user land buffers to make the page > fault happen outside of disabling interrupts. There is no > outswap (maybe could have disabling swap in Kconfig) so the page > will stay in the memory. > > If you're not changing this, feel free to keep my "Reviewed-by". Lets discuss and reach an agreement, while still having your Reviewed-by 😊 - Saurabh