> Yes, we'll get a clobbered value, but we'll get a _valid_ clobbered value,
> and we'll just end up doing the fixups twice (and returning to the user
> process that didn't get the page it wanted, which will end up re-doing the
> page fault).

I don't see that we will get a valid value in both cases.

        get_user
                fault - set %cr2
                                        IRQ
                                        vmalloc
                                        fault
                                                set %cr2
                                                fixup runs
                                        end IRQ
                cr2 is corrupt
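
To make the window concrete, here is a rough sketch of the 2.4-style
do_page_fault prologue (illustrative only, not the exact code of either
tree under discussion). The clobber happens between the hardware write to
%cr2 and the read at the top of the handler:

        #include <linux/linkage.h>      /* asmlinkage */
        #include <asm/ptrace.h>         /* struct pt_regs */

        asmlinkage void do_page_fault(struct pt_regs *regs,
                                      unsigned long error_code)
        {
                unsigned long address;

                /*
                 * The CPU wrote the faulting address into %cr2 when the
                 * fault was taken.  If an IRQ is taken before this read
                 * and its handler faults on a vmalloc address, the nested
                 * fault overwrites %cr2, so the value read here belongs
                 * to the nested fault rather than to the get_user that
                 * trapped.
                 */
                __asm__("movl %%cr2,%0" : "=r" (address));

                /* the rest of the handler then acts on the wrong address */
        }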

At this point the Linus tree suffers from one problem because it gets parallel
fixups wrong, and the -ac tree suffers from a different problem because it gets
parallel fixups right and, to handle that case, won't allow a vmalloc fixup on
a fault from userspace (otherwise you get infinite loops).

> [ Looks closer.. ]
> 
> Actually, the second time we'd do the fixup we'd be unhappy, because it
> has already been done. That test should probably be removed. Hmm.

There is a whole set of races with the vmalloc fixups. Most are, I think,
fixed in the -ac bits if you want a look (I've not submitted them as they are
not too pretty). What I don't currently see is how you handle this without
looping forever or getting the SMP race fixup wrong.

(The current -ac fix for the double vmalloc races is below. The WP test makes
it more complex than is nice.)
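
For reading the error_code tests in the patch: these are the standard i386
page-fault error code bits (the macro names here are only illustrative; the
2.4 code uses the raw constants).

        /*
         * i386 page-fault error code bits:
         *   bit 0 (1): 0 = page not present, 1 = protection violation
         *   bit 1 (2): 0 = read access,      1 = write access
         *   bit 2 (4): 0 = kernel mode,      1 = user mode
         */
        #define PF_PROT         0x1     /* error_code & 1 */
        #define PF_WRITE        0x2     /* error_code & 2: write fault, the WP-test case */
        #define PF_USER         0x4     /* error_code & 4: fault came from user space */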

--- /usr/src/LINUS/LINUX2.4/linux.245p1/arch/i386/mm/fault.c    Wed May  2 13:52:04 2001
+++ /usr/src/LINUS/LINUX2.4/linux.ac/arch/i386/mm/fault.c       Fri May  4 15:03:45 2001
@@ -127,8 +183,11 @@
         * be in an interrupt or a critical region, and should
         * only copy the information from the master page table,
         * nothing more.
+        *
+        * Handle kernel vmalloc fill-in faults. User space doesn't take
+        * this path. It isn't permitted to go poking up there.
         */
-       if (address >= TASK_SIZE)
+       if (address >= TASK_SIZE && !(error_code & 4))
                goto vmalloc_fault;
 
        mm = tsk->mm;
@@ -325,7 +378,11 @@
                 *
                 * Do _not_ use "tsk" here. We might be inside
                 * an interrupt in the middle of a task switch..
+                *
+                * Note: there is one gotcha here. We may be doing the WP
+                * test. If so, fixing the pgd/pmd won't help.
                 */
+
                int offset = __pgd_offset(address);
                pgd_t *pgd, *pgd_k;
                pmd_t *pmd, *pmd_k;
@@ -344,7 +401,29 @@
                pmd = pmd_offset(pgd, address);
                pmd_k = pmd_offset(pgd_k, address);
 
-               if (pmd_present(*pmd) || !pmd_present(*pmd_k))
+               /* If the pmd is present then we either have two CPUs trying
+                * to fill in the vmalloc entries at once, or we have an
+                * exception. We can treat the collision as a slow path
+                * without worry; it is incredibly rare.
+                *
+                * If the pte is read-only then we know it is a real fault and
+                * we must take the exception fixup or Oops, as it would
+                * loop forever otherwise. */
+                
+               if (pmd_present(*pmd))
+               {
+                       pte_t *ptep = pte_offset(pmd, address);
+                       if ((error_code & 2) && !pte_write(*ptep))
+                       {
+                               if ((fixup = search_exception_table(regs->eip)) != 0) {
+                                       regs->eip = fixup;
+                                       return;
+                               }
+                               goto bad_area_nosemaphore;
+                       }
+                       /* SMP race.. continue */
+               }
+               if (!pmd_present(*pmd_k))
                        goto bad_area_nosemaphore;
                set_pmd(pmd, *pmd_k);
                return;