Hi -
On 2016-02-10 at 17:09 'Kanoj Sarcar' via Akaros wrote:
> Port over linux 4.1.15 drivers/infiniband/core logic essential for
> kernel bypass NIC access. Slight edits to adapt to Akaros environment
> (#if exclusion of non essential code blocks, panic stubs etc),
> described in README file.
>
> Most of the interlock logic with core kernel (mm/vfs etc) is captured
> in compat.[ch].
For the most part, I turn a blind eye to any of the Linux driver
stuff. It's a miracle it works, and hopefully it won't break with
future changes to Akaros (or the driver!).

That being said, I'd like to avoid any potential issues if we can spot
them early:
> diff --git a/kern/drivers/net/udrvr/compat.c
> +/*
> + * Our version knocked off from kern/src/mm.c version + uncaching logic from
> + * vmap_pmem_nocache().
> + */
> +int map_upage_at_addr(struct proc *p, physaddr_t paddr, uintptr_t addr, int pteprot, int dolock)
> +{
> + pte_t pte;
> + int rv = -1;
> +
> + spin_lock(&p->pte_lock);
> +
> + pte = pgdir_walk(p->env_pgdir, (void*)addr, TRUE);
> +
> + if (!pte_walk_okay(pte))
> + goto err1;
> + pte_write(pte, paddr, pteprot);
> + // tlbflush(); tlb_flush_global();
> + rv = 0;
> +err1:
> + spin_unlock(&p->pte_lock);
> +
> + /*
> + * TODO: @mm tear down, unmap_and_destroy_vmrs():__vmr_free_pgs()
> + * decrefs page, which is a problem. 1st level workaround is to set
> + * PG_LOCKED/PG_PAGEMAP to avoid that. Not proud of myself.
> + */
> + if ((rv == 0) && (dolock == 1))
> + atomic_set(&pa2page(paddr)->pg_flags, PG_LOCKED | PG_PAGEMAP);
> +
> + return rv;
> +}
This is pretty brutal. I don't completely follow the issue requiring
PG_LOCKED/PG_PAGEMAP. Won't an incref work?

Also, I think some of the guts of this function can be replaced with
page_insert(). It manages the incref for you too. You'll still need
to grab the PTE lock.

You probably still need a TLB flush, though not a global one: global is
for a change to the kernel mapping, more specifically to any PTE with
PTE_G. I once tracked down a brutal bug to a "TODO: tlbflush", so I
recommend doing it. =) In this case, you can use tlb_invalidate(),
which works on a single page.
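
To make the incref and single-page invalidate ideas concrete, here is a rough user-space toy model. It is only a sketch: every toy_* name is made up for illustration and none of this is the Akaros API (it does not call page_insert() or tlb_invalidate(), and there is no locking). It assumes the driver already holds its own reference on the page, so the decref at teardown does not drop the count to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the Akaros types. */
#define TOY_NR_SLOTS 8
#define TOY_PGSHIFT 12

struct toy_page {
	int refcnt;
};

struct toy_aspace {
	struct toy_page *pte[TOY_NR_SLOTS];	/* "page table", one slot per page */
	struct toy_page *tlb[TOY_NR_SLOTS];	/* cached translations */
};

static size_t toy_slot(uintptr_t va)
{
	return (va >> TOY_PGSHIFT) % TOY_NR_SLOTS;
}

/* Install a mapping and take a reference on behalf of the mapping. */
static int toy_map(struct toy_aspace *as, struct toy_page *pg, uintptr_t va)
{
	size_t i = toy_slot(va);

	if (as->pte[i])
		return -1;		/* slot already in use */
	pg->refcnt++;			/* the mapping owns one reference */
	as->pte[i] = pg;
	as->tlb[i] = NULL;		/* invalidate this one page only */
	return 0;
}

/* Remove a mapping, drop its reference, invalidate only that page. */
static void toy_unmap(struct toy_aspace *as, uintptr_t va)
{
	size_t i = toy_slot(va);
	struct toy_page *pg = as->pte[i];

	if (!pg)
		return;
	as->pte[i] = NULL;
	as->tlb[i] = NULL;		/* other cached entries stay valid */
	assert(pg->refcnt > 0);
	pg->refcnt--;
}

/* Translate through the "TLB", walking the table on a miss. */
static struct toy_page *toy_translate(struct toy_aspace *as, uintptr_t va)
{
	size_t i = toy_slot(va);

	if (!as->tlb[i])
		as->tlb[i] = as->pte[i];
	return as->tlb[i];
}
```

In this model, a page that starts with refcnt 1 (the driver's own reference) goes to 2 when mapped and back to 1 at unmap, so teardown never frees it out from under the driver. Skipping the `as->tlb[i] = NULL` line in toy_unmap() would leave toy_translate() returning the old page, which is the kind of stale-translation bug I mentioned.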
Ultimately, I get that there are mismatches between what we provide and
what the driver wants from us. I'm okay with having a hack in
compat.h, but as an alternative, we could try to build the right
functions in Akaros too.
> +/*
> + * get_user_pages() does not grab a page ref count. Thus, put_page()
> + * can not release page ref count.
> + */
> +void put_page(struct page *pagep)
> +{
> + /* page_decref(pagep) / __put_page(pagep) */
> +}
We can probably decref here, since our get_user_page shim should
probably incref (more below).
> +int get_user_page(struct proc *p, unsigned long uvastart, int write, int force,
> + struct page **plist)
> +{
> + pte_t pte;
> + int ret = -1;
> +
> + spin_lock(&p->pte_lock);
> +
> + pte = pgdir_walk(p->env_pgdir, (void*)uvastart, TRUE);
> +
> + if (!pte_walk_okay(pte))
> + goto err1;
> +
> + if (!pte_is_present(pte)) {
> + printk("[akaros]: get_user_page() uva=0x%llx pte absent\n",
> + uvastart);
> + goto err1;
> + }
> +
> + if (write && (!pte_has_perm_urw(pte))) {
> + /* TODO: How is Linux using the "force" parameter */
> + printk("[akaros]: get_user_page() uva=0x%llx pte ro\n",
> + uvastart);
> + goto err1;
> + }
> +
> + plist[0] = pa2page(pte_get_paddr(pte));
> + ret = 1;
> +err1:
> + spin_unlock(&p->pte_lock);
> + return ret;
> +}
We might be able to implement this with page_lookup(). You'll want to
hold the pte_lock and probably incref the result. Even if you don't use
page_lookup(), I think you want the incref. I'm not sure about
Linux's rules for struct page management. I think they don't use
refcnts.
(http://lxr.free-electrons.com/source/include/linux/mm_types.h#L44)
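
Here is a rough sketch of the get/put pairing I have in mind, again as a user-space toy and not the real shim. The toy_* names and the simplified parameters (a pointer to the mapped page plus a writable flag, standing in for the PTE walk and permission check) are assumptions for illustration; the real version would do the walk under the pte_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for illustration only; NOT the Akaros struct page. */
struct toy_page {
	int refcnt;
};

/*
 * Toy get_user_page: 'mapped' stands in for the result of the PTE walk
 * (NULL means the PTE is absent), and 'mapped_writable' stands in for
 * the user read/write permission check. Returns 1 and pins the page on
 * success, -1 on failure, mirroring the return values in the patch.
 */
static int toy_get_user_page(struct toy_page *mapped, int mapped_writable,
                             int write, struct toy_page **plist)
{
	if (!mapped)
		return -1;		/* PTE absent */
	if (write && !mapped_writable)
		return -1;		/* write requested on a read-only mapping */
	mapped->refcnt++;		/* pin: the caller now owns one reference */
	plist[0] = mapped;
	return 1;
}

/* Toy put_page: releases the reference taken by toy_get_user_page(). */
static void toy_put_page(struct toy_page *pg)
{
	assert(pg->refcnt > 0);
	pg->refcnt--;
}
```

The point is just that every successful get takes exactly one reference and every put drops exactly one, so the failure paths must not touch the count. With that pairing in place, put_page() no longer needs to be an empty stub.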

One issue here is that Akaros's memory functions are split across a few
files. pmap (pmap.h, pmap.c, pmap64.c) handles mucking with the actual
address space. mm (mm.h, mm.c) handles the VMRs/VMAs and userspace
accessors (mmap, munmap, page faults, etc.).

That might not be the best organization, and it's probably largely
historical (pmap was there from the beginning, mm was added over time).

Anyway, I'm willing to merge this as is if I'm wrong about these
suggestions or if it's a huge pain. Let me know. =) Also, I'll hold off
on the next patch until we get this one sorted.

Thanks,
Barret