Hi Paul,

On 3/2/21 12:42 AM, Paul Boddie wrote:
> On Monday, 1 March 2021 21:30:11 CET Philipp Eppelt wrote:
>> On 2/24/21 10:10 PM, Paul Boddie wrote:
[...]
>>>> However, if there are other regions attached, e.g. (a2, s2) -> (d1, o2),
>>>> this will still remain and as soon as you unmap the d1-capability, you
>>>> have stale entries in your region map.
>>>
>>> What happens when a task tries to access the memory within a2 to a2+s2?
>>> Are there virtual memory associations that may still provide access to the
>>> memory exported by the now-unmapped capability?
>>
>> This I actually don't know.  I'll investigate. I hope the mappings are
>> gone and you'll get a page fault, though.
> 
> So do I. :-)
> 
> [Strange behaviour]
This was actually wrong. So assume you get a DS capability from some
other task. You then use the DS cap to get mappings from the
dataspace, either through rm->attach() and subsequent page faults or
through direct ds->map() calls. As a result, you have several mappings
in your task.
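
To illustrate, roughly what that looks like with the C API (error
handling omitted; flag names follow the older l4re_c interface, newer
trees spell them L4RE_RM_F_*):

#include <l4/sys/consts.h>
#include <l4/re/c/rm.h>
#include <l4/util/util.h>

/* ds was received from another task; attach two windows backed by it. */
static void map_twice(l4re_ds_t ds, unsigned long s1, unsigned long o1,
                      unsigned long s2, unsigned long o2)
{
  void *a1 = 0, *a2 = 0;

  /* (a1, s1) -> (ds, o1) and (a2, s2) -> (ds, o2) */
  l4re_rm_attach(&a1, s1, L4RE_RM_SEARCH_ADDR, ds, o1, L4_PAGESHIFT);
  l4re_rm_attach(&a2, s2, L4RE_RM_SEARCH_ADDR, ds, o2, L4_PAGESHIFT);

  /* Touching the regions raises page faults which the region map
   * resolves by asking the dataspace for mappings. */
  l4_touch_ro(a1, s1);
  l4_touch_ro(a2, s2);
}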

Then you unmap the DS cap from your object space. And ... nothing
happens, or does it? You might still have access to the memory
mappings, or you might not.
You didn't unmap the memory from your address space, but someone else,
namely the dataspace provider, might destroy the DS in its address
space and unmap the corresponding memory, which removes the memory in
all other tasks as well (i.e. the branch is removed from the mapping
tree).
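
Dropping just the capability, without touching the region map, would be
something like the sketch below (assuming the plain l4_task_unmap route;
l4re_util_cap_free_um() essentially combines this with returning the
index to the allocator):

#include <l4/sys/types.h>
#include <l4/sys/consts.h>
#include <l4/sys/task.h>

/* Remove only the DS *capability* from our own object space.  The
 * memory regions attached through it stay attached and may even stay
 * mapped; they only vanish once the provider revokes them or we
 * detach/unmap them ourselves. */
static void drop_ds_cap(l4_cap_idx_t ds)
{
  l4_task_unmap(L4_BASE_TASK_CAP,
                l4_obj_fpage(ds, 0, L4_CAP_FPAGE_RWSD),
                L4_FP_ALL_SPACES);
}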

Applied to our example above:
* l4re_rm_detach(a1): (a1, s1) -> (d1, o1) is gone.
* free_um(d1)
* the region map still contains (a2, s2) -> (d1, o2): page faults will
fail, but if the memory was already mapped, it might still be there.
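
Put into code, that sequence would look roughly as follows (a1 and d1 as
in the example; free_um corresponds to l4re_util_cap_free_um()):

#include <l4/re/c/rm.h>
#include <l4/re/c/util/cap_alloc.h>

static void tear_down(void *a1, l4re_ds_t d1)
{
  /* (a1, s1) -> (d1, o1) disappears from the region map. */
  l4re_rm_detach(a1);

  /* free_um: unmap the capability and return its index.  The region
   * (a2, s2) -> (d1, o2) is still recorded in the region map, but new
   * page faults there can no longer be resolved; pages that were
   * already mapped may still be accessible until the provider revokes
   * them. */
  l4re_util_cap_free_um(d1);
}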

> 
>>> I also saw it with a region that overlapped the old one instead of having
>>> precisely the same base address:
>>>
>>> (a1+0x1000, s2) -> d2 -> mem[o2:o2+s2]
>>>
>>> Here, an access to the new base of a1+0x1000 appeared to expose
>>> mem[o1+0x1000] instead of mem[o2].
>>
>> Are you certain that d1 and d2 are actually different dataspaces? Are
>> you getting only d1 data or only d2 data? Are you getting a mix of d1
>> and d2 data?
> 
> It is, of course, always possible that I have been making a mistake - this 
> being the usual discovery when I report strange behaviour - but the means of 
> acquiring dataspaces d1 and d2 may involve distinct objects, and it involves 
> creating further distinct objects to act as dataspaces. So, something like 
> this would occur:
> 
> d1 = c1.open()
> d2 = c2.open()
> 
> Here, c1 and c2 may even be the same object, but even then they should still 
> allocate a new object for each invocation of the open operation, yielding two 
> distinct dataspaces d1 and d2.
> 
> What I would observe is d1 data even after d2 was attached. I was somewhat 
> confused as to whether d1 might still be active or not. But if it is, then d2 
> should not be allocated an address region coinciding with that of d1. If it 
> isn't, then d2 should be unaffected by whatever d1 had been doing.
> 
>> Let me summarize the steps I think are necessary during the lifetime of
>> the dataspace:
>> * Allocate a capability index for the dataspace
>> * Allocate the memory and receive the dataspace capability in the
>> allocated index (see
>> http://l4re.org/doc/classL4Re_1_1Mem__alloc.html#a44b301573ae859e8406400338cc8e924)
>> or something alike to get the mapping for the dataspace capability under
>> the allocated capability index. (to be sure use:
>> http://l4re.org/doc/group__l4__task__api.html#ga829a1b5cb4d5dba33ffee57534a505af)
> 
> Do I need to use the memory allocation interface if the dataspace is sending
> flexpage items? I have previously used the l4re_ma functions (and possibly
> C++ equivalents) to allocate memory, but this was mostly useful for device
> drivers where physical addresses may need to be obtained for hardware
> peripheral usage, plus convenient sharing of entire memory regions between
> tasks without any of my tasks needing to act as dataspaces.
> 
> My strategy with this work is to implement paging by sending flexpage items
> to satisfy paging requests and thus provide a dataspace implementation. In
> the dataspace itself, I actually use posix_memalign to obtain memory, but
> that is ultimately going to be using l4re_ma functions at the lowest level,
> I imagine.
No, I used Mem_alloc only as an example of how to obtain an actual
capability behind your allocated index. If you get the capability
mapping by other means, that is fine.
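
In C API terms, the Mem_alloc route is roughly the following; it is only
one way of putting a real capability behind a freshly allocated index,
any mechanism that maps a dataspace capability into that slot is just as
good:

#include <l4/sys/types.h>
#include <l4/re/c/mem_alloc.h>
#include <l4/re/c/util/cap_alloc.h>

/* Allocate a capability index, then let the memory allocator put a
 * dataspace capability behind it. */
static l4re_ds_t alloc_ds(long size)
{
  l4re_ds_t ds = l4re_util_cap_alloc();
  if (l4_is_invalid_cap(ds))
    return L4_INVALID_CAP;

  if (l4re_ma_alloc(size, ds, 0) < 0)
    {
      l4re_util_cap_free(ds);
      return L4_INVALID_CAP;
    }
  return ds;
}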


>> Hopefully, this helps you as a baseline. I'm a bit puzzled by the
>> mem[o1+0x1000] case. I went through the code and I don't see how this
>> can happen unless the "task" capability given to l4re_rm_detach_unmap is
>> invalid, however, l4re_rm_detach is using the correct capability.
>> Which code version are you working on? Maybe I'm looking at the wrong code?
> 
> I'm still using the Subversion distribution (version 83) of L4Re. I know I
> should be following the different GitHub repositories but I find the
> Subversion distribution more convenient and I have not wanted to introduce
> too many different variables in my own experiments. Plus, it seems to be
> reliable enough for my needs.
No worries, SVN is fine.

> 
> Over the weekend, I tried to troubleshoot this issue and investigate the
> nature of it. I then retraced my steps, introducing wrapper functions around
> l4re_rm_attach and l4re_rm_detach to see if the region manager was giving
> out duplicate addresses. This seemed to indicate that it was indeed doing
> so. If I introduced synchronisation around the l4re_rm calls (effectively
> extending the synchronisation already in place around the STL data structure
> recording active regions), the observed problem went away.
> 
> Now, this is not consistent with what Christian wrote a few weeks ago, where 
> he also noted that the capability slot allocator is not thread-safe, but I 
> imagine that either my own code somehow uses the region manager API in a 
> thread-unsafe way (although I cannot see exactly how that might be) or there 
> is some element of using this API where a degree of "thread unsafety" exists. 
> So, I have just added synchronisation around both the capability slot 
> allocator and the region manager operations.

Thread safety again. Nothing springs to mind, but this is certainly
interesting. I'll mull it over a bit.
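
Just so we are talking about the same thing, I imagine the wrappers you
describe look roughly like this (pthread mutex and wrapper names purely
illustrative):

#include <pthread.h>
#include <l4/re/c/rm.h>

/* One process-wide lock serialising all region-map calls. */
static pthread_mutex_t rm_lock = PTHREAD_MUTEX_INITIALIZER;

static int rm_attach_locked(void **start, unsigned long size,
                            unsigned long flags, l4re_ds_t ds,
                            l4_addr_t offs, unsigned char align)
{
  pthread_mutex_lock(&rm_lock);
  int err = l4re_rm_attach(start, size, flags, ds, offs, align);
  pthread_mutex_unlock(&rm_lock);
  return err;
}

static int rm_detach_locked(void *addr)
{
  pthread_mutex_lock(&rm_lock);
  int err = l4re_rm_detach(addr);
  pthread_mutex_unlock(&rm_lock);
  return err;
}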

Cheers
Philipp

-- 
philipp.epp...@kernkonzept.com - Tel. 0351-41 883 221
http://www.kernkonzept.com

Kernkonzept GmbH.  Sitz: Dresden.  Amtsgericht Dresden, HRB 31129.
Geschäftsführer: Dr.-Ing. Michael Hohmuth
