Hello Denis,

On 03/27/2017 04:14 PM, Denis Huber wrote:
> Dear Genode community,
>
> Preliminary: We implemented a Checkpoint/Restore mechanism on the basis of
> Genode/Fiasco.OC (Thanks to the great help of you all). We store the
> state of the target component by monitoring its RPC function calls which
> go through the parent component (= our Checkpoint/Restore component).
> The capability space is indirectly checkpointed through the capability map.
> The restoring of the state of the target is done by restoring the RPC
> objects used by the target component (e.g. PD session, dataspaces,
> region maps, etc.). The capabilities of the restored objects also have to
> be restored in the capability space (kernel) and in the capability map
> (userspace).
>
> For restoring the target component Norman suggested the usage of the
> Genode::Child constructor with an invalid ROM dataspace capability which
> does not trigger the bootstrap mechanism. Thus, we have full control
> of inserting the capabilities of the restored RPC objects into the
> capability space/map.
>
> Our problem is the following: We restore the RPC objects and insert them
> into the capability map and then in the capability space. From the
> kernel point of view these capabilities are all "IPC Gates".
> Unfortunately, there was also an IRQ kernel object created by the
> bootstrap mechanism. The following table shows the kernel debugger
> output of the capability space of the freshly bootstrapped target component:
>
> 000204 :0016e* Gate   0015f* Gate   00158* Gate   00152* Gate
> 000208 :00154* Gate   0017e* Gate   0017f* Gate   00179* Gate
> 00020c :00180* Gate   00188* Gate      --            --
> 000210 :   --            --         0018a* Gate   0018c* Gate
> 000214 :0018e* Gate   00196* Gate   00145* Gate   00144* IRQ
> 000218 :00198* Gate      --            --            --
> 00021c :   --         0019c* Gate      --            --
>
> At address 000217 you can see the IRQ kernel object. What does this
> object do, how can we store/monitor it, and how can it be restored?
> Where can we find the source code which creates this object in Genode's
> bootstrap code?

The IRQ kernel object you refer to is used by the "signal_handler"
thread to block for signals of core's corresponding service. It is a
base-foc-specific internal core RPC object[1] that is used by the signal
handler[2], and the related capability is returned by the call to
'alloc_signal_source()' provided by the PD session[3].

I have to admit, I did not follow your current implementation approach
in depth. Therefore, I do not know exactly how to handle this specific
signal handler thread and its semaphore-like IRQ object, but maybe the
references already help you further.

Regards
Stefan

[1] repos/base-foc/src/core/signal_source_component.cc
[2] repos/base-foc/src/lib/base/signal_source_client.cc
[3] repos/base/src/core/include/pd_session_component.h

>
>
> Best regards,
> Denis
>
> On 11.12.2016 13:01, Denis Huber wrote:
>> Hello Norman,
>>
>>> What you observe here is the ELF loading of the child's binary. As part
>>> of the 'Child' object, the so-called '_process' member is constructed.
>>> You can find the corresponding code at
>>> 'base/src/lib/base/child_process.cc'. The code parses the ELF executable
>>> and loads the program segments, specifically the read-only text segment
>>> and the read-writable data/bss segment. For the latter, a RAM dataspace
>>> is allocated and filled with the content of the ELF binary's data. In
>>> your case, when resuming, this procedure is wrong. After all, you want
>>> to supply the checkpointed data to the new child, not the initial data
>>> provided by the ELF binary.
>>>
>>> Fortunately, I encountered the same problem when implementing fork for
>>> noux. I solved it by letting the 'Child_process' constructor accept an
>>> invalid dataspace capability as ELF argument. This has two effects:
>>> First, the ELF loading is skipped (obviously - there is no ELF to load).
>>> And second the creation of the initial thread is skipped as well.
>>>
>>> In short, by supplying an invalid dataspace capability as binary for the
>>> new child, you avoid all those unwanted operations. The new child will
>>> not start at 'Component::construct'. You will have to manually create
>>> and start the threads of the new child via the PD and CPU session
>>> interfaces.
>>
>> Thank you for the hint. I will try out your approach.
>>
>>> The approach looks good. I presume that you encounter base-foc-specific
>>> peculiarities of the thread-creation procedure. I would try to follow
>>> the code in 'base-foc/src/core/platform_thread.cc' to see what the
>>> interaction of core with the kernel looks like. The order of operations
>>> might be important.
>>>
>>> One remaining problem may be that - even though you may be able to
>>> restore most parts of the thread state - the kernel-internal state cannot
>>> be captured. E.g., think of a thread that was blocking in the kernel via
>>> 'l4_ipc_reply_and_wait' when checkpointed. When resumed, the new thread
>>> can naturally not be in this blocking state because the kernel's state
>>> is not part of the checkpointed state. The new thread would possibly
>>> start its execution at the instruction pointer of the syscall and issue
>>> the system call again, but I am not sure what really happens in practice.
>>
>> Is there a way to avoid this situation? Can I postpone the checkpoint by
>> letting the entrypoint thread finish the intercepted RPC function call,
>> then increment the ip of the child's thread to the next instruction?
>>
>>> I think that you don't need the LOG-session quirk if you follow my
>>> suggestion to skip the ELF loading for the restored component
>>> altogether. Could you give it a try?
>>
>> You are right, the LOG-session quirk seems a bit clumsy. I like your
>> idea of skipping the ELF loading and automated creation of CPU threads
>> more, because it gives me the control to create and start the threads
>> from the stored ip and sp.
>>
>>
>> Best regards,
>> Denis

--
Stefan Kalkowski
Genode Labs

https://github.com/skalk · http://genode.org/

_______________________________________________
genode-main mailing list
genode-main@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/genode-main