On Wed, Nov 11, 2015 at 17:52:22 +0100, Jakub Jelinek wrote: > On Mon, Oct 19, 2015 at 10:47:54PM +0300, Ilya Verbin wrote: > > So, here is what I have for now. Attached target-29.c testcase works fine > > with > > MIC emul, however I don't know how to (and where) properly check for > > completion > > of async execution on target. And, similarly, where to do unmapping after > > that? > > Do we need a callback from plugin to libgomp (as far as I understood, PTX > > runtime supports this, but HSA doesn't), or libgomp will just check for > > ttask->is_completed in task.c? > > Here is the patch updated to have a task.c defined function that the plugin > can call upon completion of async offloading exection.
Thanks. > The testsuite coverage will need to improve, the testcase is wrong > (contains data races - if you want to test parallel running of two target > regions that both touch the same var, I'd say best would be to use > #pragma omp atomic and or in 4 in one case and 1 in another case, then > test if result is 5 (and similarly for the other var). > Also, with the usleeps Alex Monakov will be unhappy because PTX newlib does > not have it, but we'll need to find some solution for that. > > Another thing to work on beyond testsuite coverage (it is desirable to test > nowait target tasks (both depend and without depend) being awaited in all > the various waiting spots, i.e. end of parallel, barrier, taskwait, end of > taskgroup, or if (0) task with depend clause waiting on that. > > Also, I wonder what to do if #pragma omp target nowait is used outside of > (host) parallel - when team is NULL. All the tasking code in that case just > executes tasks undeferred, which is fine for all but target nowait - there > it is I'd say useful to be able to run a single host thread concurrently > with some async offloading tasks. So, I wonder if in that case, > if we encounter target nowait with team == NULL, should not just create a > dummy non-active (nthreads == 1) team, as if there was #pragma omp parallel > if (0) starting above it and ending at program's end. In OpenMP, the > program's initial thread is implicitly surrounded by inactive parallel, so > this isn't anything against the OpenMP execution model. But we'd need to > free the team somewhere in a destructor. > > Can you please try to cleanup the liboffloadmic side of this, so that > a callback instead of hardcoded __gomp_offload_intelmic_async_completed call > is used? Do you mean something like the patch bellow? I'll discuss it with liboffloadmic maintainers. > Can you make sure it works on XeonPhi non-emulated too? I'm trying to do it, but it will take some time... Unfortunately, target-32.c fails for me using emulation mode: Program received signal SIGSEGV, Segmentation fault. #0 0x00007ff4ab1265ed in priority_list_remove (list=0x0, node=0x7ff49001afa0, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:422 #1 0x00007ff4ab1266d9 in priority_tree_remove (type=PQ_CHILDREN, head=0x1883138, node=0x7ff49001afa0) at libgomp/priority_queue.c:195 #2 0x00007ff4ab10fa06 in priority_queue_remove (type=PQ_CHILDREN, head=0x1883138, task=0x7ff49001af30, model=MEMMODEL_RELAXED) at libgomp/priority_queue.h:468 #3 0x00007ff4ab11570d in gomp_task_maybe_wait_for_dependencies (depend=0x7ff49b0d9de0) at libgomp/task.c:1539 #4 0x00007ff4ab11fd46 in GOMP_target_enter_exit_data (device=-1, mapnum=3, hostaddrs=0x7ff49b0d9dc0, sizes=0x6020b0 <.omp_data_sizes.38>, kinds=0x6020a0 <.omp_data_kinds.39>, flags=2, depend=0x7ff49b0d9de0) at libgomp/target.c:1662 #5 0x00000000004011f9 in main._omp_fn () #6 0x00007ff4ab1160f3 in gomp_thread_start (xdata=0x7fffe93766a0) at libgomp/team.c:119 #7 0x0000003b07e07ee5 in start_thread () from /lib64/libpthread.so.0 #8 0x0000003b076f4b8d in clone () from /lib64/libc.so.6 However when I manually run commands from testsuite/libgomp.log under the same environment, it passes. Don't know where is the difference. Also I tried to replace 'b = 4;' and 'b = 5;' with infinite loops, but got only 100% CPU usage in offload_target_main instead of 200%, so it seems that only one target task is running concurrently. diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp index 6da09b1..772e198 100644 --- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp +++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp @@ -220,6 +220,10 @@ static void register_main_image () { __offload_register_image (&main_target_image); + + /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when + asynchronous task on target is completed. */ + __offload_register_task_callback (GOMP_PLUGIN_target_task_completion); } /* liboffloadmic loads and runs offload_target_main on all available devices @@ -537,13 +541,3 @@ GOMP_OFFLOAD_run (int device, void *tgt_fn, void *tgt_vars) GOMP_OFFLOAD_async_run (device, tgt_fn, tgt_vars, NULL); } - -/* Called by liboffloadmic when asynchronous function is completed. */ - -extern "C" void -__gomp_offload_intelmic_async_completed (const void *async_data) -{ - TRACE ("(async_data = %p)", async_data); - - GOMP_PLUGIN_target_task_completion ((void *) async_data); -} diff --git a/liboffloadmic/runtime/offload_host.cpp b/liboffloadmic/runtime/offload_host.cpp index a150410..e52019d 100644 --- a/liboffloadmic/runtime/offload_host.cpp +++ b/liboffloadmic/runtime/offload_host.cpp @@ -64,8 +64,7 @@ static void __offload_fini_library(void); #define GET_OFFLOAD_NUMBER(timer_data) \ timer_data? timer_data->offload_number : 0 -extern "C" void -__gomp_offload_intelmic_async_completed (const void *); +static void (*task_completion_callback)(void *); extern "C" { #ifdef TARGET_WINNT @@ -2510,7 +2509,7 @@ extern "C" { const void *info ) { - __gomp_offload_intelmic_async_completed (info); + task_completion_callback ((void *) info); } } @@ -5672,6 +5671,11 @@ extern "C" void __offload_unregister_image(const void *target_image) } } +extern "C" void __offload_register_task_callback(void (*cb)(void *)) +{ + task_completion_callback = cb; +} + // Runtime trace interface for user programs void __offload_console_trace(int level) diff --git a/liboffloadmic/runtime/offload_host.h b/liboffloadmic/runtime/offload_host.h index afd5c99..2a43fd6 100644 --- a/liboffloadmic/runtime/offload_host.h +++ b/liboffloadmic/runtime/offload_host.h @@ -376,6 +376,9 @@ extern "C" bool __offload_target_image_is_executable(const void *target_image); extern "C" bool __offload_register_image(const void* image); extern "C" void __offload_unregister_image(const void* image); +// Registers asynchronous task completion callback +extern "C" void __offload_register_task_callback(void (*cb)(void *)); + // Initializes offload runtime library. DLL_LOCAL extern int __offload_init_library(void); -- Ilya