Hi On Sun, Jul 13, 2014 at 1:30 PM, Tom Gundersen <t...@jklm.no> wrote: > Couple of random nitpicks below. > > On Sun, Jul 13, 2014 at 12:37 PM, David Herrmann <dh.herrm...@gmail.com> > wrote: >> The "Barrier" object is a simple inter-process barrier implementation. It >> allows placing synchronization points and waiting for the other side to >> reach it. Additionally, it has an abortion-mechanism as second-layer >> synchronization to send abortion-events asynchronously to the other side. >> >> The API is usually used to synchronize processes during fork(). However, >> it can be extended to pass state through execve() so you could synchronize >> beyond execve(). >> >> Usually, it's used like this (error-handling replaced by assert() for >> simplicity): >> >> Barrier b; >> >> r = barrier_init(&b); >> assert_se(r >= 0); >> >> pid = fork(); >> assert_se(pid >= 0); >> if (pid == 0) { >> barrier_set_role(&b, BARRIER_CHILD); >> >> ...do child post-setup... >> if (CHILD_SETUP_FAILED) >> exit(1); >> ...child setup done... >> >> barrier_place(&b); >> if (!barrier_sync(&b)) { >> /* parent setup failed */ >> exit(1); >> } >> >> barrier_destroy(&b); /* redundant as execve() and exit() imply >> this */ >> >> /* parent & child setup successful */ >> execve(...); >> } >> >> barrier_set_role(&b, BARRIER_PARENT); >> >> ...do parent post-setup... >> if (PARENT_SETUP_FAILED) { >> barrier_abort(&b); /* send abortion event */ >> barrier_wait_abortion(&b); /* wait for child to abort (exit() >> implies abortion) */ >> barrier_destroy(&b); >> ...bail out... >> } >> ...parent setup done... >> >> barrier_place(&b); >> if (!barrier_sync(&b)) { >> ...child setup failed... ; >> barrier_destroy(&b); >> ...bail out... >> } >> >> barrier_destroy(&b); >> >> ...child setup successfull... >> >> This is the most basic API. Using barrier_place() to place barriers and >> barrier_sync() to perform a full synchronization between both processes. >> barrier_abort() places an abortion barrier which superceeds any other >> barriers, exit() (or barrier_destroy()) places an abortion-barrier that >> queues behind existing barriers (thus *not* replacing existing barriers >> unlike barrier_abort()). >> >> This example uses hard-synchronization with wait_abortion(), sync() and >> friends. These are all optional. Barriers are highly dynamic and can be >> used for one-way synchronization or even no synchronization at all >> (postponing it for later). The sync() call performs a full two-way >> synchronization. >> >> The API is documented and should be fairly self-explanatory. A test-suite >> shows some special semantics regarding abortion, wait_next() and exit(). >> >> Internally, barriers use two eventfds and a pipe. The pipe is used to >> detect exit()s of the remote side as eventfds do not allow that. The >> eventfds are used to place barriers, one for each side. Barriers itself >> are numbered, but the numbers are reused once both sides reached the same >> barrier, thus you cannot address barriers by the index. Moreover, the >> numbering is implicit and we only store a counter. This makes the >> implementation itself very lightweight, which is probably negligible >> considering that we need 3 FDs for a barrier.. >> >> Last but not least: This barrier implementation is quite heavy. It's >> definitely not meant for fast IPC synchronization. However, it's very easy >> to use. And given the *HUGE* overhead of fork(), the barrier-overhead >> should be negligible. >> --- >> .gitignore | 1 + >> Makefile.am | 9 + >> src/shared/barrier.c | 442 ++++++++++++++++++++++++++++++++++++++++++++++ >> src/shared/barrier.h | 97 ++++++++++ >> src/test/test-barrier.c | 460 >> ++++++++++++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 1009 insertions(+) >> create mode 100644 src/shared/barrier.c >> create mode 100644 src/shared/barrier.h >> create mode 100644 src/test/test-barrier.c >> >> diff --git a/.gitignore b/.gitignore >> index 31cd8f8..5289f0e 100644 >> --- a/.gitignore >> +++ b/.gitignore >> @@ -123,6 +123,7 @@ >> /tags >> /test-architecture >> /test-async >> +/test-barrier >> /test-boot-timestamp >> /test-bus-chat >> /test-bus-cleanup >> diff --git a/Makefile.am b/Makefile.am >> index 2b1484f..039a83e 100644 >> --- a/Makefile.am >> +++ b/Makefile.am >> @@ -828,6 +828,8 @@ libsystemd_shared_la_SOURCES = \ >> src/shared/login-shared.h \ >> src/shared/ring.c \ >> src/shared/ring.h \ >> + src/shared/barrier.c \ >> + src/shared/barrier.h \ >> src/shared/async.c \ >> src/shared/async.h \ >> src/shared/eventfd-util.c \ >> @@ -1238,6 +1240,7 @@ tests += \ >> test-ellipsize \ >> test-util \ >> test-ring \ >> + test-barrier \ >> test-tmpfiles \ >> test-namespace \ >> test-date \ >> @@ -1408,6 +1411,12 @@ test_ring_SOURCES = \ >> test_ring_LDADD = \ >> libsystemd-core.la >> >> +test_barrier_SOURCES = \ >> + src/test/test-barrier.c >> + >> +test_barrier_LDADD = \ >> + libsystemd-core.la >> + >> test_tmpfiles_SOURCES = \ >> src/test/test-tmpfiles.c >> >> diff --git a/src/shared/barrier.c b/src/shared/barrier.c >> new file mode 100644 >> index 0000000..e7b4ead >> --- /dev/null >> +++ b/src/shared/barrier.c >> @@ -0,0 +1,442 @@ >> +/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/ >> + >> +/*** >> + This file is part of systemd. >> + >> + Copyright 2014 David Herrmann <dh.herrm...@gmail.com> >> + >> + systemd is free software; you can redistribute it and/or modify it >> + under the terms of the GNU Lesser General Public License as published by >> + the Free Software Foundation; either version 2.1 of the License, or >> + (at your option) any later version. >> + >> + systemd is distributed in the hope that it will be useful, but >> + WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public License >> + along with systemd; If not, see <http://www.gnu.org/licenses/>. >> +***/ >> + >> +#include <errno.h> >> +#include <fcntl.h> >> +#include <limits.h> >> +#include <poll.h> >> +#include <stdbool.h> >> +#include <stdint.h> >> +#include <stdio.h> >> +#include <stdlib.h> >> +#include <string.h> >> +#include <sys/eventfd.h> >> +#include <sys/types.h> >> +#include <unistd.h> >> + >> +#include "barrier.h" >> +#include "macro.h" >> +#include "util.h" >> + >> +/** >> + * Barriers >> + * This barrier implementation provides a simple synchronization method >> based >> + * on file-descriptors that can safely be used between threads and >> processes. A >> + * barrier object contains 2 shared counters based on eventfd. Both >> processes >> + * can now place barriers and wait for the other end to reach a random or >> + * specific barrier. >> + * Barriers are numbered, so you can either wait for the other end to reach >> any >> + * barrier or the last barrier that you placed. This way, you can use >> barriers >> + * for one-way *and* full synchronization. Note that even-though barriers >> are >> + * numbered, these numbers are internal and recycled once both sides >> reached the >> + * same barrier (implemented as a simple signed counter). It is thus not >> + * possible to address barriers by their ID. >> + * >> + * Barrier-API: Both ends can place as many barriers via barrier_place() as >> + * they want and each pair of barriers on both sides will be implicitly >> linked. >> + * Each side can use the barrier_wait/sync_*() family of calls to wait for >> the >> + * other side to place a specific barrier. barrier_wait_next() waits until >> the >> + * other side calls barrier_place(). No links between the barriers are >> + * considered and this simply serves as most basic asynchronous barrier. >> + * barrier_sync_next() is like barrier_wait_next() and waits for the other >> side >> + * to place their next barrier via barrier_place(). However, it only waits >> for >> + * barriers that are linked to a barrier we already placed. If the other >> side >> + * already placed more barriers than we did, barrier_sync_next() returns >> + * immediately. >> + * barrier_sync() extends barrier_sync_next() and waits until the other end >> + * placed as many barriers via barrier_place() as we did. If they already >> placed >> + * as many as we did (or more), it returns immediately. >> + * >> + * Additionally to basic barriers, an abortion event is available. >> + * barrier_abort() places an abortion event that cannot be undone. An >> abortion >> + * immediately cancels all placed barriers and replaces them. Any running >> and >> + * following wait/sync call besides barrier_wait_abortion() will immediately >> + * return false on both sides (otherwise, they always return true). >> + * barrier_abort() can be called multiple times on both ends and will be a >> + * no-op if already called on this side. >> + * barrier_wait_abortion() can be used to wait for the other side to call >> + * barrier_abort() and is the only wait/sync call that does not return >> + * immediately if we aborted outself. It only returns once the other side >> + * called barrier_abort(). >> + * >> + * Barriers can be used for in-process and inter-process synchronization. >> + * However, for in-process synchronization you could just use mutexes. >> + * Therefore, main target is IPC and we require both sides to *not* share >> the FD >> + * table. If that's given, barriers provide target tracking: If the remote >> side >> + * exit()s, an abortion event is implicitly queued on the other side. This >> way, >> + * a sync/wait call will be woken up if the remote side crashed or exited >> + * unexpectedly. However, note that these abortion events are only queued >> if the >> + * barrier-queue has been drained. Therefore, it is safe to place a barrier >> and >> + * exit. The other side can safely wait on the barrier even though the exit >> + * queued an abortion event. Usually, the abortion event would overwrite the >> + * barrier, however, that's not true for exit-abortion events. Those are >> only >> + * queued if the barrier-queue is drained (thus, the receiving side has >> placed >> + * more barriers than the remote side). >> + */ >> + >> +/** >> + * barrier_init() - Initialize a barrier object >> + * @obj: barrier to initialize >> + * >> + * This initializes a barrier object. The caller is responsible of >> allocating >> + * the memory and keeping it valid. The memory does not have to be zeroed >> + * beforehand. >> + * Two eventfd objects are allocated for each barrier. If allocation fails, >> an >> + * error is returned. >> + * >> + * If this function fails, the barrier is reset to an invalid state so it is >> + * safe to call barrier_destroy() on the object regardless whether the >> + * initialization succeeded or not. >> + * >> + * The caller is responsible to destroy the object via barrier_destroy() >> before >> + * releasing the underlying memory. >> + * >> + * Returns: 0 on success, negative error code on failure. >> + */ >> +int barrier_init(Barrier *obj) { >> + _barrier_destroy_ptr_ Barrier *b = NULL; >> + int r; >> + >> + assert_return(obj, -EINVAL); >> + >> + b = obj; >> + zero(*b); >> + >> + b->me = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK); >> + if (b->me < 0) >> + return -errno; >> + >> + b->them = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK); >> + if (b->them < 0) >> + return -errno; >> + >> + r = pipe2(b->pipe, O_CLOEXEC | O_NONBLOCK); >> + if (r < 0) >> + return -errno; >> + >> + b = NULL; >> + return 0; > > Usually we don't mangle obj unless the call succeeds. You could just > assign the eventfds and pipe to local variables and then > zero+initialize obj once you know it all worked.
This was part of the API. It guarantees that you can run barrier_destroy() on the object regardless whether barrier_init() was successful or not. A simple memzero() doesn't work with barriers as they have embedded FDs. And given that barriers are not dynamically allocated (but embedded), I thought this was a handy feature. Anyhow, I changed barrier_destroy() to do nothing if all is 0 (b->me can never be the same as b->them, so it's invalid state). Therefore, a memzero() now works just fine and there's no need to keep that behavior. I can change this to: Barrier b = { }; ...init... memcpy(obj, &b, sizeof(*obj)); return 0; >> +} >> + >> +/** >> + * barrier_destroy() - Destroy a barrier object >> + * @b: barrier to destroy or NULL >> + * >> + * This destroys a barrier object that has previously been initialized via >> + * barrier_destroy(). The object is released and reset to invalid state. > > barrier_init()? Whoops, nice catch. >> + * Therefore, it is safe to call barrier_destroy() multiple times or even if >> + * barrier_init() failed. However, you must not call barrier_destroy() if >> you >> + * never called barrier_init() on the object before. >> + * >> + * It is safe to initialize a barrier via zero() / memset(.., 0, ...). Even >> + * though it has embedded FDs, barrier_destroy() can deal with zeroed >> objects >> + * just fine. >> + * >> + * If @b is NULL, this is a no-op. >> + */ >> +void barrier_destroy(Barrier *b) { >> + if (!b) >> + return; >> + >> + /* @me and @them cannot be both FD 0. Lets be pedantic and check the >> + * pipes and barriers, too. If all are 0, the object was zero()ed >> and >> + * is invalid. This allows users to use zero(barrier) to reset the >> + * backing memory. */ >> + if (b->me == 0 && >> + b->them == 0 && >> + b->pipe[0] == 0 && >> + b->pipe[1] == 0 && >> + b->barriers == 0) >> + return; >> + >> + b->me = safe_close(b->me); >> + b->them = safe_close(b->them); >> + b->pipe[0] = safe_close(b->pipe[0]); >> + b->pipe[1] = safe_close(b->pipe[1]); >> + b->barriers = 0; >> +} >> + >> +/** >> + * barrier_set_role() - Set the local role of the barrier >> + * @b: barrier to operate on >> + * @role: role to set on the barrier >> + * >> + * This sets the roles on a barrier object. This is needed to know which >> + * side of the barrier you're on. Usually, the parent creates the barrier >> via >> + * barrier_init() and then calls fork() or clone(). Therefore, the FDs are >> + * duplicated and the child retains the same barrier object. >> + * >> + * Both sides need to call barrier_set_role() after fork() or clone() are >> done. >> + * If this is not done, barriers will not work correctly. >> + * >> + * Note that barriers could be supported without fork() or clone(). However, >> + * this is currently not needed so it hasn't been implemented. >> + */ >> +void barrier_set_role(Barrier *b, unsigned int role) { >> + int fd; >> + >> + assert(b); >> + assert(role == BARRIER_PARENT || role == BARRIER_CHILD); >> + /* make sure this is only called once */ >> + assert(b->pipe[1] >= 0 && b->pipe[1] >= 0); >> + >> + if (role == BARRIER_PARENT) { >> + b->pipe[1] = safe_close(b->pipe[1]); >> + } else { >> + b->pipe[0] = safe_close(b->pipe[0]); >> + >> + /* swap me/them for childs */ > > children Gnah, I will never get that right. Fixed. >> + fd = b->me; >> + b->me = b->them; >> + b->them = fd; >> + } >> +} >> + >> +/* places barrier; returns false if we aborted, otherwise true */ >> +static bool barrier_write(Barrier *b, uint64_t buf) { >> + ssize_t len; >> + >> + /* prevent new sync-points if we already aborted */ >> + if (barrier_i_aborted(b)) >> + return false; >> + >> + do { >> + len = write(b->me, &buf, sizeof(buf)); >> + } while (len < 0 && (errno == EAGAIN || errno == EINTR)); >> + >> + if (len != sizeof(buf)) >> + goto error; >> + >> + /* lock if we aborted */ >> + if (buf >= (uint64_t)BARRIER_ABORTION) { >> + if (barrier_they_aborted(b)) >> + b->barriers = BARRIER_WE_ABORTED; >> + else >> + b->barriers = BARRIER_I_ABORTED; >> + } else if (!barrier_is_aborted(b)) { >> + b->barriers += buf; >> + } >> + >> + return !barrier_i_aborted(b); >> + >> +error: >> + /* If there is an unexpected error, we have to make this fatal. >> There >> + * is no way we can recover from sync-errors. Therefore, we close >> the >> + * pipe-ends and treat this as abortion. The other end will notice >> the >> + * pipe-close and treat it as abortion, too. */ >> + >> + b->pipe[0] = safe_close(b->pipe[0]); >> + b->pipe[1] = safe_close(b->pipe[1]); >> + b->barriers = BARRIER_WE_ABORTED; >> + return false; >> +} >> + >> +/* waits for barriers; returns false if they aborted, otherwise true */ >> +static bool barrier_read(Barrier *b, int64_t comp) { >> + uint64_t buf; >> + ssize_t len; >> + struct pollfd pfd[2] = { }; >> + int r; >> + >> + if (barrier_they_aborted(b)) >> + return false; >> + >> + while (b->barriers > comp) { >> + pfd[0].fd = (b->pipe[0] >= 0) ? b->pipe[0] : b->pipe[1]; >> + pfd[0].events = POLLHUP; >> + pfd[0].revents = 0; >> + pfd[1].fd = b->them; >> + pfd[1].events = POLLIN; >> + pfd[1].revents = 0; >> + >> + r = poll(pfd, 2, -1); >> + if (r < 0 && (errno == EAGAIN || errno == EINTR)) >> + continue; >> + else if (r < 0) >> + goto error; >> + >> + if (pfd[1].revents) { >> + /* events on @them signal us new data */ >> + len = read(b->them, &buf, sizeof(buf)); >> + if (len < 0 && (errno == EAGAIN || errno == EINTR)) >> + continue; >> + >> + if (len != sizeof(buf)) >> + goto error; >> + } else if (pfd[0].revents & (POLLHUP | POLLERR | POLLNVAL)) >> { >> + /* POLLHUP on the pipe tells us the other side >> exited. >> + * We treat this as implicit abortion. But we only >> + * handle it if there's no event on the eventfd. >> This >> + * guarantees that exit-abortions do not overwrite >> real >> + * barriers. */ >> + buf = BARRIER_ABORTION; >> + } >> + >> + /* lock if they aborted */ >> + if (buf >= (uint64_t)BARRIER_ABORTION) { >> + if (barrier_i_aborted(b)) >> + b->barriers = BARRIER_WE_ABORTED; >> + else >> + b->barriers = BARRIER_THEY_ABORTED; >> + } else if (!barrier_is_aborted(b)) { >> + b->barriers -= buf; >> + } >> + } >> + >> + return !barrier_they_aborted(b); >> + >> +error: >> + /* If there is an unexpected error, we have to make this fatal. >> There >> + * is no way we can recover from sync-errors. Therefore, we close >> the >> + * pipe-ends and treat this as abortion. The other end will notice >> the >> + * pipe-close and treat it as abortion, too. */ >> + >> + b->pipe[0] = safe_close(b->pipe[0]); >> + b->pipe[1] = safe_close(b->pipe[1]); >> + b->barriers = BARRIER_WE_ABORTED; >> + return false; >> +} >> + >> +/** >> + * barrier_place() - Place a new barrier >> + * @b: barrier object >> + * >> + * This places a new barrier on the barrier object. If either side already >> + * aborted, this is a no-op and returns "false". Otherwise, the barrier is >> + * placed and this returns "true". >> + * >> + * Returns: true if barrier was placed, false if either side aborted. >> + */ >> +bool barrier_place(Barrier *b) { >> + assert(b); >> + >> + if (barrier_is_aborted(b)) >> + return false; >> + >> + barrier_write(b, BARRIER_SINGLE); >> + return true; >> +} >> + >> +/** >> + * barrier_abort() - Abort the synchronization >> + * @b: barrier object to abort >> + * >> + * This aborts the barrier-synchronization. If barrier_abort() was already >> + * called on this side, this is a no-op. Otherwise, the barrier is put into >> the >> + * ABORT-state and will stay there. The other side is notified about the >> + * abortion. Any following attempt to place normal barriers or to wait on >> normal >> + * barriers will return immediately as "false". >> + * >> + * You can wait for the other side to call barrier_abort(), too. Use >> + * barrier_wait_abortion() for that. >> + * >> + * Returns: false if the other side already aborted, true otherwise. >> + */ >> +bool barrier_abort(Barrier *b) { >> + assert(b); >> + >> + barrier_write(b, BARRIER_ABORTION); >> + return !barrier_they_aborted(b); >> +} >> + >> +/** >> + * barrier_wait_next() - Wait for the next barrier of the other side >> + * @b: barrier to operate on >> + * >> + * This waits until the other side places its next barrier. This is >> independent >> + * of any barrier-links and just waits for any next barrier of the other >> side. >> + * >> + * If either side aborted, this returns false. >> + * >> + * Returns: false if either side aborted, true otherwise. >> + */ >> +bool barrier_wait_next(Barrier *b) { >> + assert(b); >> + >> + if (barrier_is_aborted(b)) >> + return false; >> + >> + barrier_read(b, b->barriers - 1); >> + return !barrier_is_aborted(b); >> +} >> + >> +/** >> + * barrier_wait_abortion() - Wait for the other side to abort >> + * @b: barrier to operate on >> + * >> + * This waits until the other side called barrier_abort(). This can be >> called >> + * regardless whether the local side already called barrier_abort() or not. >> + * >> + * If the other side has already aborted, this returns immediately. >> + * >> + * Returns: false if the local side aborted, true otherwise. >> + */ >> +bool barrier_wait_abortion(Barrier *b) { >> + assert(b); >> + >> + barrier_read(b, BARRIER_THEY_ABORTED); >> + return !barrier_i_aborted(b); >> +} >> + >> +/** >> + * barrier_sync_next() - Wait for the other side to place a next linked >> barrier >> + * @b: barrier to operate on >> + * >> + * This is like barrier_wait_next() and waits for the other side to call >> + * barrier_place(). However, this only waits for linked barriers. That >> means, if >> + * the other side already placed more barriers than (or as much as) we did, >> this >> + * returns immediately instead of waiting. >> + * >> + * If either side aborted, this returns false. >> + * >> + * Returns: false if either side aborted, true otherwise. >> + */ >> +bool barrier_sync_next(Barrier *b) { >> + assert(b); >> + >> + if (barrier_is_aborted(b)) >> + return false; >> + >> + barrier_read(b, MAX((int64_t)0, b->barriers - 1)); >> + return !barrier_is_aborted(b); >> +} >> + >> +/** >> + * barrier_sync() - Wait for the other side to place as many barriers as we >> did >> + * @b: barrier to operate on >> + * >> + * This is like barrier_sync_next() but waits for the other side to call >> + * barrier_place() as often as we did (in total). If they already placed as >> much >> + * as we did (or more), this returns immediately instead of waiting. >> + * >> + * If either side aborted, this returns false. >> + * >> + * Returns: false if either side aborted, true otherwise. >> + */ >> +bool barrier_sync(Barrier *b) { >> + assert(b); >> + >> + if (barrier_is_aborted(b)) >> + return false; >> + >> + barrier_read(b, 0); >> + return !barrier_is_aborted(b); >> +} >> diff --git a/src/shared/barrier.h b/src/shared/barrier.h >> new file mode 100644 >> index 0000000..2d98e0d >> --- /dev/null >> +++ b/src/shared/barrier.h >> @@ -0,0 +1,97 @@ >> +/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/ >> + >> +#pragma once >> + >> +/*** >> + This file is part of systemd. >> + >> + Copyright 2014 David Herrmann <dh.herrm...@gmail.com> >> + >> + systemd is free software; you can redistribute it and/or modify it >> + under the terms of the GNU Lesser General Public License as published by >> + the Free Software Foundation; either version 2.1 of the License, or >> + (at your option) any later version. >> + >> + systemd is distributed in the hope that it will be useful, but >> + WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public License >> + along with systemd; If not, see <http://www.gnu.org/licenses/>. >> +***/ >> + >> +#include <errno.h> >> +#include <inttypes.h> >> +#include <stdlib.h> >> +#include <string.h> >> +#include <sys/types.h> >> + >> +#include "macro.h" >> +#include "util.h" >> + >> +/* See source file for an API description. */ >> + >> +typedef struct Barrier Barrier; >> + >> +enum { >> + BARRIER_SINGLE = 1LL, >> + BARRIER_ABORTION = INT64_MAX, >> + >> + /* bias values to store state; keep @WE < @THEY < @I */ >> + BARRIER_BIAS = INT64_MIN, >> + BARRIER_WE_ABORTED = BARRIER_BIAS + 1LL, >> + BARRIER_THEY_ABORTED = BARRIER_BIAS + 2LL, >> + BARRIER_I_ABORTED = BARRIER_BIAS + 3LL, >> +}; >> + >> +enum { >> + BARRIER_PARENT, >> + BARRIER_CHILD, >> +}; >> + >> +struct Barrier { >> + int me; >> + int them; > > Maybe use us/them (and not we/they) consistently everywhere? Hm. No idea how that should work. The member-names are dative case, the enums are nominative case. I doesn't make much sense to me to use the same for both. I use 1st person singular + 3rd person plural (me+them and I+they). The only exception is BARRIER_WE_ABORTED, but that is a combination of BARRIER_THEY_ABORTED and BARRIER_I_ABORTED (therefore, 'we'). I don't know how I could make it more consistent? Did you just mix up the BARRIER_WE_ABORTED constant? I could rename it to BARRIER_EVERYONE_ABORTED to avoid any confusion. >> + int pipe[2]; >> + int64_t barriers; >> +}; >> + >> +int barrier_init(Barrier *obj); >> +void barrier_destroy(Barrier *b); >> + >> +#define _barrier_destroy_ _cleanup_(barrier_destroy_on_stack) >> +static inline void barrier_destroy_on_stack(Barrier *b) { if (b) >> barrier_destroy(b); } >> +#define _barrier_destroy_ptr_ _cleanup_(barrier_destroyp) >> +DEFINE_TRIVIAL_CLEANUP_FUNC(Barrier*, barrier_destroy); >> + >> +void barrier_set_role(Barrier *b, unsigned int role); Regarding your idea to drop that function: I can add this to all the other functions: if (b->pid > 0) { barrier_set_role(b, b->pid != getpid()); b->pid = 0; } ..and then initialize b->pid to getpid(). This would cause any barrier_*() function to implicitly call barrier_set_role() and we could make the function an implementation detail. I'm still not sure I like this. Yeah, it does remove the explicit function-call, but at the same time it adds internal magic that is not really obvious from the outside. Imagine you have this code: int foobar(Barrier *b) { pid_t pid; foobar_init_A(b); pid = fork(); if (pid < 0) return -errno; if (pid > 0) { barrier_set_role(b, BARRIER_CHILD); foobar_init_B(b); if (!barrier_place_and_sync(b)) _exit(1); execve(...); } barrier_set_role(b, BARRIER_PARENT); foobar_init_C(b); if (!barrier_place_and_sync(b)) { waitpid(pid, NULL, 0); return -EIO; } return 0; } Now I'm not saying this is a beauty. But one nice thing about the barrier API is, any of the foobar_init_*() functions can call barrier_abort() on failure and just return. There's no need to forward the error-code (even though you could do that here). But even if the caller continues, the barrier_place_and_sync() call will always fail later on. Now if we add the magic getpid() handling, then we would have to make sure foobar_init_A() never uses the barrier. Easy to fix in this code, but might get ugly if the order of these foobar_init_*() calls is dynamic and each of them can be called pre-fork, as-child or as-parent. Yes, there might not be a reason to use the barrier API before forking, but there's also no reason to call free() on a NULL pointer. It's just a convenience API. And it's a safety net we can rely on. Last but not least: If we hard-code getpid(), we are incompatible to clone(CLONE_THREAD). If we hard-code gettid(), we might break clone(CLONE_THREAD | CLONE_FILES) setups where two threads access the barrier to sync against a fork()ed child. Ok, so far we only have real fork() use-cases, but I hate limiting the API. Long story short: I'm still not sure I like this shortcut.. But maybe that's just me being pedantic.. Thanks David >> + >> +bool barrier_place(Barrier *b); >> +bool barrier_abort(Barrier *b); >> + >> +bool barrier_wait_next(Barrier *b); >> +bool barrier_wait_abortion(Barrier *b); >> +bool barrier_sync_next(Barrier *b); >> +bool barrier_sync(Barrier *b); >> + >> +static inline bool barrier_i_aborted(Barrier *b) { >> + return b->barriers == BARRIER_I_ABORTED || b->barriers == >> BARRIER_WE_ABORTED; >> +} >> + >> +static inline bool barrier_they_aborted(Barrier *b) { >> + return b->barriers == BARRIER_THEY_ABORTED || b->barriers == >> BARRIER_WE_ABORTED; >> +} >> + >> +static inline bool barrier_we_aborted(Barrier *b) { >> + return b->barriers == BARRIER_WE_ABORTED; >> +} >> + >> +static inline bool barrier_is_aborted(Barrier *b) { >> + return b->barriers == BARRIER_I_ABORTED || b->barriers == >> BARRIER_THEY_ABORTED || b->barriers == BARRIER_WE_ABORTED; >> +} >> + >> +static inline bool barrier_place_and_sync(Barrier *b) { >> + barrier_place(b); >> + return barrier_sync(b); >> +} >> diff --git a/src/test/test-barrier.c b/src/test/test-barrier.c >> new file mode 100644 >> index 0000000..640e508 >> --- /dev/null >> +++ b/src/test/test-barrier.c >> @@ -0,0 +1,460 @@ >> +/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/ >> + >> +/*** >> + This file is part of systemd. >> + >> + Copyright 2014 David Herrmann <dh.herrm...@gmail.com> >> + >> + systemd is free software; you can redistribute it and/or modify it >> + under the terms of the GNU Lesser General Public License as published by >> + the Free Software Foundation; either version 2.1 of the License, or >> + (at your option) any later version. >> + >> + systemd is distributed in the hope that it will be useful, but >> + WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public License >> + along with systemd; If not, see <http://www.gnu.org/licenses/>. >> +***/ >> + >> +/* >> + * IPC barrier tests >> + * These tests verify the correct behavior of the IPC Barrier >> implementation. >> + * Note that the tests use alarm-timers to verify dead-locks and timeouts. >> These >> + * might not work on slow machines where 20ms are too short to perform >> specific >> + * operations (though, very unlikely). In case that turns out true, we have >> to >> + * increase it at the slightly cost of lengthen test-duration on other >> machines. >> + */ >> + >> +#include <errno.h> >> +#include <stdio.h> >> +#include <string.h> >> +#include <sys/time.h> >> +#include <sys/wait.h> >> +#include <unistd.h> >> + >> +#include "barrier.h" >> +#include "def.h" >> +#include "util.h" >> + >> +/* 20ms to test deadlocks; All timings use multiples of this constant as >> + * alarm/sleep timers. If this timeout is too small for slow machines to >> perform >> + * the requested operations, we have to increase it. On an i7 this works >> fine >> + * with 1ms base-time, so 20ms should be just fine for everyone. */ >> +#define BASE_TIME 20 >> + >> +static void malarm(unsigned long msecs) { >> + struct itimerval v = { }; >> + >> + timeval_store(&v.it_value, msecs * USEC_PER_MSEC); >> + assert_se(setitimer(ITIMER_REAL, &v, NULL) >= 0); >> +} >> + >> +static void msleep(unsigned long msecs) { >> + assert_se(msecs < MSEC_PER_SEC); >> + usleep(msecs * USEC_PER_MSEC); >> +} >> + >> +#define TEST_BARRIER(_FUNCTION, _CHILD_CODE, _WAIT_CHILD, _PARENT_CODE, >> _WAIT_PARENT) \ >> + static void _FUNCTION(void) { \ >> + Barrier b; \ >> + pid_t pid1, pid2; \ >> + \ >> + assert_se(barrier_init(&b) >= 0); \ >> + \ >> + pid1 = fork(); \ >> + assert_se(pid1 >= 0); \ >> + if (pid1 == 0) { \ >> + barrier_set_role(&b, BARRIER_CHILD); \ >> + { _CHILD_CODE; } \ >> + exit(42); \ >> + } \ >> + \ >> + pid2 = fork(); \ >> + assert_se(pid2 >= 0); \ >> + if (pid2 == 0) { \ >> + barrier_set_role(&b, BARRIER_PARENT); \ >> + { _PARENT_CODE; } \ >> + exit(42); \ >> + } \ >> + \ >> + barrier_destroy(&b); \ >> + malarm(999); \ >> + { _WAIT_CHILD; } \ >> + { _WAIT_PARENT; } \ >> + malarm(0); \ >> + } >> + >> +#define TEST_BARRIER_WAIT_SUCCESS(_pid) \ >> + ({ \ >> + int pidr, status; \ >> + pidr = waitpid(_pid, &status, 0); \ >> + assert_se(pidr == _pid); \ >> + assert_se(WIFEXITED(status)); \ >> + assert_se(WEXITSTATUS(status) == 42); \ >> + }) >> + >> +#define TEST_BARRIER_WAIT_ALARM(_pid) \ >> + ({ \ >> + int pidr, status; \ >> + pidr = waitpid(_pid, &status, 0); \ >> + assert_se(pidr == _pid); \ >> + assert_se(WIFSIGNALED(status)); \ >> + assert_se(WTERMSIG(status) == SIGALRM); \ >> + }) >> + >> +/* >> + * Test basic sync points >> + * This places a barrier in both processes and waits synchronously for them. >> + * The timeout makes sure the sync works as expected. The msleep() on one >> side >> + * makes sure the exit of the parent does not overwrite previous barriers. >> Due >> + * to the msleep(), we know that the parent already exited, thus there's a >> + * pending HUP on the pipe. However, the barrier_sync() prefers reads on the >> + * eventfd, thus we can safely wait on the barrier. >> + */ >> +TEST_BARRIER(test_barrier_sync, >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + msleep(BASE_TIME * 2); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test wait_next() >> + * This places a barrier in the parent and syncs on it. The child sleeps >> while >> + * the parent places the barrier and then waits for a barrier. The wait will >> + * succeed as the child hasn't read the parent's barrier, yet. The following >> + * barrier and sync synchronize the exit. >> + */ >> +TEST_BARRIER(test_barrier_wait_next, >> + ({ >> + msleep(100); >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(400); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test wait_next() multiple times >> + * This places two barriers in the parent and waits for the child to exit. >> The >> + * child sleeps 20ms so both barriers _should_ be in place. It then waits >> for >> + * the parent to place the next barrier twice. The first call will fetch >> both >> + * barriers and return. However, the second call will stall as the parent >> does >> + * not place a 3rd barrier (the sleep caught two barriers). wait_next() is >> does >> + * not look at barrier-links so this stall is expected. Thus this test times >> + * out. >> + */ >> +TEST_BARRIER(test_barrier_wait_next_twice, >> + ({ >> + msleep(BASE_TIME); >> + malarm(BASE_TIME); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(0); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test wait_next() with local barriers >> + * This is the same as test_barrier_wait_next_twice, but places local >> barriers >> + * between both waits. This does not have any effect on the wait so it >> times out >> + * like the other test. >> + */ >> +TEST_BARRIER(test_barrier_wait_next_twice_local, >> + ({ >> + msleep(BASE_TIME); >> + malarm(BASE_TIME); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(0); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test wait_next() with sync_next() >> + * This is again the same as test_barrier_wait_next_twice but uses a >> + * synced wait as the second wait. This works just fine because the local >> state >> + * has no barriers placed, therefore, the remote is always in sync. >> + */ >> +TEST_BARRIER(test_barrier_wait_next_twice_sync, >> + ({ >> + msleep(BASE_TIME); >> + malarm(BASE_TIME); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_sync_next(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test wait_next() with sync_next() and local barriers >> + * This is again the same as test_barrier_wait_next_twice_local but uses a >> + * synced wait as the second wait. This works just fine because the local >> state >> + * is in sync with the remote. >> + */ >> +TEST_BARRIER(test_barrier_wait_next_twice_local_sync, >> + ({ >> + msleep(BASE_TIME); >> + malarm(BASE_TIME); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync_next(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test sync_next() and sync() >> + * This tests sync_*() synchronizations and makes sure they work fine if the >> + * local state is behind the remote state. >> + */ >> +TEST_BARRIER(test_barrier_sync_next, >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(barrier_sync(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + msleep(BASE_TIME); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test sync_next() and sync() with local barriers >> + * This tests timeouts if sync_*() is used if local barriers are placed but >> the >> + * remote didn't place any. >> + */ >> +TEST_BARRIER(test_barrier_sync_next_local, >> + ({ >> + malarm(BASE_TIME); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(0); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid1), >> + ({ >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test sync_next() and sync() with local barriers and abortion >> + * This is the same as test_barrier_sync_next_local but aborts the sync in >> the >> + * parent. Therefore, the sync_next() succeeds just fine due to the >> abortion. >> + */ >> +TEST_BARRIER(test_barrier_sync_next_local_abort, >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(!barrier_sync_next(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + assert_se(barrier_abort(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test matched wait_abortion() >> + * This runs wait_abortion() with remote abortion. >> + */ >> +TEST_BARRIER(test_barrier_wait_abortion, >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_wait_abortion(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + assert_se(barrier_abort(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test unmatched wait_abortion() >> + * This runs wait_abortion() without any remote abortion going on. It thus >> must >> + * timeout. >> + */ >> +TEST_BARRIER(test_barrier_wait_abortion_unmatched, >> + ({ >> + malarm(BASE_TIME); >> + assert_se(barrier_wait_abortion(&b)); >> + assert_se(0); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid1), >> + ({ >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test matched wait_abortion() with local abortion >> + * This runs wait_abortion() with local and remote abortion. >> + */ >> +TEST_BARRIER(test_barrier_wait_abortion_local, >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_abort(&b)); >> + assert_se(!barrier_wait_abortion(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + assert_se(barrier_abort(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test unmatched wait_abortion() with local abortion >> + * This runs wait_abortion() with only local abortion. This must time out. >> + */ >> +TEST_BARRIER(test_barrier_wait_abortion_local_unmatched, >> + ({ >> + malarm(BASE_TIME); >> + assert_se(barrier_abort(&b)); >> + assert_se(!barrier_wait_abortion(&b)); >> + assert_se(0); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid1), >> + ({ >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test child exit >> + * Place barrier and sync with the child. The child only exits()s, which >> should >> + * cause an implicit abortion and wake the parent. >> + */ >> +TEST_BARRIER(test_barrier_exit, >> + ({ >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME * 10); >> + assert_se(barrier_place(&b)); >> + assert_se(!barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +/* >> + * Test child exit with sleep >> + * Same as test_barrier_exit but verifies the test really works due to the >> + * child-exit. We add a usleep() which triggers the alarm in the parent and >> + * causes the test to time out. >> + */ >> +TEST_BARRIER(test_barrier_no_exit, >> + ({ >> + msleep(BASE_TIME * 2); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + malarm(BASE_TIME); >> + assert_se(barrier_place(&b)); >> + assert_se(!barrier_sync(&b)); >> + }), >> + TEST_BARRIER_WAIT_ALARM(pid2)); >> + >> +/* >> + * Test pending exit against sync >> + * The parent places a barrier *and* exits. The 20ms wait in the child >> + * guarantees both are pending. However, our logic prefers pending barriers >> over >> + * pending exit-abortions (unlike normal abortions), thus the wait_next() >> must >> + * succeed, same for the sync_next() as our local barrier-count is smaller >> than >> + * the remote. Once we place a barrier our count is equal, so the sync still >> + * succeeds. Only if we place one more barrier, we're ahead of the remote, >> thus >> + * we will fail due to HUP on the pipe. >> + */ >> +TEST_BARRIER(test_barrier_pending_exit, >> + ({ >> + malarm(BASE_TIME * 4); >> + msleep(BASE_TIME * 2); >> + assert_se(barrier_wait_next(&b)); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(barrier_sync_next(&b)); >> + assert_se(barrier_place(&b)); >> + assert_se(!barrier_sync_next(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid1), >> + ({ >> + assert_se(barrier_place(&b)); >> + }), >> + TEST_BARRIER_WAIT_SUCCESS(pid2)); >> + >> +int main(int argc, char *argv[]) { >> + log_parse_environment(); >> + log_open(); >> + >> + test_barrier_sync(); >> + test_barrier_wait_next(); >> + test_barrier_wait_next_twice(); >> + test_barrier_wait_next_twice_sync(); >> + test_barrier_wait_next_twice_local(); >> + test_barrier_wait_next_twice_local_sync(); >> + test_barrier_sync_next(); >> + test_barrier_sync_next_local(); >> + test_barrier_sync_next_local_abort(); >> + test_barrier_wait_abortion(); >> + test_barrier_wait_abortion_unmatched(); >> + test_barrier_wait_abortion_local(); >> + test_barrier_wait_abortion_local_unmatched(); >> + test_barrier_exit(); >> + test_barrier_no_exit(); >> + test_barrier_pending_exit(); >> + >> + return 0; >> +} >> -- >> 2.0.1 >> >> _______________________________________________ >> systemd-devel mailing list >> systemd-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/systemd-devel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel