On Tuesday, January 2, 2024 3:41:55 AM MST Anonymouse via Digitalmars-d-learn wrote:
> On Monday, 1 January 2024 at 19:49:28 UTC, Jonathan M Davis wrote:
> > [...]
>
> Thank you. Yes, `Foo` is a class for the purposes of inheritance -- I left that out of the example.
>
> So a completely valid solution is to write a struct wrapper around an AA of the type I need, overload the required operators, and then just drop-in replace the current AA? All array operations would then transparently be between lock and unlock statements.
> ...
> I tried this and it seems to work. Is it glaringly incorrect somehow, or am I free to roll with this?
For a member function to work on a shared object, that member function must be marked as shared. For most types, of course, it's completely inappropriate to mark a member function as shared, since those types do nothing with thread synchronization, but for types which are specifically designed to work across threads, it's appropriate to mark the member functions as shared and then have them internally do whatever they need to do to protect access across threads. So, if you want to create a type which wraps an AA to make it thread-safe, it would be appropriate to make all of its member functions shared and then deal with the locking primitives internally.

That being said, as Christian pointed out, whether locking at the operation level is the best way to deal with thread synchronization depends on what you're doing. For instance, doing something like

```d
if (auto value = key in aa)
{
    // do stuff with *value
}
```

would be seriously problematic if you simply wrapped the AA, because you'd then have a pointer to a value in the AA which might not even be in the AA anymore by the time you try to do something with it (since another thread could have happily mutated the AA via another function call in the interim).

In addition, it's problematic that `in` returns a thread-local pointer, since that pointer references shared data. So, it's a bug for it to return a thread-local pointer. It either needs to return a shared pointer (meaning that the locking primitives need to be at a higher level), or you need to copy the data out rather than return a pointer (as well as ensuring that the copied data itself is fully thread-local, which could be problematic with reference types).

For something like this, locking the mutex for a whole set of operations makes more sense, and if you're doing that, you probably don't want a struct or class which simply wraps the AA.
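To illustrate locking the mutex for a whole set of operations: one option is to hand the caller a critical section instead of individual AA calls, so that any pointers obtained via `in` never outlive the lock. A minimal sketch, assuming hypothetical names `Locked` and `withLock`:

```d
import core.sync.mutex : Mutex;

struct Locked(AA)
{
    private Mutex mutex;
    private AA aa;

    // Must be called before the object is visible to other threads.
    void setup() shared
    {
        (cast(Locked*)&this).mutex = new Mutex;
    }

    // Runs the caller-supplied delegate while the mutex is held, so the
    // whole read-modify-write sequence is one critical section.
    void withLock(scope void delegate(ref AA) dg) shared
    {
        auto self = cast(Locked*)&this;
        self.mutex.lock_nothrow();
        scope(exit) self.mutex.unlock_nothrow();
        dg(self.aa);
    }
}
```

With this shape, the problematic pattern from above becomes safe, because the `in` pointer is only ever used inside the locked delegate:

```d
shared Locked!(string[int]) map;
map.setup();
map.withLock((ref string[int] aa) {
    aa[42] = "hello";
    if (auto value = 42 in aa)
        *value ~= ", world"; // safe: the lock is held for the whole block
});
```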
Rather, you'd want the code operating on the AA to handle the locking - which would often be inside of a struct or class that has the AA as a shared member variable. So, all of the code that uses the AA would be encapsulated, but you wouldn't have created a type that simply wraps the AA. What you'd typically do would probably be one of two approaches:

1. Create a type which handles all of the threading stuff internally (including spawning the other thread) and which provides an API to the main thread that is thread-local in the sense that the main thread doesn't have to know or care about the threading being done internally.

2. Create a type which is passed from one thread to another and designed to be used across threads in a thread-safe manner, where any operation that you can do on that type is marked as shared and designed to be thread-safe. That typically means making the operations high enough level that the caller doesn't have to worry about the synchronization primitives at all, though of course, depending on what you're doing, you might have to expose more.

Either way, the idea is to make it so that that shared object handles as much of the thread synchronization as possible, and where it doesn't, it provides the means for the code using it to apply the thread synchronization mechanisms to the data in as simple and safe a manner as possible.

You could of course have a much messier approach where nothing is really wrapped, but that makes the code much harder to manage, whereas it will usually work much better if you can encapsulate the stuff that has to deal with shared. But you do have to pick an encapsulation level which actually allows you to protect the data, which isn't likely to be the case at the level of the AA itself but rather at a higher level which is using the AA to do stuff.
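As a sketch of the first approach (all names here - `Background`, `Put`, etc. - are hypothetical): the worker thread and its AA are implementation details, the owner talks to the worker purely via std.concurrency message passing, and shared never appears in the API at all, because the AA stays thread-local to the worker:

```d
import std.concurrency : spawn, send, receive, receiveOnly, thisTid, Tid,
    OwnerTerminated;

// Message asking the worker to store a key/value pair.
struct Put { int key; string value; }

final class Background
{
    private Tid worker;

    this()
    {
        worker = spawn(&run);
    }

    void put(int key, string value)
    {
        worker.send(Put(key, value));
    }

    string get(int key)
    {
        worker.send(thisTid, key);
        return receiveOnly!string();
    }

    private static void run()
    {
        string[int] aa; // thread-local to the worker, so no locking is needed
        bool running = true;
        while (running)
        {
            receive(
                (Put msg) { aa[msg.key] = msg.value; },
                (Tid replyTo, int key) { replyTo.send(aa.get(key, string.init)); },
                (OwnerTerminated _) { running = false; }
            );
        }
    }
}
```

The price of this design is that every interaction is a message round-trip, but in exchange there are no casts and no mutexes anywhere in the code.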
Ultimately, what you do when sharing data across threads in D is essentially what you'd do in a language like C++. It's just that D's shared makes it clear to the type system what's being shared across threads and what's thread-local, so that the type system can assume that most stuff is thread-local as well as prevent you from accessing shared data accidentally (whereas in C++, you only know what's thread-local and what isn't by convention, since normally, _everything_ is shared across threads but only a small portion of it is actually used by multiple threads).

So, we're able to represent in the type system which sections of the code are designed to operate on shared data and which aren't, with the downside that we have to carefully cast away shared within the sections of code that actually deal with the appropriate locking primitives. And if that casting is done incorrectly, it can result in objects being treated as thread-local when they aren't, but since shared is part of the type system, you only have to look at the code involving shared to find the sections where you may have screwed up and escaped thread-local references to shared data.

So, the way that sharing data across threads works in D is essentially the same as in most C-derived languages, but we have some extra stuff in the type system which makes it harder to shoot yourself in the foot - at the cost that it's more involved to operate on shared data.

> You mention passing a `shared Foo*`. In the gist I pass the instance of the `MutexedAA!(string[int])` to the worker thread *by value* instead of as something `shared`, since I couldn't get operator overloading to work when `shared`. (Calling `sharedAA.opIndexAssign("hello", 42)` worked, but `sharedAA[42] = "hello"` wouldn't compile.)
>
> I guess this can break synchronisation between the two if I replace the `Mutex` in either thread. Are there any other obvious caveats?
When sharing stuff across threads, it needs to be in a place where both threads can access the exact same data. If you pass an object across by value, then the object itself is not actually shared across threads (even if it's typed as shared), and no value types within the object are actually shared across threads - just whatever pointer or reference types are contained within the object. That can get particularly problematic when you have pseudo-reference types - i.e. a type which has some stuff which lives on the stack and some stuff which lives on the heap - which is, for instance, what you get with dynamic arrays in D, since the data itself lives on the heap, but the pointer and length live on the stack.

shared pointers and references inside a shared value type would have the data that they point to shared across threads, but the pointer or reference itself would not be (the same goes for AAs, which are essentially reference types with special semantics for initialization, which is why you typically need to initialize an AA before passing it around).

So, when you're passing objects across threads where the object is supposed to be shared across threads (as opposed to when you're trying to pass ownership across), you typically want it to be either a pointer or a reference. That way, everything within the object is shared across threads.

So, I would suggest that you design a type which contains both the AA and its associated mutex, where it deals with all of the locking internally but has shared member functions so that the code using that object doesn't need to deal with the locking primitives. However, as discussed above, it will need to have operations which are high enough level that all of the locking can be done safely internally (which is not necessarily the case with individual operations on an AA).
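A sketch along those lines (the name `SafeMap` and its exact API are hypothetical): a final class, so the single instance is shared by reference, with shared member functions that lock internally and copy values in and out by value. Because the operators themselves are marked shared, `map[42] = "hello"` compiles on a shared instance - which is the operator-overloading problem described in the quoted question:

```d
import core.sync.mutex : Mutex;

final class SafeMap
{
    private Mutex mutex;
    private string[int] aa;

    this()
    {
        mutex = new Mutex;
    }

    // Marked shared, so `map[key] = value` works on a shared instance.
    void opIndexAssign(string value, int key) shared
    {
        auto self = cast(SafeMap)this; // cast away shared inside the lock
        self.mutex.lock_nothrow();
        scope(exit) self.mutex.unlock_nothrow();
        self.aa[key] = value;
    }

    // Copies the value out under the lock instead of returning a pointer
    // into the AA; string contents are immutable, so the copy is safe.
    string get(int key, string defaultValue = string.init) shared
    {
        auto self = cast(SafeMap)this;
        self.mutex.lock_nothrow();
        scope(exit) self.mutex.unlock_nothrow();
        return self.aa.get(key, defaultValue);
    }
}
```

Since it's a class, construction is just `auto map = cast(shared) new SafeMap;`, and the setup trick with inserting and removing a dummy key isn't needed - the AA can be lazily allocated under the lock, because every thread mutates it through the same class reference.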
However, since the best way to do that very much depends on what exactly you're doing, I can't really say how you need to go about it, but you do need to make sure that you don't do stuff like escape thread-local pointers to the shared data (like the opBinaryRight function you have does), and you need to make sure that you don't have issues with data being potentially out-of-date if you release the lock before doing something with that data which requires that it not be out-of-date.

> ```d
> struct MutexedAA(AA : V[K], V, K)
> {
>     import core.sync.mutex : Mutex;
>
>     shared Mutex mutex;
>
>     shared AA aa;
>
>     void setup() nothrow
>     {
>         mutex = new shared Mutex;
>
>         mutex.lock_nothrow();
>
>         if (K.init !in (cast()aa))
>         {
>             (cast()aa)[K.init] = V.init;
>             (cast()aa).remove(K.init);
>         }
>
>         mutex.unlock_nothrow();
>     }
> ```

Note that this is what a constructor would normally be for, and if you absolutely need something like this to run before using the type, then a class would potentially be a better fit, since then you can have a default constructor. And as discussed above, you probably want a type which is shared across threads via a pointer or reference anyway, so it may make more sense to just use a class in this case - though, as also discussed above, you probably don't want a type that simply wraps the AA, so what this type should look like probably needs to be rethought anyway.

> ```d
>     auto opIndexAssign(V value, K key)
>     in (mutex, typeof(this).stringof ~ " has null Mutex")
>     {
>         mutex.lock_nothrow();
>         (cast()aa)[key] = value;
>         mutex.unlock_nothrow();
>         return value;
>     }
> ```

Note that in general, whether assigning keys or values like this works properly depends on whether the data can be accessed across threads. If they're value types, it's fine, but if they're not, then no other parts of the program can have references to the objects being passed in, or you're going to have problems when accessing them from another thread.
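One way to guarantee that kind of uniqueness for a reference-typed value is to build the data locally, retain no other reference to it, and then convert it - which is exactly what Phobos' std.exception.assumeUnique documents. A small sketch (the function name is hypothetical):

```d
import std.exception : assumeUnique;

// Builds a string whose underlying array is provably unique: buf is created
// here, no other reference to it ever exists, and assumeUnique nulls out buf
// as it converts it to immutable.
string makeUniqueValue(size_t n)
{
    auto buf = new char[](n);
    buf[] = 'x';
    // Sound only because no other reference to buf exists at this point.
    return buf.assumeUnique;
}
```

A value produced this way can be handed to another thread (or stored in a shared AA) without any other part of the program being able to mutate it behind your back.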
This goes for passing data across threads in general. Either the objects need to be of types which are designed to be used as shared (and are therefore operated on as shared rather than having shared cast away, though you probably still need to temporarily cast away shared to get them in and out of the AA), or the thread passing the object to another thread cannot retain a reference to that object. So, you have to worry about more than just the AA itself when worrying about thread-safety. You also need to worry about what you're putting into it and what you do when you take it out.

As such, if the key or value types are not value types, it might make sense to mark them as shared in the function parameters so that it's clear to the caller that the data is being shared across threads. Note however that the situation is slightly more complex with the keys. With the values, you know that they're being shared across threads, whereas with the keys, it depends on the exact function being called. For instance, opIndexAssign would potentially end up storing the key object inside of the AA (meaning that it has to be treated as shared just like the value), whereas opIndex would just be using the key for hashing and comparison, so it wouldn't need to treat the key as shared.

> ```d
>     auto opIndex(K key)
>     in (mutex, typeof(this).stringof ~ " has null Mutex")
>     {
>         mutex.lock_nothrow();
>         auto value = (cast()aa)[key];
>         mutex.unlock_nothrow();
>         return value;
>     }
> ```

Again, the data needs to have been unique when it was put into the AA, or you'll have problems here when you take it out. In addition, even if it's unique, since this function leaves the data within the AA, if the type being returned is not a value type, then you have to be sure that no other thread can access the data while this thread is operating on it, which means that the locking primitives likely need to be at a higher level.
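Alternatively, for reference-typed values, the lookup can deep-copy under the lock so that the caller only ever sees thread-local data. A hypothetical sketch (`SafeListMap`, `append`, and `snapshot` are made-up names) with mutable array values, where returning `aa[key]` directly would hand out a thread-local slice of shared data:

```d
import core.sync.mutex : Mutex;

final class SafeListMap
{
    private Mutex mutex;
    private int[][string] aa;

    this()
    {
        mutex = new Mutex;
    }

    void append(string key, int element) shared
    {
        auto self = cast(SafeListMap)this;
        self.mutex.lock_nothrow();
        scope(exit) self.mutex.unlock_nothrow();
        if (auto slot = key in self.aa)
            *slot ~= element; // the `in` pointer never leaves the lock
        else
            self.aa[key] = [element];
    }

    // Deep copy made while the lock is held; the returned array is purely
    // thread-local, so the caller can use it without any synchronization.
    int[] snapshot(string key) shared
    {
        auto self = cast(SafeListMap)this;
        self.mutex.lock_nothrow();
        scope(exit) self.mutex.unlock_nothrow();
        if (auto slot = key in self.aa)
            return (*slot).dup;
        return null;
    }
}
```

The cost is an allocation per lookup, and the snapshot can of course be stale the moment the lock is released - which is fine as long as the caller doesn't need it to reflect later mutations.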
> ```d
>     auto opBinaryRight(string op : "in")(K key)
>     in (mutex, typeof(this).stringof ~ " has null Mutex")
>     {
>         mutex.lock_nothrow();
>         auto value = key in cast()aa;
>         mutex.unlock_nothrow();
>         return value;
>     }
> ```

Note that this opBinaryRight is escaping a thread-local pointer to shared data, which violates thread-safety. If you're going to escape a pointer to the shared data, then that pointer needs to be shared (which really means that the locking mechanism needs to be at a higher level so that it protects access to the data while it's being used), or the data needs to be deep-copied so that you're dealing entirely with thread-local data rather than a thread-local pointer to shared data.

So, hopefully that clarifies some things, but you do need to be very aware of what data is escaping when casting away shared or casting to shared, since you don't want to be operating on thread-local references to shared data without that data being protected by the appropriate thread synchronization primitives. The type system will prevent you from doing much with shared in general without casting away shared, but you're then on your own with any code where you cast away shared - just like you're on your own with verifying that @trusted code is actually @safe, since the compiler can't do it for you.

- Jonathan M Davis