Re: [configuration] Thoughts about multi-threading
On 20.09.2012 22:38, Honton, Charles wrote:

[snip]

This obviously makes the concurrency problem easier :) Apart from this case, it would be good to agree on exactly what it means for [configuration] to be threadsafe. Is it basically the semantics of ConcurrentHashMap? Or are there sequencing / event serialization constraints? For example, suppose the sequence below happens:

Thread A start add property
Thread B start clear
Thread A notify property change
Thread B notify clear
Thread B clear map
Thread A update map

Is it OK for this sequence to happen? Is it OK for A's add to trump B's clear even though B's activation started later and B's notification was later?

This is a very good point, I did not think about this. It would be very confusing for an event listener to receive a clear event and then find out that the configuration is not empty. So I guess it is too naive to simply replace the plain map by a concurrent map to gain thread-safety. We will then probably have to use a read-write lock for map-based configurations, too. Well, this is not too bad. The only thing which worries me a bit is that we have to call event listeners while the lock is held. This is an anti-pattern described by Bloch: Don't call an alien method with a lock held! Does anybody have an idea how we could prevent this?

Interesting problem. Unfortunately, I don't think you can enforce the serializability invariant (no interleaving as above) without some [configuration] thread holding a lock while notification happens. You could queue notify-update tasks so add/update invocations don't block; but whatever thread actually executes the tasks would have to lock takes from the queue while notifications complete. That might not be that bad, since you could continue to service reads (of yet-to-be-updated data) and queue updates while the foreign lock was held; but it still violates the maxim, with the consequence that a hung listener could stop updates from happening.

For a similar problem I once used a single-threaded executor. The code holding the lock just submitted tasks to the executor, and these tasks were responsible for sending notifications to event listeners. The tasks were initialized with the events to propagate and a snapshot of the currently registered event listeners. Not sure whether this is a suitable approach for the problem at hand. It is probably not a good idea to create a new executor service for each Configuration object. And having a single shared instance can become a bottleneck.

Is serializability what's desired? Or is it consistency? I can imagine a situation where multiple properties must enforce some invariant relationship. The producer would like to be able to hold off notifying the property consumers until the next property change fixes the invariant constraint violation. Likewise the consumer might want a set which is invariant between applying the first property and the last property.

Honestly, I don't know. I hope that we can start with something simple and then build advanced functionality on top of it.

Oliver

To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
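The single-threaded executor idea from the message above can be sketched roughly as follows. This is only an illustration under assumptions: the names ChangeListener, QueuedNotifyConfig, enqueue and shutdownAndAwait are hypothetical and are not part of [configuration]; a flat map stands in for the real data structures, and events are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical listener type, not the real [configuration] event API. */
interface ChangeListener {
    void changed(String event);
}

/** Sketch: update under a write lock, deliver events from one dispatcher thread. */
class QueuedNotifyConfig {
    private final Map<String, Object> store = new HashMap<>();
    private final List<ChangeListener> listeners = new CopyOnWriteArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    void addListener(ChangeListener listener) {
        listeners.add(listener);
    }

    void setProperty(String key, Object value) {
        lock.writeLock().lock();
        try {
            store.put(key, value);
            enqueue("set " + key);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void clear() {
        lock.writeLock().lock();
        try {
            store.clear();
            enqueue("clear");
        } finally {
            lock.writeLock().unlock();
        }
    }

    Object getProperty(String key) {
        lock.readLock().lock();
        try {
            return store.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called with the write lock held: only queues work, never runs listener code. */
    private void enqueue(String event) {
        // Snapshot of the listeners registered at the time of the change.
        List<ChangeListener> snapshot = new ArrayList<>(listeners);
        dispatcher.execute(() -> {
            for (ChangeListener listener : snapshot) {
                listener.changed(event);
            }
        });
    }

    /** Stops the dispatcher and waits until queued events have been delivered. */
    boolean shutdownAndAwait(long timeoutMillis) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because tasks are queued while the write lock is held and the executor runs them one at a time in submission order, listeners see events in the same order in which the changes were applied, and no listener code runs under the lock. The sketch does not guarantee that a listener reading the configuration sees the state that belongs to the event: by the time a "clear" event is delivered, a later add may already have been applied. It also creates one executor per object, which is exactly the cost the message warns about.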
Re: [configuration] Thoughts about multi-threading
Oliver Heger wrote:

Hi Jörg, many thanks for your input!

On 17.09.2012 10:01, Jörg Schaible wrote:

[snip]

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

I really like this approach. I was also thinking about separating loading and saving from core Configuration classes. However, I fear such an approach will make it difficult to preserve the format of a configuration. E.g. XMLConfiguration currently stores the XML document it was loaded from. So when saved to disk, the result looks much like the original document.

A ConfigurationSource may still collect the necessary data and keep it internally or attach it somehow to the generated Configuration instance for later use.

- Jörg
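To make the proposed split between configuration and format more concrete, here is a minimal sketch. The type names come from the list in the message above, but every method signature and body is an assumption made for illustration (for example the no-argument load() and reading from a String instead of a file); none of this is an existing [configuration] API, and interpolation, saving and reloading are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Read-only view: only getters, as proposed in the thread. */
interface Configuration {
    Object getProperty(String key);

    boolean containsKey(String key);
}

/** In-memory implementation that additionally offers setters. */
class BaseConfiguration implements Configuration {
    private final Map<String, Object> values = new HashMap<>();

    void setProperty(String key, Object value) {
        values.put(key, value);
    }

    @Override
    public Object getProperty(String key) {
        return values.get(key);
    }

    @Override
    public boolean containsKey(String key) {
        return values.containsKey(key);
    }
}

/** Knows about a format and produces configurations (signature is an assumption). */
interface ConfigurationSource {
    Configuration load();
}

/** Parses properties syntax; reads from a String here to keep the sketch self-contained. */
class PropertiesConfigurationSource implements ConfigurationSource {
    private final String text;

    PropertiesConfigurationSource(String text) {
        this.text = text;
    }

    @Override
    public Configuration load() {
        Properties props = new Properties();
        try (Reader in = new StringReader(text)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        BaseConfiguration config = new BaseConfiguration();
        for (String name : props.stringPropertyNames()) {
            config.setProperty(name, props.getProperty(name));
        }
        // Callers only see the getter-only interface.
        return config;
    }
}
```

A caller would write something like `Configuration c = new PropertiesConfigurationSource("port=8080").load();` and never learn which format the values came from. The sketch does nothing for the format-preservation concern raised in the reply: a source that wants to write a file back in its original layout would have to keep the parsed document itself, as Jörg suggests.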
Re: [configuration] Thoughts about multi-threading
On 9/19/12 1:19 PM, Oliver Heger wrote:

Hi Phil,

On 18.09.2012 20:09, Phil Steitz wrote:

On 9/17/12 12:39 PM, Oliver Heger wrote:

Hi Jörg, many thanks for your input!

On 17.09.2012 10:01, Jörg Schaible wrote:

Hi Oliver,

Oliver Heger wrote:

Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

Fair approach.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

You could use a protected method to instantiate the underlying HashMap. Then you're free in overloaded Configurations.synchronizedConfiguration methods to use a derived class or a wrapper. It also has implications for subset().

Or pass the map in at construction time. Using a protected method to create the map would either mean that the constructor has to invoke this method (problematic for subclasses which are not yet fully initialized at this time) or the field cannot be made final. Alternatively, there could be an abstract method getMap() returning the reference to the map.

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Another option would be immutability (well, apart probably from reloading). Personally, I often have the use case that I do not want to let my clients/consumers write into the configuration. One approach can also be the JDK approach of creating Collections.unmodifiableConfiguration.

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

I really like this approach. I was also thinking about separating loading and saving from core Configuration classes. However, I fear such an approach will make it difficult to preserve the format of a configuration. E.g. XMLConfiguration currently stores the XML document it was loaded from. So when saved to disk, the result looks much like the original document.

Read-only configurations are also an interesting topic.

This obviously makes the concurrency problem easier :) Apart from this case, it would be good to agree on exactly what it means for [configuration] to be threadsafe. Is it basically the semantics of ConcurrentHashMap? Or are there sequencing / event serialization constraints? For example, suppose the sequence below happens:

Thread A start add property
Thread B start clear
Thread A notify property change
Thread B notify clear
Thread B clear map
Thread A update map

Is it OK for this sequence to happen? Is it OK for A's add to trump B's clear even though B's activation started later and B's notification was later?

This is a very good point, I did not think about this. It would be very confusing for an event listener to receive a clear event and then find out that the configuration is
Re: [configuration] Thoughts about multi-threading
[snip]

This obviously makes the concurrency problem easier :) Apart from this case, it would be good to agree on exactly what it means for [configuration] to be threadsafe. Is it basically the semantics of ConcurrentHashMap? Or are there sequencing / event serialization constraints? For example, suppose the sequence below happens:

Thread A start add property
Thread B start clear
Thread A notify property change
Thread B notify clear
Thread B clear map
Thread A update map

Is it OK for this sequence to happen? Is it OK for A's add to trump B's clear even though B's activation started later and B's notification was later?

This is a very good point, I did not think about this. It would be very confusing for an event listener to receive a clear event and then find out that the configuration is not empty. So I guess it is too naive to simply replace the plain map by a concurrent map to gain thread-safety. We will then probably have to use a read-write lock for map-based configurations, too. Well, this is not too bad. The only thing which worries me a bit is that we have to call event listeners while the lock is held. This is an anti-pattern described by Bloch: Don't call an alien method with a lock held! Does anybody have an idea how we could prevent this?

Interesting problem. Unfortunately, I don't think you can enforce the serializability invariant (no interleaving as above) without some [configuration] thread holding a lock while notification happens. You could queue notify-update tasks so add/update invocations don't block; but whatever thread actually executes the tasks would have to lock takes from the queue while notifications complete. That might not be that bad, since you could continue to service reads (of yet-to-be-updated data) and queue updates while the foreign lock was held; but it still violates the maxim, with the consequence that a hung listener could stop updates from happening.

Is serializability what's desired? Or is it consistency? I can imagine a situation where multiple properties must enforce some invariant relationship. The producer would like to be able to hold off notifying the property consumers until the next property change fixes the invariant constraint violation. Likewise the consumer might want a set which is invariant between applying the first property and the last property.

chas
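One possible reading of the consistency requirement in the message above is a batch update: several properties change under one write lock and listeners are notified once, after the last change. The sketch below illustrates that reading only. BatchingConfig, update and snapshot are hypothetical names, not [configuration] API, and listeners are deliberately called with the lock held, so it accepts the anti-pattern discussed earlier in the thread rather than solving it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Sketch: group related property changes so no intermediate state is visible. */
class BatchingConfig {
    private final Map<String, Object> store = new HashMap<>();
    private final List<Consumer<Set<String>>> listeners = new CopyOnWriteArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void addListener(Consumer<Set<String>> listener) {
        listeners.add(listener);
    }

    /** Applies all changes under one write lock, then notifies once with the changed keys. */
    void update(Map<String, Object> changes) {
        lock.writeLock().lock();
        try {
            store.putAll(changes);
            Set<String> changed = Collections.unmodifiableSet(new TreeSet<>(changes.keySet()));
            // Listener code runs with the write lock held (the "alien method" problem).
            for (Consumer<Set<String>> listener : listeners) {
                listener.accept(changed);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads several keys under one read lock so the values belong to the same state. */
    Map<String, Object> snapshot(String... keys) {
        lock.readLock().lock();
        try {
            Map<String, Object> result = new TreeMap<>();
            for (String key : keys) {
                if (store.containsKey(key)) {
                    result.put(key, store.get(key));
                }
            }
            return result;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With two related properties such as min and max, moving from (1, 5) to (10, 20) in a single update call means that neither a reader using snapshot nor a notified listener can observe the invalid combination min=10, max=5. A listener may call snapshot from inside the notification because ReentrantReadWriteLock lets the thread that holds the write lock also acquire the read lock.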
Re: [configuration] Thoughts about multi-threading
Hi Phil,

On 18.09.2012 20:09, Phil Steitz wrote:

On 9/17/12 12:39 PM, Oliver Heger wrote:

Hi Jörg, many thanks for your input!

On 17.09.2012 10:01, Jörg Schaible wrote:

Hi Oliver,

Oliver Heger wrote:

Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

Fair approach.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

You could use a protected method to instantiate the underlying HashMap. Then you're free in overloaded Configurations.synchronizedConfiguration methods to use a derived class or a wrapper. It also has implications for subset().

Or pass the map in at construction time. Using a protected method to create the map would either mean that the constructor has to invoke this method (problematic for subclasses which are not yet fully initialized at this time) or the field cannot be made final. Alternatively, there could be an abstract method getMap() returning the reference to the map.

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Another option would be immutability (well, apart probably from reloading). Personally, I often have the use case that I do not want to let my clients/consumers write into the configuration. One approach can also be the JDK approach of creating Collections.unmodifiableConfiguration.

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

I really like this approach. I was also thinking about separating loading and saving from core Configuration classes. However, I fear such an approach will make it difficult to preserve the format of a configuration. E.g. XMLConfiguration currently stores the XML document it was loaded from. So when saved to disk, the result looks much like the original document.

Read-only configurations are also an interesting topic.

This obviously makes the concurrency problem easier :) Apart from this case, it would be good to agree on exactly what it means for [configuration] to be threadsafe. Is it basically the semantics of ConcurrentHashMap? Or are there sequencing / event serialization constraints? For example, suppose the sequence below happens:

Thread A start add property
Thread B start clear
Thread A notify property change
Thread B notify clear
Thread B clear map
Thread A update map

Is it OK for this sequence to happen? Is it OK for A's add to trump B's clear even though B's activation started later and B's notification was later?

This is a very good point, I did not think about this. It would be very confusing for an event listener to receive a clear event and then find out that the configuration is not empty. So I guess it is too naive to simply replace the plain map by a concurrent map to gain thread-safety. We will then probably have
Re: [configuration] Thoughts about multi-threading
On 9/17/12 12:39 PM, Oliver Heger wrote:

Hi Jörg, many thanks for your input!

On 17.09.2012 10:01, Jörg Schaible wrote:

Hi Oliver,

Oliver Heger wrote:

Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

Fair approach.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

You could use a protected method to instantiate the underlying HashMap. Then you're free in overloaded Configurations.synchronizedConfiguration methods to use a derived class or a wrapper. It also has implications for subset().

Or pass the map in at construction time. Using a protected method to create the map would either mean that the constructor has to invoke this method (problematic for subclasses which are not yet fully initialized at this time) or the field cannot be made final. Alternatively, there could be an abstract method getMap() returning the reference to the map.

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Another option would be immutability (well, apart probably from reloading). Personally, I often have the use case that I do not want to let my clients/consumers write into the configuration. One approach can also be the JDK approach of creating Collections.unmodifiableConfiguration.

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

I really like this approach. I was also thinking about separating loading and saving from core Configuration classes. However, I fear such an approach will make it difficult to preserve the format of a configuration. E.g. XMLConfiguration currently stores the XML document it was loaded from. So when saved to disk, the result looks much like the original document.

Read-only configurations are also an interesting topic.

This obviously makes the concurrency problem easier :) Apart from this case, it would be good to agree on exactly what it means for [configuration] to be threadsafe. Is it basically the semantics of ConcurrentHashMap? Or are there sequencing / event serialization constraints? For example, suppose the sequence below happens:

Thread A start add property
Thread B start clear
Thread A notify property change
Thread B notify clear
Thread B clear map
Thread A update map

Is it OK for this sequence to happen? Is it OK for A's add to trump B's clear even though B's activation started later and B's notification was later?

Phil

But I think you are right, we have to start with smaller steps first. Not sure whether we can manage this - but I would really like to get something out in the not-too-distant future.

Oliver

- Jörg
Re: [configuration] Thoughts about multi-threading
Hi Oliver,

Oliver Heger wrote:

Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

Fair approach.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

You could use a protected method to instantiate the underlying HashMap. Then you're free in overloaded Configurations.synchronizedConfiguration methods to use a derived class or a wrapper. It also has implications for subset().

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Another option would be immutability (well, apart probably from reloading). Personally, I often have the use case that I do not want to let my clients/consumers write into the configuration. One approach can also be the JDK approach of creating Collections.unmodifiableConfiguration.

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

- Jörg
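The read-only idea from the message above, modeled on the unmodifiable wrappers in java.util.Collections, might look roughly like this. All type and method names here (SimpleConfiguration, InMemoryConfiguration, ConfigurationViews.unmodifiableConfiguration) are made up for the sketch and do not exist in [configuration]; the interface is reduced to two methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, heavily reduced configuration interface. */
interface SimpleConfiguration {
    Object getProperty(String key);

    void setProperty(String key, Object value);
}

/** Plain, writable in-memory implementation. */
class InMemoryConfiguration implements SimpleConfiguration {
    private final Map<String, Object> values = new HashMap<>();

    @Override
    public Object getProperty(String key) {
        return values.get(key);
    }

    @Override
    public void setProperty(String key, Object value) {
        values.put(key, value);
    }
}

/** Factory for wrappers, in the style of Collections.unmodifiableMap. */
final class ConfigurationViews {
    private ConfigurationViews() {
    }

    /** Returns a view that forwards reads and rejects writes. */
    static SimpleConfiguration unmodifiableConfiguration(SimpleConfiguration delegate) {
        Objects.requireNonNull(delegate, "delegate");
        return new SimpleConfiguration() {
            @Override
            public Object getProperty(String key) {
                return delegate.getProperty(key);
            }

            @Override
            public void setProperty(String key, Object value) {
                throw new UnsupportedOperationException("read-only configuration");
            }
        };
    }
}
```

As with the JDK wrappers, this is a view and not a copy: code that still holds the wrapped object can change it, and the view reflects the change. It therefore keeps clients from writing, but it does not by itself make concurrent access safe. A getter-only Configuration interface, as proposed in the same message, would avoid the runtime exception by not offering setProperty at all.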
Re: [configuration] Thoughts about multi-threading
Hi Jörg, many thanks for your input!

On 17.09.2012 10:01, Jörg Schaible wrote:

Hi Oliver,

Oliver Heger wrote:

Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

Fair approach.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

You could use a protected method to instantiate the underlying HashMap. Then you're free in overloaded Configurations.synchronizedConfiguration methods to use a derived class or a wrapper. It also has implications for subset().

Or pass the map in at construction time. Using a protected method to create the map would either mean that the constructor has to invoke this method (problematic for subclasses which are not yet fully initialized at this time) or the field cannot be made final. Alternatively, there could be an abstract method getMap() returning the reference to the map.

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Another option would be immutability (well, apart probably from reloading). Personally, I often have the use case that I do not want to let my clients/consumers write into the configuration. One approach can also be the JDK approach of creating Collections.unmodifiableConfiguration.

However, what also bugs me in the meantime is the current hard relation between the configuration object and its format. Why should I care at all in what format the configuration had been saved when I access its values? For some time now I have been thinking of something along the lines of:

- interface Configuration: Core interface, only getters
- interface ReloadableConfiguration extends Configuration, Reloadable
- class BaseConfiguration: In memory, implements all the stuff for interpolation and the setters
- interface ConfigurationSource: Core interface to load (and probably save) a configuration
- class PropertiesConfigurationSource: Concrete implementation that loads a properties file and creates a BaseConfiguration

This approach offers immutability for the Configuration itself and also allows Serializability. Format is separated completely from the configuration functionality. I know, this looks more like Configuration 3.0 ... ;-)

I really like this approach. I was also thinking about separating loading and saving from core Configuration classes. However, I fear such an approach will make it difficult to preserve the format of a configuration. E.g. XMLConfiguration currently stores the XML document it was loaded from. So when saved to disk, the result looks much like the original document.

Read-only configurations are also an interesting topic.

But I think you are right, we have to start with smaller steps first. Not sure whether we can manage this - but I would really like to get something out in the not-too-distant future.

Oliver

- Jörg
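The "pass the map in at construction time" variant from the reply above can be sketched as follows. MapBackedConfig and its factory methods are hypothetical names chosen for this example; only HashMap and ConcurrentHashMap are real JDK classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: the backing map is injected, so the field can stay final. */
class MapBackedConfig {
    private final Map<String, Object> map;

    MapBackedConfig(Map<String, Object> map) {
        // No overridable method is called from the constructor.
        this.map = Objects.requireNonNull(map, "map");
    }

    /** Basic variant without any synchronization. */
    static MapBackedConfig unsynchronizedConfig() {
        return new MapBackedConfig(new HashMap<>());
    }

    /** Variant whose individual map operations are thread-safe. */
    static MapBackedConfig concurrentConfig() {
        return new MapBackedConfig(new ConcurrentHashMap<>());
    }

    void setProperty(String key, Object value) {
        map.put(key, value);
    }

    Object getProperty(String key) {
        return map.get(key);
    }

    boolean isEmpty() {
        return map.isEmpty();
    }

    void clear() {
        map.clear();
    }
}
```

Compared with a protected factory method, the constructor never runs subclass code on a partially initialized object, and the map field can be final. Two caveats apply to the concurrent variant: ConcurrentHashMap rejects null keys and values, which HashMap accepts, and it only makes single map operations atomic. It does not order a change relative to its event notification.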
[configuration] Thoughts about multi-threading
Hi,

one limitation of the 1.x versions of [configuration] is the incomplete support for concurrent access to Configuration objects. In version 2.0 we should try to improve this. I have some ideas about this topic - not fully thought out - and would like to start a discussion. Here they are (in no specific order):

- Thread-safety is not always required. Therefore, I would like to take an approach similar to the JDK Collections framework: having basic, unsynchronized configurations which can be turned into thread-safe ones.

- Many Configuration implementations are based on a hash map. Access to their content could be made thread-safe by replacing the map by a ConcurrentHashMap.

- For hierarchical configurations the situation is more complex. Here we will probably need something like a ReadWriteLock to protect their content. (There could be different lock implementations including a dummy one used by unsynchronized configurations).

- Reloading is a major problem; at least in the way it is implemented now, it is very hard to get synchronization correct and efficient. Therefore, I would like to use a different strategy here. One option could be to not handle reloading in Configuration objects, but on an additional layer which creates such objects.

- Other properties of configuration objects (e.g. the throwExceptionOnMissing flag or the file name) must be taken into account, too. In a typical use case those should not be accessed frequently, so it is probably not an issue to always synchronize them or make them volatile.

Looking forward to your input

Oliver
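The "dummy lock" idea from the list above could look roughly like the following sketch. NoOpReadWriteLock and LockGuardedConfig are invented names, not [configuration] classes; ReadWriteLock, Lock and ReentrantReadWriteLock are the real java.util.concurrent.locks types. A flat map stands in for the hierarchical node structure to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Dummy lock for unsynchronized configurations: every operation is a no-op. */
final class NoOpReadWriteLock implements ReadWriteLock {
    private static final Lock NO_OP = new Lock() {
        @Override public void lock() { }
        @Override public void lockInterruptibly() { }
        @Override public boolean tryLock() { return true; }
        @Override public boolean tryLock(long time, TimeUnit unit) { return true; }
        @Override public void unlock() { }
        @Override public Condition newCondition() {
            throw new UnsupportedOperationException("no conditions on a no-op lock");
        }
    };

    @Override public Lock readLock() { return NO_OP; }
    @Override public Lock writeLock() { return NO_OP; }
}

/** The same access code serves both variants; only the injected lock differs. */
class LockGuardedConfig {
    private final Map<String, Object> store = new HashMap<>();
    private final ReadWriteLock lock;

    LockGuardedConfig(ReadWriteLock lock) {
        this.lock = lock;
    }

    static LockGuardedConfig unsynchronizedConfig() {
        return new LockGuardedConfig(new NoOpReadWriteLock());
    }

    static LockGuardedConfig threadSafeConfig() {
        return new LockGuardedConfig(new ReentrantReadWriteLock());
    }

    void setProperty(String key, Object value) {
        lock.writeLock().lock();
        try {
            store.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Object getProperty(String key) {
        lock.readLock().lock();
        try {
            return store.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This keeps a single code path for reads and writes, in the spirit of turning a basic configuration into a thread-safe one, while the unsynchronized variant pays only for empty method calls. Whether listeners should be notified while such a lock is held is a separate question that the sketch leaves open.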