[ https://issues.apache.org/jira/browse/GEODE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirk Lund reassigned GEODE-29:
------------------------------

    Assignee: Kirk Lund  (was: Jens Deppe)

> Fix all functional/behavioral differences between cache.xml and the public
> Java API.
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-29
>                 URL: https://issues.apache.org/jira/browse/GEODE-29
>             Project: Geode
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.0.0-incubating
>        Environment: Apache Geode configured either with cache.xml, the public
>                     Java API or Gfsh (+Cluster Config, an extension of cache.xml).
>            Reporter: John Blum
>            Assignee: Kirk Lund
>            Priority: Critical
>              Labels: ApacheGeode, CacheXML, PublicJavaAPI
>
> Certain _Apache Geode_ functions/behaviors are encapsulated in "internal"
> classes. Therefore, when a developer initially uses {{cache.xml}} to
> configure _Geode_ and then (perhaps) switches to configuring a node
> programmatically using the public Java API with seemingly equivalent and
> complementary configuration logic, certain things cease to "work as expected."
> For example...
>
> 1. Premature {{GatewayReceiver}} start before a Region exists, resulting in
> an event/data loss issue:
>
> In {{cache.xml}}, if a developer defines a {{GatewayReceiver}} along with
> Regions that may potentially be updated by the {{GatewayReceiver}}, _Geode_
> is careful not to "start" the {{GatewayReceiver}} until all the Regions have
> been created while processing (parsing and initializing _Geode_ components
> from) the {{cache.xml}}.
>
> If _Geode_ were to start the {{GatewayReceiver}} "prematurely", and events
> from the remote WAN site arrived before the Regions targeted by those
> events were created, then _Geode_ would drop those events, thus causing data
> loss. _Geode's_ logic when processing {{cache.xml}} therefore prevents this
> from happening.
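With the public Java API, that same ordering has to be enforced by hand. A minimal sketch of the pattern, assuming the post-incubation {{org.apache.geode}} package names (region names and shortcuts here are purely illustrative, not from the issue): define the receiver with manual start, create every Region the remote WAN site may update, and only then start the receiver, mirroring what cache.xml processing does internally.

```java
import java.io.IOException;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.wan.GatewayReceiver;
import org.apache.geode.cache.wan.GatewayReceiverFactory;

public class ReceiverAfterRegions {

    public static void main(String[] args) throws IOException {
        Cache cache = new CacheFactory().create();

        // Define the receiver, but defer starting it.
        GatewayReceiverFactory receiverFactory = cache.createGatewayReceiverFactory();
        receiverFactory.setManualStart(true);
        GatewayReceiver receiver = receiverFactory.create();

        // Create every Region the remote WAN site might target
        // (hypothetical Region names, matching the discussion).
        cache.createRegionFactory(RegionShortcut.PARTITION).create("A");
        cache.createRegionFactory(RegionShortcut.PARTITION).create("B");

        // Only now is it safe to receive WAN events: no event can
        // arrive for a Region that does not yet exist.
        receiver.start();
    }
}
```

This is exactly the kind of lifecycle sequencing that a container such as Spring can manage automatically, but that a plain API consumer must know to do.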
> However, if a developer uses the public Java API to define the same
> configuration, no out-of-the-box protection is offered to prevent event (data)
> loss from happening, thus requiring application developers using the _Geode_
> API to know how _Geode_ functions "internally".
>
> Fortunately, application developers are not completely left to fend for
> themselves and be privy to all the details. Technologies such as _Spring
> Data GemFire_, which also consumes and adheres to the _Geode_ public Java API
> (and +only+ the "public" Java API; "internal" classes are not used given they
> are subject to change), are able to handle this using Spring's robust bean
> container lifecycle management features. However, other application
> consumers using the API will not fare as well.
>
> 2. Another problem stems from the poorly conceived and "imposed" ordering of
> persistent Regions.
>
> For instance, suppose I have 2 Members, each defining 2 persistent Regions,
> where each Member is the "primary" for 1 of the 2 Regions and the 'other'
> Member hosts the redundant copy, like so...
>
> Member   Regions
> -------------------------
> X        B, A'
> Y        A, B'
>
> Tick (') - indicates the member (e.g. X) is the primary for a particular
> Region (i.e. A).
>
> Then the system can end in a distributed deadlock due to the non-apparent,
> non-arbitrary dependency between the Members caused by an improper
> configuration order of the Regions.
>
> In this situation, the primary Member for a Region must start before the
> Member hosting the redundant Region copy (the secondary), because it is a
> property of _Geode_ that the primary will have the most recent, correct copy
> of the data.
>
> But, as illustrated above, when the system starts, and because I have
> defined the Regions in an improper (arbitrary) order, this system will
> deadlock. That is, when Member X starts, it will attempt to create Region B
> first. However, Member X must wait for Member Y to start, since Member Y is
> the "primary" for Region B.
> Likewise, when Member Y starts, because it tries to create Region A first,
> it too will wait on Member X, which hosts the "primary" copy of Region A,
> leading to a situation where each Member waits for the other, resulting in a
> distributed deadlock.
>
> This example is fairly small in scale and only gets more complex as you add
> Members and additional Regions in a complex system.
>
> Of course, the "easy" solution is to ensure the Members in the cluster
> declaring the Regions all define the Regions in their configuration in the
> "same order". This is made even easier with the use of a cluster-wide,
> shared configuration (using the Cluster Configuration Service). So, by
> defining all Regions in the same order on every Member (e.g. A followed by
> B), a developer/user can avoid the distributed deadlock.
>
> However, it is naive for _Geode_ to assume users will know about and conform
> to this restriction, and to impose a non-arbitrary order to work around,
> basically, a technical limitation of the code.
>
> In other environments, such as Spring, you cannot necessarily guarantee what
> the order will be at runtime, especially if application components (e.g.
> DAOs) inject references to GemFire components (e.g. Regions) while also using
> other advanced Spring container features, like CLASSPATH component-scanning,
> to wire up the entire application.
>
> Even "collocation" has an impact on the Region creation order, since Spring
> must logically satisfy the "dependency" order of the beans first. This is
> both logical and makes sense, whereas _Geode's_ ordering is non-arbitrary and
> non-apparent, since any Member could host the redundant copy. Therefore, this
> problem is a leaked implementation detail.
>
> Technically, the same problem can be reproduced in {{cache.xml}}, for that
> matter, with no Spring present.
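For illustration, the "same order" workaround amounts to every Member declaring its Regions in an identical order in cache.xml (or in the shared Cluster Configuration). A hedged fragment, using the Region names from the example above; the schema declaration and the {{refid}} shortcut chosen here are illustrative, not prescribed by the issue:

```xml
<cache>
  <!-- Every Member declares A before B. Using an identical,
       cluster-wide Region order breaks the circular wait between
       primaries and secondaries described above. -->
  <region name="A" refid="PARTITION_REDUNDANT_PERSISTENT"/>
  <region name="B" refid="PARTITION_REDUNDANT_PERSISTENT"/>
</cache>
```

The same discipline applies when creating the Regions via the Java API: every Member must issue its `create(...)` calls in the same order.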
> And this problem is even more likely to happen using the public Java API
> since, again, there is no special *magic* being handled by "internal" _Geode_
> classes (in this case) as there is w.r.t. {{cache.xml}}. Users/developers
> just have to know the correct ordering.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)