[ 
https://issues.apache.org/jira/browse/GEODE-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Lund reassigned GEODE-29:
------------------------------

    Assignee: Kirk Lund  (was: Jens Deppe)

> Fix all functional/behavioral differences between cache.xml and the public 
> Java API.
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-29
>                 URL: https://issues.apache.org/jira/browse/GEODE-29
>             Project: Geode
>          Issue Type: Improvement
>          Components: configuration
>    Affects Versions: 1.0.0-incubating
>         Environment: Apache Geode configured either with cache.xml, public 
> Java API or Gfsh (+Cluster Config, an extension of cache.xml).
>            Reporter: John Blum
>            Assignee: Kirk Lund
>            Priority: Critical
>              Labels: ApacheGeode, CacheXML, PublicJavaAPI
>
> Certain _Apache Geode_ functions/behaviors are encapsulated in "internal" 
> classes.  Therefore, when a developer initially uses {{cache.xml}} to 
> configure _Geode_ and then (perhaps) switches to configuring a node 
> programmatically using the public Java API with seemingly equivalent and 
> complementary configuration logic, certain things cease to "work as expected."
> For example...
> 1. Premature GatewayReceiver start before Region exists resulting in 
> event/data loss issue:
> In {{cache.xml}}, if a developer defines a {{GatewayReceiver}} along with 
> Regions that may potentially be updated by the {{GatewayReceiver}}, _Geode_ 
> is careful not to "start" the {{GatewayReceiver}} until all the Regions have 
> been created when processing (parsing and initializing _Geode_ components) 
> the {{cache.xml}}.
> If _Geode_ were to start the {{GatewayReceiver}} "prematurely", and then 
> events from the remote WAN site arrive before the Regions targeted by those 
> events are created, then Geode will drop those events, thus causing data 
> loss.  Therefore _Geode's_ logic when processing {{cache.xml}} prevents this 
> from happening.
> However, if a developer uses the public Java API to define the same 
> configuration, no out-of-box protection is offered to prevent event (data) 
> loss from happening, thus requiring application developers using the _Geode_ 
> API to know how _Geode_ functions "internally".
> Fortunately, application developers are not completely left to fend for 
> themselves, nor required to be privy to all the details.  Technologies such 
> as _Spring Data GemFire_, which also consume and adhere to the _Geode_ public 
> Java API (and +only+ the "public" Java API; "internal" classes are not used 
> given they are subject to change), are able to handle this using Spring's 
> robust bean container lifecycle management features.  However, other 
> application consumers using the API will not fare as well.
> 2. Another problem stems from the poorly conceived and "imposed" ordering of 
> persistent Regions.
> For instance, suppose I have 2 Members, each defining 2 persistent Regions, 
> where each Member is the "primary" for 1 of the 2 Regions and the 'other' 
> Member hosts the redundant copy, like so...
> Member    Regions
> -------------------------
> X         B, A'
> Y         A, B'
> Tick (') indicates that the Member (e.g. X) is the primary for that Region 
> (e.g. A).
> Then, the system can result in a distributed deadlock due to the non-apparent, 
> non-arbitrary dependency between the Members caused by an improper 
> configuration order of the Regions.
> In this situation, the primary Member for a Region must start before the 
> Member hosting the redundant Region copy (secondary) because it is a property 
> of _Geode_ that the primary will have the most recent, correct copy of the 
> data.
> But, as I have illustrated above, when the system starts, and because I have 
> defined the Regions in an improper (arbitrary) order, this system will 
> deadlock.  I.e. when Member X starts, it will attempt to create Region B 
> first.  However, Member X must wait for Member Y to start since Member Y is 
> the "primary" for Region B.
> However, when Member Y starts, and because it tries to create Region A first, 
> it too will wait on Member X hosting the "primary" copy of Region A thereby 
> leading to a situation where each Member waits for the other and results in a 
> distributed deadlock.
> This example is pretty scaled-down; it gets more complex as you add Members 
> and additional Regions in a complex system.
> Of course, the "easy" solution is to ensure the Members in the cluster 
> declaring the Regions all define the Regions in their configuration in the 
> "same order".  (This is made even easier with the use of a cluster-wide, 
> shared configuration using the Cluster Configuration Service.)  So, by 
> defining all Regions in the same order on every Member (e.g. A followed by 
> B), a developer/user can avoid the distributed deadlock.
> However, it is naive for _Geode_ to assume users will know/conform to this 
> restriction and impose a non-arbitrary order to work around, basically, a 
> technical limitation of the code.
> In other environments, such as Spring, you cannot necessarily guarantee what 
> the order will be at runtime, especially if application components (e.g. 
> DAOs) inject references to GemFire components (e.g. Regions) in combination 
> with other advanced Spring container features like CLASSPATH 
> component-scanning to wire up the entire application.
> Even "collocation" has an impact on the Region creation order since Spring 
> must logically satisfy the "dependency" order of the beans first.  That 
> ordering is logical, whereas Geode's ordering is non-arbitrary and 
> non-apparent since any Member could host the redundant copy.  Therefore, this 
> problem is a leaked implementation detail.
> Technically, the same problem can be reproduced in {{cache.xml}} for that 
> matter, with no Spring present.  And this problem is even more likely to 
> happen using the public Java API since, again, there is no special *magic* 
> being handled by "internal" Geode classes (in this case) w.r.t. 
> {{cache.xml}}.  Users/developers just have to know the correct ordering.
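
The startup deadlock described in item 2 can be reproduced with a small toy model. The sketch below is plain Java and does not use any Geode classes; the class name `RegionOrderCheck`, the method `deadlocks`, and the blocking rule (a non-primary Member waits until the primary has created the Region) are assumptions made for illustration, based only on the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RegionOrderCheck {

    /**
     * Toy model (an assumption, not Geode's implementation): each member creates
     * its regions in the listed order. Creating a region succeeds at once on the
     * region's primary; on any other member it blocks until the primary has
     * created that region. Returns true if some member can never finish.
     */
    static boolean deadlocks(Map<String, List<String>> creationOrder,
                             Map<String, String> primaryOf) {
        Map<String, Integer> next = new HashMap<>();
        for (String member : creationOrder.keySet()) {
            next.put(member, 0);
        }
        Set<String> createdOnPrimary = new HashSet<>();

        // Keep advancing members until a full pass makes no progress.
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Map.Entry<String, List<String>> e : creationOrder.entrySet()) {
                String member = e.getKey();
                List<String> order = e.getValue();
                int i = next.get(member);
                while (i < order.size()) {
                    String region = order.get(i);
                    if (member.equals(primaryOf.get(region))) {
                        createdOnPrimary.add(region);   // primary never waits
                    } else if (!createdOnPrimary.contains(region)) {
                        break;                          // blocked on the primary
                    }
                    i++;
                    progress = true;
                }
                next.put(member, i);
            }
        }

        // Any member that has not created all of its regions is stuck.
        for (Map.Entry<String, List<String>> e : creationOrder.entrySet()) {
            if (next.get(e.getKey()) < e.getValue().size()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // X is primary for A, Y is primary for B, as in the table above.
        Map<String, String> primaryOf = Map.of("A", "X", "B", "Y");

        Map<String, List<String>> asReported =
            Map.of("X", List.of("B", "A"), "Y", List.of("A", "B"));
        Map<String, List<String>> sameOrder =
            Map.of("X", List.of("A", "B"), "Y", List.of("A", "B"));

        System.out.println("as reported: deadlock=" + deadlocks(asReported, primaryOf));
        System.out.println("same order:  deadlock=" + deadlocks(sameOrder, primaryOf));
    }
}
```

In this model the configuration from the table deadlocks, while declaring the Regions in the same order on both Members does not. A check along these lines could, in principle, be run against a set of Member configurations before startup, though real Geode recovery has more cases than this sketch covers.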

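The start-order problem from item 1 can be illustrated the same way. This is a toy model in plain Java, not Geode's API: `ToyMember`, `droppedWhen`, and the Region name "Orders" are hypothetical, and the only behavior modeled is the one stated in the description, namely that an event arriving for a Region that does not yet exist is dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReceiverStartOrder {

    /** Toy stand-in for a member with a receiver and some regions; not a Geode class. */
    static final class ToyMember {
        private final Map<String, List<String>> regions = new HashMap<>();
        private boolean receiverStarted;
        private int dropped;

        void createRegion(String name) {
            regions.putIfAbsent(name, new ArrayList<>());
        }

        void startReceiver() {
            receiverStarted = true;
        }

        /** Applies a remote event; returns false if the target region is missing. */
        boolean onRemoteEvent(String regionName, String value) {
            if (!receiverStarted) {
                throw new IllegalStateException("receiver not started");
            }
            List<String> region = regions.get(regionName);
            if (region == null) {
                dropped++;          // no target region yet: the event is lost
                return false;
            }
            region.add(value);
            return true;
        }

        int dropped() {
            return dropped;
        }
    }

    /** Runs one startup sequence and reports how many events were dropped. */
    static int droppedWhen(boolean startReceiverFirst) {
        ToyMember member = new ToyMember();
        if (startReceiverFirst) {
            member.startReceiver();
            member.onRemoteEvent("Orders", "o-1");  // arrives before the region exists
            member.createRegion("Orders");
        } else {
            member.createRegion("Orders");
            member.startReceiver();                 // started only after all regions exist
            member.onRemoteEvent("Orders", "o-1");
        }
        return member.dropped();
    }

    public static void main(String[] args) {
        System.out.println("receiver first: dropped=" + droppedWhen(true));
        System.out.println("regions first:  dropped=" + droppedWhen(false));
    }
}
```

The second sequence mirrors what the description says {{cache.xml}} processing does; a developer using the public Java API would have to arrange that ordering by hand.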


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
