[ 
https://issues.apache.org/jira/browse/GEODE-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Blum updated GEODE-6032:
-----------------------------
    Description: 
Currently, if a cache client application is...

1) configured with a client {{PROXY}} Region, and...
2) the application is also using {{DataSerialization}} to de/serialize objects 
to/from the servers, and...
3) the application domain objects implement the {{org.apache.geode.Delta}} 
interface,

then Apache Geode will incorrectly send the entire object (again) when 
{{Delta.hasDelta()}} returns *false*.
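
For reference, a minimal sketch of the kind of domain object described above. To keep it compilable without Geode on the classpath, it uses a local stand-in interface ({{DeltaLike}}) that mirrors the {{hasDelta}}, {{toDelta}} and {{fromDelta}} methods of {{org.apache.geode.Delta}}. The {{Account}} class, its dirty flag and the {{DeltaDemo.replicate}} helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the method names of org.apache.geode.Delta
// (an assumption made so the sketch compiles without Geode).
interface DeltaLike {
    boolean hasDelta();
    void toDelta(DataOutput out) throws IOException;
    void fromDelta(DataInput in) throws IOException;
}

// Hypothetical domain object that remembers whether it changed since the last write.
class Account implements DeltaLike {
    private long balance;
    private boolean dirty;

    Account(long balance) { this.balance = balance; }

    long getBalance() { return balance; }

    void deposit(long amount) {
        balance += amount;
        dirty = true; // there is now a pending change to write out
    }

    @Override public boolean hasDelta() { return dirty; }

    @Override public void toDelta(DataOutput out) throws IOException {
        out.writeLong(balance); // only the changed field is written
        dirty = false;
    }

    @Override public void fromDelta(DataInput in) throws IOException {
        balance = in.readLong();
    }
}

class DeltaDemo {
    // Writes the pending delta of source and applies it to target.
    static void replicate(Account source, Account target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.toDelta(new DataOutputStream(bytes));
            target.fromDelta(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An object that has not been modified reports {{hasDelta() == false}}, which is exactly the case this issue is about.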

It is understandable that the application domain object needs to be serialized 
in its entirety the first time the object is sent to the server(s), or when the 
object was removed, expired, or evicted and needs to be re-added.

However, once the server(s) know about the object, only a "delta" should be 
sent, and only when {{Delta.hasDelta()}} returns *true*.  If 
{{Delta.hasDelta()}} returns *false*, the entire object most certainly should 
not be sent again.  Otherwise, the application can enter a "_race condition_" 
scenario where the object gets "overwritten" and, as a result, the application 
loses data (aka "_lost updates_").
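
The lost-update scenario can be sketched with a small, self-contained simulation. This is not Geode code: the server is modeled as a plain {{Map}}, and {{Profile}}, {{put}} and the {{resendWhenNoDelta}} flag are hypothetical names. The flag toggles between the behavior reported here for {{PROXY}} Regions (resend the full object when there is no delta) and the expected behavior (send nothing).

```java
import java.util.HashMap;
import java.util.Map;

class LostUpdateDemo {
    // Hypothetical domain object with a single field and a dirty flag.
    static final class Profile {
        String email;
        boolean dirty;

        Profile(String email) { this.email = email; }

        void changeEmail(String newEmail) {
            email = newEmail;
            dirty = true;
        }

        boolean hasDelta() { return dirty; }
    }

    // Simplified client-side put against a server modeled as a Map (key -> email).
    static void put(Map<String, String> server, String key, Profile value,
                    boolean resendWhenNoDelta) {
        if (value.hasDelta()) {
            server.put(key, value.email); // delta: carries the changed field
            value.dirty = false;
        } else if (resendWhenNoDelta) {
            server.put(key, value.email); // full, possibly stale, object overwrites the entry
        }
        // otherwise: nothing to send
    }

    // Two clients start from the same server state; only client B makes a change.
    static String run(boolean resendWhenNoDelta) {
        Map<String, String> server = new HashMap<>();
        server.put("jdoe", "old@example.com");

        Profile clientA = new Profile(server.get("jdoe"));
        Profile clientB = new Profile(server.get("jdoe"));

        clientB.changeEmail("new@example.com");
        put(server, "jdoe", clientB, resendWhenNoDelta); // B's update reaches the server
        put(server, "jdoe", clientA, resendWhenNoDelta); // A is unchanged, hasDelta() is false

        return server.get("jdoe");
    }

    public static void main(String[] args) {
        System.out.println("resend full object: " + run(true));
        System.out.println("send nothing:       " + run(false));
    }
}
```

With {{resendWhenNoDelta}} set to true, client A's stale copy replaces client B's update; with it set to false, the update survives.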

If users change their client Region data management policy from {{PROXY}} to 
{{CACHING_PROXY}}, this works as expected.  For an object it already knows 
about, Apache Geode only sends data if there is actually a "delta"; otherwise 
Geode does nothing (that is, it sends neither the object nor a delta to the 
servers, since there is technically nothing to send).

Obviously, in the {{CACHING_PROXY}} case, there is "local" state to compare 
against, and therefore Geode knows about the object already, in that it 
"exists".  It can therefore assess whether the object is unchanged and do 
nothing when {{Delta.hasDelta()}} returns *false*, which is the "application" 
informing Geode that there is nothing to send.

Clearly, in the {{PROXY}} case, this "local" state does not exist, and 
therefore Geode does not know whether the object (already) exists on the 
servers.  So, if {{Delta.hasDelta()}} returns *false*, it is unsure whether the 
object exists and decides just to send the entire object again: a "_premature 
optimization_" to be sure, which sacrifices "_correctness_" and amplifies the 
possible "_race conditions_" on the application side.

However, this is no different than if {{Delta.hasDelta()}} returns *true* and 
the object is *not yet* known by the servers.  When the client sends just the 
delta in this case, the server effectively responds, "I don't know anything 
about the object to which this delta needs to be applied", and the client must 
then turn around and send the entire object anyway.

So, in the {{PROXY}} case, it would be better if the client determined whether 
the object truly exists on the server side before arbitrarily and falsely 
assuming the entire object should be sent again when {{Delta.hasDelta()}} 
returns *false*.  The client simply does not know and should "verify" before 
sending the object.
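
One way to picture the proposed "verify first" behavior is as a small decision function. This is only a sketch of the idea, not Geode's implementation: {{ProxyPutPolicy}}, {{Action}} and the {{existsOnServer}} predicate are hypothetical. In a real client the existence check might be backed by something like {{Region.containsKeyOnServer}}, but that mapping is an assumption on my part.

```java
import java.util.function.Predicate;

class ProxyPutPolicy {
    enum Action { SEND_DELTA, SEND_FULL_OBJECT, SEND_NOTHING }

    // Decides what a PROXY client should send for a put of the given key.
    // existsOnServer is a hypothetical hook that asks the server whether the key is present.
    static Action decide(String key, boolean hasDelta, Predicate<String> existsOnServer) {
        if (hasDelta) {
            // The server can still request the full object if it does not know the key.
            return Action.SEND_DELTA;
        }
        // No pending changes: verify before falling back to a full send.
        return existsOnServer.test(key) ? Action.SEND_NOTHING : Action.SEND_FULL_OBJECT;
    }
}
```

The extra round trip is only paid when {{hasDelta()}} returns *false*, which is the case where the current behavior risks overwriting newer server state.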

Obviously this affects performance, but that is a small price to pay (and the 
"correct" thing to do) compared with "_lost updates_" and amplified "_race 
conditions_" on the client side.

There is also a situation where {{CACHING_PROXY}} client Regions can even 
*fail*, and that is when {{copy-on-read}} is set to *true*.

To make matters worse, even the 
[_Javadoc_|http://geode.apache.org/releases/latest/javadoc/org/apache/geode/Delta.html#hasDelta--]
 explains and implies that only "_pending changes_" are written if they exist...

> "Returns true if this object has pending changes it can write out."

Of course, this doc is less than clear and rather ambiguous about what exactly 
happens.  But, to be sure, the behavior is certainly not consistent when 
different data management policies are in effect, and most definitely not 
correct!




> Entire object is serialized again (and again) when Delta.hasDelta returns 
> false and client is using PROXY Region
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6032
>                 URL: https://issues.apache.org/jira/browse/GEODE-6032
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>            Reporter: John Blum
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
