[google-appengine] Re: Using transactions to avoid stale memcache entries.

Andy Freeman Sat, 10 Oct 2009 07:35:47 -0700

> Update memcache after the transaction completes. There's still the
> possibility that your script could fail between the two events,


Updating memcache after the transaction completes can result in
persistently inconsistent memcache data even if there's no script
failure.  Consider:

def txn(key):
    a = db.get(key)
    if not a: return None
    a.count += 1
    a.put()
    return a
a = db.run_in_transaction(txn, key)
if a:
    memcache.set(str(a.key()), a)

Even if there are no script failures, the order that different
processes finish the transaction is not guaranteed to be the same as
the order that those processes do the memcache.set.  That
inconsistency lasts until the memcache data times out.  (IIRC, there's
actually no guarantee that memcache data is flushed when the timeout
expires.)
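
A minimal sketch of that interleaving, for illustration only.  Plain
dicts stand in for the datastore and memcache, no App Engine APIs are
called, and the names datastore, memcache, and txn_increment are
hypothetical:

```python
# Simulation only: plain dicts stand in for the datastore and memcache.
datastore = {"k": 0}
memcache = {}

def txn_increment(key):
    """Stand-in for db.run_in_transaction(txn, key): bump the count."""
    datastore[key] += 1
    return datastore[key]

# Processes A and B each run the transaction, then set memcache.
a_value = txn_increment("k")   # A commits first and sees count == 1
b_value = txn_increment("k")   # B commits second and sees count == 2

# The memcache sets happen in the opposite order from the commits.
memcache["k"] = b_value        # B writes 2
memcache["k"] = a_value        # A writes 1, overwriting the newer value

# memcache now disagrees with the datastore, with no script failure.
stale = memcache["k"] != datastore["k"]
print(datastore["k"], memcache["k"], stale)
```

Both "processes" ran to completion, yet the cached value is older than
the stored one until something overwrites or evicts it.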

> but there's
> no avoiding that without transactional semantics between the datastore and
> memcache.

While such transactional semantics between memcache and datastore
would be sufficient, I don't think that they're necessary to satisfy
my requirement.  My existence argument for "can satisfy requirement
without transactional semantics" is the implementation that I
provided.  It only requires consistency checks at datastore operations
and that I address three specific script failures.  (Note that all
datastore operations after the one that runs into the conflict will be
rolled back/ignored, so there's a cost to delaying the check until
commit.  That said, I don't know if doing the consistency check once
at commit is significantly cheaper than doing it incrementally at each
datastore operation.)

The script failures that I need to address are machine, deadline, or
programming problems after/during the memcache.set and before the
commit.  The last problem is under my control and I think that I've
got a handle on deadlines.  I have to live with machine errors
everywhere else, so ....
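
A rough sketch of that failure window, again as a simulation rather
than real App Engine code.  FakeTxn, run, and fail_before_commit are
hypothetical names, and the staging behavior is an assumption made
only to model "memcache.set happened, commit did not":

```python
class FakeTxn:
    """Toy transaction: writes are staged and applied only on commit."""
    def __init__(self, store):
        self.store = store
        self.staged = {}

    def put(self, key, value):
        self.staged[key] = value

    def commit(self):
        self.store.update(self.staged)

def run(store, cache, key, value, fail_before_commit=False):
    txn = FakeTxn(store)
    txn.put(key, value)
    cache[key] = value          # memcache.set inside the user function
    if fail_before_commit:
        # Models a machine, deadline, or programming failure that
        # strikes after the memcache.set and before the commit.
        raise RuntimeError("simulated failure before commit")
    txn.commit()

store = {"k": 1}
cache = {"k": 1}
try:
    run(store, cache, "k", 2, fail_before_commit=True)
except RuntimeError:
    pass

# The datastore still holds 1, but the cache already holds 2.
print(store["k"], cache["k"])
```

This is the one window in which the in-transaction memcache.set can
leave memcache ahead of the datastore.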

Datastore transactions are the only tool that I have to constrain the
order of operations in different processes.  I'd like them to be as
powerful as possible.


On Oct 9, 9:53 am, "Nick Johnson (Google)" <[email protected]>
wrote:
> Hi Andy,
>
> On Fri, Oct 9, 2009 at 5:08 PM, Andy Freeman <[email protected]> wrote:
>
> > > They are raised inside a transaction, when a conflict is detected with
> > > another concurrent transaction. The transaction infrastructure will catch
> > > and retry these several times, and only raise it in the external code if it
> > > was unable to execute the transaction after several retries.
>
> > Yes, but when are conflicts checked?  Specifically, is the error
> > always raised by the statement in the user function that runs into the
> > conflict or can it be raised later, say during transaction commit.
>
> Any datastore operation inside a transaction could raise this exception. It
> would be a bad idea to rely on _where_ this exception will be raised.
>
> > I've looked at the SDK's implementation of
> > RunInTransactionCustomRetries (in google/appengine/api/datastore.py).
> > The except clause that catches the CONCURRENT_TRANSACTION exception protects
> > the commit and not the execution of the user function.  That suggests
> > that the user function is run to completion regardless of conflicts
> > and that the conflict isn't acted upon until a commit is tried.
>
> > However, your description and the documentation suggest the real
> > implementation detects and acts on conflicts while running the user
> > function.
>
> > Here's a user function which demonstrates the difference.  (Yes, I
> > picked an example that I care about.  I'm trying to ensure that
> > memcache data is "not too stale".)
>
> > def txn():
> >    ...
> >    a.put()
> >    memcache.set('a', a.field)
> >    return a
>
> > If the CONCURRENT_TRANSACTION exception is raised while txn is being
> > run, specifically during a.put(), the memcache.set won't happen when
> > db.run_in_transaction(txn) fails.  If that exception is raised after
> > txn has exited and during commit (as the SDK code suggests), the
> > memcache.set will happen whether or not db.run_in_transaction(txn)
> > fails.
>
> > If my understanding of the SDK code is correct and the real
> > implementation works the same way, namely that conflicts are detected
> > after the user function completes, how can I ensure that memcache data
> > is not too stale?  (One way is to have that data expire reasonably
> > quickly, but that reduces the value of memcache.)
>
> Update memcache after the transaction completes. There's still the
> possibility that your script could fail between the two events, but there's
> no avoiding that without transactional semantics between the datastore and
> memcache.
>
> > Also, what's the definition of "conflict"?  Clearly there's a conflict
> > between a user function that reads a given data store entity and one
> > that writes the same entity.  However, what about the following?
>
> > def txn1(a, b):
> >    # notice - no read for a or b
> >    a.put()
> >    b.put()
> >    return True
>
> > Does the conflict detection system detect the conflict between
> > transactions with txn1 for the same datastore entities?
>
> Yes.
>
> > (The intent
> > of transactions with txn1 is to ensure that a and b are mutually
> > consistent in the datastore.)
>
> > Speaking of "definitions of conflict", suppose that conflicts actually
> > are detected/handled while the user function is being run, so that txn/
> > txnw can not leave the datastore and memcache inconsistent for very
> > long.  Are txnw and txnr (below) seen as conflicting given the same
> > key?  (They're not conflicting as far as the datastore is concerned,
> > but remember - I'm trying to keep memcache consistent as well.)
>
> > def txnw(key, new_value):
> >    v = db.get(key)
> >    v.field = new_value
> >    db.put(v)
> >    memcache.set(str(key), v.field)
> >    return True
>
> > def txnr(key):
> >    v = db.get(key)
> >    memcache.set(str(key), v.field)
> >    return True
>
> > Thanks,
> > -andy
>
> > On Oct 9, 4:45 am, "Nick Johnson (Google)" <[email protected]>
> > wrote:
> > > Hi Andy,
>
> > > > On Tue, Oct 6, 2009 at 8:45 PM, Andy Freeman <[email protected]> wrote:
>
> > > > Short version.
>
> > > > When, exactly, are apiproxy_errors.ApplicationErrors
> > > > with .application_error ==  datastore_pb.Error.CONCURRENT_TRANSACTION
> > > > raised?
>
> > > They are raised inside a transaction, when a conflict is detected with
> > > another concurrent transaction. The transaction infrastructure will catch
> > > and retry these several times, and only raise it in the external code if it
> > > was unable to execute the transaction after several retries.
>
> > > -Nick Johnson
>
> > > --
> > > Nick Johnson, Developer Programs Engineer, App Engine
> > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> > > 368047
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> 368047