Hi, please let me know if this question is better suited for the dedicated 
Google Cloud Datastore group or some other online resource. I'm posting it 
here because I use Datastore in combination with GAE Python apps.

This weekend I migrated one of my production apps, and since then I have 
received user reports about stale data in the datastore, while the 
corresponding documents in the Search API are up to date. I'm still looking 
into the issue, but it seems that the datastore somehow jumped back in time 
for a few entities of a specific kind, maybe 0.5%. Most of them were 
originally created within a few weeks in late November/early December 2015, 
though not all entities from that time span show this issue.

Migration steps:

   1. Created a new project P2 (in EU)
   2. Deployed the Python code (version V1) to P2 with appcfg.py and waited 
   until all Datastore indexes were shown as "serving"
   3. Datastore backup in project P1 (in US), as of April 9th
      1. disabled writes for the datastore
      2. created a backup (all namespaces, all kinds, including a 
      "_DeferredTaskEntity"), using Cloud Console's "Datastore Admin" page, as 
      usual stored in my GCS bucket for backups
   4. In project P2, again using the "Datastore Admin" page, right after the 
   backup in P1 completed:
      1. disabled writes to the almost empty datastore (none of its entities 
      were of the kind that later showed the issue)
      2. imported the same backup information from the backup bucket and 
      restored it into P2's datastore, again: all kinds, all namespaces
      3. when the restore tasks were completed, I enabled datastore writes
   5. Deployed the Python code (version V2) to P2 and ran a batch handler 
   that changed a property value of all entities; for each entity, the 
   version counter is increased by 1, the "updated" timestamp changes 
   automatically, and the corresponding search doc is updated, too.
   6. For the Search API of P2: wiped all documents from all indexes (just 
   in case); when the wipe tasks completed, I queried the datastore entities 
   and wrote excerpts of them as search documents
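To make steps 5 and 6 concrete, here is a simplified, in-memory sketch of 
what the batch handler and the re-indexing do. It is not the app's actual 
code: the `Entity` class, the `flag` property and the function names are 
hypothetical, and plain dicts stand in for NDB and the Search API.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical in-memory stand-in for one entity of the affected kind.
    key: str
    version: int
    updated: datetime.date
    status: str
    flag: str = "old"  # hypothetical property changed by the batch handler


def to_search_doc(entity):
    # The search doc is built only from the entity data that was just written.
    return {
        "doc_id": entity.key,  # the app uses the URL-safe NDB key here
        "version": entity.version,
        "updated": entity.updated,
        "status": entity.status,
    }


def run_batch_handler(datastore, today):
    # Step 5: change one property, bump the version counter, refresh the
    # timestamp and re-export the search doc for every entity.
    index = {}
    for key, entity in datastore.items():
        entity.flag = "new"
        entity.version += 1
        entity.updated = today
        index[key] = to_search_doc(entity)
    return index


def rebuild_index(datastore):
    # Step 6: start from an empty index and rewrite one doc per entity.
    return {key: to_search_doc(entity) for key, entity in datastore.items()}


store = {"k1": Entity("k1", 13, datetime.date(2016, 3, 15), "completed")}
index = run_batch_handler(store, datetime.date(2016, 4, 10))
print(index["k1"]["version"])  # 14
```

With an entity at version counter 13, this flow leaves both the entity and 
its search doc at 14, which is the state I expected in P2.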

Interestingly, for the affected entities of that kind, the corresponding 
search doc in the Search API has more recent data than the original entity 
in the datastore.

Datastore Entity in P1 and P2:

   - version counter: *8*
   - last update on: 2016-*02*-15
   - status: '*executing*'

Search document of this entity in P1 and P2:
(the search doc ID is always the URL-safe encoded NDB key, and I can tell 
from all other fields/properties that it is the correct search doc)

   - version counter: *13*
   - last update on: 2016-*03*-15
   - status: '*completed*'


In P1, I had expected the entity to have the same data as its search doc, 
but it was in fact stale.

In P2, I expected both the entity and its search doc to show:

   - version counter: *14*
   - last update on: 2016-*04-10*
   - status: '*completed*'

because the migration script updated one property of all entities of this 
kind, which should also have triggered an update of the search doc.
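For reference, a minimal sketch of how affected entities can be found by 
comparing each entity's version counter with that of its search doc. It 
assumes both sides have already been loaded into dicts keyed by the same 
key; the function name and the dict layout are hypothetical.

```python
def find_stale_entities(entities, docs):
    """Return (key, entity_version, doc_version) for every entity whose search
    doc reports a newer version counter than the entity itself.

    Both arguments map the same key (the URL-safe NDB key in the app) to a
    dict with a "version" field; the dict layout is hypothetical.
    """
    stale = []
    for key in sorted(entities):
        doc = docs.get(key)
        if doc is not None and doc["version"] > entities[key]["version"]:
            stale.append((key, entities[key]["version"], doc["version"]))
    return stale


# The observed values from above: entity at version 8, search doc at 13.
entities = {"k": {"version": 8, "status": "executing"}}
docs = {"k": {"version": 13, "status": "completed"}}
print(find_stale_entities(entities, docs))  # [('k', 8, 13)]
```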

There are two observations:

   1. *P1's entity already had stale data*, older than the search doc. This 
   could be explained, at least in theory, by an inconsistent or failed write 
   to the datastore. The app uses transactions for reading/writing entities 
   of this kind. In _post_put_hook(), *if future.check_success() is None*, 
   the search doc is written/updated. I can think of exotic situations where 
   the search doc could be older than its original entity in the datastore, 
   but since the datastore write happens in a transaction, and the search 
   export happens only after a successful write operation, I fail to explain 
   how the entity in the datastore could survive the change (or revert to an 
   older version). We are talking about 5 different types of changes during 
   one month that have all been lost. There are also no deferred tasks that 
   could write old entities back into the datastore.
   2. *P2's entity again shows stale data,* older than the search doc. This 
   is particularly confusing, because the search doc is only written with 
   data read from the datastore. And since the search docs were not copied 
   from P1, their only source was the data freshly restored from the P1 
   backup. However, as shown above, the data in the P1 datastore is already 
   stale. So where did P2's datastore get the newer data from? It seems that 
   while the batch handler was running, the datastore had the data of version 
   counter 13, but at some point after the search doc was written, the 
   datastore reverted the entity to version counter 8. Yet all the datastore 
   writes for this entity happened a long time ago in the original datastore 
   of P1. So it looks to me as if the datastore in P2 somehow got both 
   versions of this entity, version counter 8 *and* version counter 13. 
   Wouldn't this imply that the backup data can contain multiple versions of 
   the same entity, or could there be another leak that works across 
   projects? And for some reason, after the version counter 13 data was 
   written to the search docs, the entity got reverted to version counter 8.
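To illustrate the write path described in observation 1, here is a 
simplified simulation of the _post_put_hook() guard. `FakeFuture` only 
mimics the check_success() behaviour of an NDB future (returns None on 
success, raises on failure); the `Job` class, the module-level dicts and the 
`fail` flag are hypothetical, and the surrounding transaction is not 
modelled at all.

```python
DATASTORE = {}     # hypothetical stand-in for the datastore
SEARCH_INDEX = {}  # hypothetical stand-in for a Search API index


class FakeFuture:
    # Mimics the one method used here: check_success() returns None when the
    # RPC succeeded and raises the RPC's exception otherwise.
    def __init__(self, error=None):
        self._error = error

    def check_success(self):
        if self._error is not None:
            raise self._error
        return None


class Job:
    # Hypothetical model of the affected kind.
    def __init__(self, key, version, status):
        self.key = key
        self.version = version
        self.status = status

    def put(self, fail=False):
        # Simulated write; `fail` forces a failed RPC for illustration.
        error = RuntimeError("simulated datastore failure") if fail else None
        if error is None:
            DATASTORE[self.key] = (self.version, self.status)
        self._post_put_hook(FakeFuture(error))

    def _post_put_hook(self, future):
        # Export to search only if the put succeeded; on failure,
        # check_success() raises and the search doc is left untouched.
        if future.check_success() is None:
            SEARCH_INDEX[self.key] = (self.version, self.status)


Job("k", 13, "completed").put()
print(DATASTORE["k"] == SEARCH_INDEX["k"])  # True
```

In this simplified model a search doc can never get ahead of its entity, 
which is exactly why the observed data puzzles me.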

I'm running out of possible explanations for this, other than Datastore is 
able to have multiple versions of the same entity and those are even part 
of a backup.

Paint me confused :) Anyway, maybe you have an idea of what could cause this.

Ani

-- 
HATZIS Edelstahlbearbeitung GmbH
Hojen 2
87490 Haldenwang (Allgäu)
Germany

Handelsregister Kempten (Allgäu): HRB 4204
Geschäftsführer: Paulos Hatzis, Charalampos Hatzis
Umsatzsteuer-Identifikationsnummer: DE 128791802
GLN: 42 504331 0000 6

http://www.hatzis.de/

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/8824117d-a7f3-44a2-92c4-8fa0e6116c59%40googlegroups.com.