[google-appengine] Re: Model datastore application

Anastasios Hatzis Wed, 13 Apr 2016 01:37:00 -0700

Susan,

As far as I understand your model and your procedures, Google Cloud
Datastore should be a good fit. My suggestions are based on my
experience with Python and the NDB API, but I assume there is no
significant difference in Node.js or Java.

Since your app enforces hard limits on the number of friendships and
friend requests, I suggest tweaking your model a little so you can use
strongly consistent queries (which require an ancestor, i.e. a parent).
This puts the Friend objects into so-called entity groups, where each
parent forms its own group.

FRIEND

-parent (Key of the owning user)
-id (auto-generated)
-friend1
-friend2
-status

This way, the app can run a strongly consistent, keys-only query to count
both the friendships and the pending friend requests of a user. I expect
both queries to run rather often (each time a friend request is created or
a friendship is accepted).

You already suggested storing two objects, one for each side of the
friendship, which is good thinking. Since the key of friend1 is now the
parent, we can also remove friend1 from the properties.

In addition, we could use the ID of the friend as the ID of this
relationship, so duplicate relationships cannot cause inconsistencies. (I
assume all User keys are parent-less, so their IDs already guarantee unique
keys for all users in the datastore.) In general, looking up a key is
cheaper and faster than running a query. The existence of a key tells us
whether a Friend object already exists, and the app can compute the Friend
key and get the object directly instead of always querying first. And since
the ID already gives us the friend's key, we can remove the property
friend2 as well.

FRIEND

-parent (Key of the friend1)
-id (ID of friend2's key)
-status

The status must still be indexed, though, to count requests and existing
friendships. Every creation, update, and deletion of Friend objects would
happen in a cross-group transaction spanning both groups (friend1's and
friend2's), so both legs of the relationship stay consistent.
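To illustrate the "both legs or nothing" rule, here is an in-memory sketch of accepting a request. It is an assumption-laden stand-in, not NDB code. A `threading.Lock` plays the role of the cross-group transaction; in NDB that would be a function decorated with `@ndb.transactional(xg=True)`. The `(owner_id, friend_id)` dict layout and the status strings are hypothetical. The limit of 200 confirmed friends comes from the requirements.

```python
import threading

FRIEND_LIMIT = 200  # confirmed friends per user, from the requirements

# Stand-in for the datastore: (owner_id, friend_id) -> status of that leg.
friends = {}
# Stand-in for a cross-group transaction over both users' entity groups.
_txn = threading.Lock()


def accepted_count(owner_id):
    """Count confirmed friendships in one user's group."""
    return sum(
        1 for (owner, _), status in friends.items()
        if owner == owner_id and status == "accepted"
    )


def accept_request(user_id, requester_id):
    """Set both legs to 'accepted', or change nothing at all."""
    with _txn:
        legs = [(user_id, requester_id), (requester_id, user_id)]
        if any(leg not in friends for leg in legs):
            return False  # there is no request to accept
        if any(friends[leg] == "accepted" for leg in legs):
            return False  # already friends
        if (accepted_count(user_id) >= FRIEND_LIMIT
                or accepted_count(requester_id) >= FRIEND_LIMIT):
            return False  # either side would exceed the limit
        for leg in legs:
            friends[leg] = "accepted"
        return True
```

Suppose both legs of a pending request between users 1 and 2 are seeded. Then `accept_request(2, 1)` flips both legs to `accepted`, and a second call returns `False`. Because every check runs before the first write, a rejected call leaves both legs untouched.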

In my experience using webapp2 for user authentication, it pays to keep the
actual user account (the kind used for authentication) separate from
everything else in the app. So I would not use User keys as parents or in
the friend key property. Instead, I would use computed keys of a different
kind with the same ID as the user (for uniqueness and consistency). The
parent entities don't need to exist.

FRIEND

-parent (Key of kind FriendParent, ID the same as of corresponding user)
-id (ID of friend2)
-status

For example, suppose two User objects are stored, with IDs 1 and 2. As a
result of a friend request from 1 to 2, these two objects are created:

FRIEND

-parent: KEY(FriendParent, 1)
-id: 2
-status: not_accepted_yet

FRIEND

-parent: KEY(FriendParent, 2)
-id: 1
-status: new_request_to_accept

The parent keys don't exist as objects in the datastore. They only exist as
parent keys in the Friend objects to put them into entity groups, so the
app can run strongly consistent ancestor queries for each user.
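The example above can be sketched in plain Python, with key paths modeled as tuples. This is only a hypothetical stand-in for real Datastore keys. The `FriendParent` keys are computed and never stored, and the existence check is a key lookup rather than a query. The cross-group transaction and the pending-limit check are left out to keep the sketch short.

```python
# Key paths are modeled as tuples, e.g. ("FriendParent", 1, "Friend", 2).
# FriendParent keys are only computed, never stored as entities.
def parent_key(user_id):
    return ("FriendParent", user_id)


def friend_key(owner_id, other_id):
    return parent_key(owner_id) + ("Friend", other_id)


store = {}  # stand-in for the datastore: key path -> properties


def send_friend_request(sender_id, recipient_id):
    """Create both legs of a new request, unless a relationship exists."""
    sender_leg = friend_key(sender_id, recipient_id)
    recipient_leg = friend_key(recipient_id, sender_id)
    # Key lookups instead of queries: the key itself says whether it exists.
    if sender_leg in store or recipient_leg in store:
        return False
    store[sender_leg] = {"status": "not_accepted_yet"}
    store[recipient_leg] = {"status": "new_request_to_accept"}
    return True


send_friend_request(1, 2)
for key, props in sorted(store.items()):
    print(key, props["status"])
# ('FriendParent', 1, 'Friend', 2) not_accepted_yet
# ('FriendParent', 2, 'Friend', 1) new_request_to_accept
```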

The downside of the parent variant: there is a technical limit on how
often the app (or any other client) can write to the same entity group
(about once per second). In this model, that caps how often each user can:

   - send/revoke friend request (or by the friend)
   - accept/deny request (or by the friend)
   - remove friend (or by the friend)

In other words, with parents in Friend the datastore cannot handle huge
numbers of friendships per user, because those imply a high frequency of
such writes (think followers on Twitter or channel subscriptions on
YouTube). However, considering your rather low hard limits (200/100), I
don't think this constraint matters.

If it does matter, we cannot put Friend into bigger entity groups, and then
we also lose the ability to run strongly consistent queries (which require
an ancestor/parent). An eventually consistent ancestor-less query (for
example, when the app counts the number of requests for a user) may miss an
entity that was written just milliseconds or seconds before, so some users
might slightly exceed the limit. In that case, I would suggest keeping each
Friend in its own entity group:

FRIEND

-id (similar to mutual_id: friend1_ID:friend2_ID)
-friend1
-friend2
-status

The app can still write both legs in one transaction. With this small
tweak, the app can at least get a Friend by key in more use cases than with
an auto-generated ID.
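A small sketch of the two ID schemes, with hypothetical function names. The per-leg ID puts the owner first, so each side of a friendship gets its own key and either leg can be fetched from two user IDs. The order-independent variant follows the existing mutual_id convention for emails ([lower_lexicographic]:[higher_lexicographic]).

```python
def friend_entity_id(owner_id, other_id):
    """ID for one leg of a friendship without a parent, e.g. "1:2".

    Owner first, so the two legs of one friendship get distinct IDs
    ("1:2" and "2:1"), and either can be fetched by key from two user IDs.
    """
    return f"{owner_id}:{other_id}"


def mutual_id(a, b):
    """Order-independent ID: [lower_lexicographic]:[higher_lexicographic].

    Values are compared as strings, so "10" sorts before "9"; what matters
    is only that both sides compute the same ID.
    """
    low, high = sorted((str(a), str(b)))
    return f"{low}:{high}"


print(friend_entity_id(1, 2), friend_entity_id(2, 1))  # 1:2 2:1
print(mutual_id("bob", "alice"))                       # alice:bob
```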

For the rest of this post, I will continue with the parent version of Friend.

You mentioned displaying names. As a general rule of thumb (and I think
many Datastore users follow it), you normalize data less than in SQL
databases, since there are no join queries and the like.

As long as it is only names, I would expect user names not to change
frequently, so I would add the friend's name to the Friend model. Then we
don't need to fetch the current names of up to 300 users every time we show
the list of friends and friend requests (that would double the reads). If a
user changes their name, we need to update every Friend object that refers
to this user. With the parent variant of Friend, we can do this with strong
consistency. First, run an ancestor query for all Friend objects owned by
this user (i.e. under the user's FriendParent key). Then compute the keys of
the mirrored Friend objects and batch-update them with the new name in a
few separate transactions.
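A rough sketch of that update path, again with tuples standing in for keys and a dict standing in for the datastore. App Engine Datastore limits how many entity groups one cross-group transaction may touch (25 at the time), so the mirrored keys are split into batches. `BATCH_SIZE = 5` is an arbitrary illustrative value, and all names here are hypothetical.

```python
# Hypothetical batch size: Datastore caps how many entity groups a single
# cross-group transaction may touch, so the updates are split into batches.
BATCH_SIZE = 5


def friend_key(owner_id, other_id):
    """Key path of the Friend leg owned by owner_id and pointing at other_id."""
    return ("FriendParent", owner_id, "Friend", other_id)


def legs_owned_by(store, user_id):
    """Stand-in for an ancestor query on KEY(FriendParent, user_id)."""
    return [key for key in store if key[:2] == ("FriendParent", user_id)]


def propagate_name(store, user_id, new_name):
    """Copy a user's new name into every mirrored Friend leg.

    Returns the number of batches (i.e. transactions in a real datastore).
    """
    # Each owned leg (user -> friend) has a mirror (friend -> user) that
    # carries this user's name; compute those keys instead of querying.
    mirrored = [friend_key(key[3], user_id) for key in legs_owned_by(store, user_id)]
    batches = [mirrored[i:i + BATCH_SIZE] for i in range(0, len(mirrored), BATCH_SIZE)]
    for batch in batches:
        # In Datastore, each batch would be one cross-group transaction.
        for key in batch:
            if key in store:  # the mirror may have been deleted meanwhile
                store[key]["name"] = new_name
    return len(batches)
```

With 12 friends, the 12 mirrored legs are written in 3 batches. The user's own legs keep the friends' names and are not touched.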

Since you haven't mentioned any limit on the number of emails, doing the
same for EMail could be pretty heavy on writes. There might be a few
thousand emails to touch for each name change, and this is needed on both
the sender and the recipient side.

Furthermore, what about avatar images of users or other profile
information? They may change more frequently. It's difficult to forecast
all the scenarios, so you will have to decide which approach is cheaper for
you. It is probably safe to assume that the name and the avatar won't
change often, so it makes sense to write them directly into Friend and
maybe also into EMail.

FRIEND

-parent (Key of kind FriendParent, ID the same as of corresponding user)
-id (ID of friend2)
-name (of friend2)
-imgUrl (of friend2)
-status

All other profile-related data should be stored in a separate kind,
especially if it can change frequently (last seen, online status, etc.). In
the HTML templates or with a bit of JavaScript, the link to each user's
profile can easily be computed even in a friend/request list, without
actually reading a User object. Basically, a Friend object would contain
everything the app needs for the most frequent requests.
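A sketch of that idea in Python, using dataclasses rather than NDB models. The list is rendered only from what is stored on each Friend, and the profile link is derived from the friend's ID. The `/profile/<id>` URL scheme and the `accepted` status value are assumptions.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Friend:
    """One leg of a friendship, as seen by its owner (parent omitted)."""
    friend_id: int  # ID of friend2; also this entity's key ID
    name: str       # denormalized copy of friend2's name
    img_url: str    # denormalized copy of friend2's avatar URL
    status: str


def profile_url(user_id):
    """Hypothetical URL scheme; computed from the ID, no User read needed."""
    return f"/profile/{user_id}"


def render_friend_list(friends):
    """Render confirmed friends using only data stored on Friend entities."""
    items = [
        f'<li><img src="{escape(f.img_url)}" alt=""> '
        f'<a href="{profile_url(f.friend_id)}">{escape(f.name)}</a></li>'
        for f in friends
        if f.status == "accepted"
    ]
    return "<ul>" + "".join(items) + "</ul>"
```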

As mentioned earlier, I would separate the authentication-related data of
a user from the profile-related data. So instead of putting name, avatar,
etc. into the User kind, I would put them into a UserProfile kind whose ID
is always the same as that of the corresponding user account.

USERPROFILE

-id (ID of USER)
-name
-imgUrl
-about me (etc.)
-status

One last note regarding the ID of USER: I suggest not using the email as
the ID. An email address can change, but datastore keys (the ID is part of
the key) are immutable. An auto-generated ID would be fine.
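A minimal sketch of the split between authentication data and profile data, using dataclasses and a counter as a stand-in for Datastore's auto-generated IDs. The `register` and `change_email` helpers are hypothetical. The point is that both entities share one immutable ID, while the email is an ordinary property that can change freely.

```python
import itertools
from dataclasses import dataclass

_next_id = itertools.count(1)  # stand-in for Datastore's auto-generated IDs


@dataclass
class User:
    """Authentication data only."""
    id: int
    email: str
    hash_password: str


@dataclass
class UserProfile:
    """Display data; shares its ID with the corresponding User."""
    id: int
    name: str
    img_url: str = ""
    about_me: str = ""


def register(users, profiles, email, hash_password, name):
    """Create both entities under one auto-generated ID."""
    uid = next(_next_id)
    users[uid] = User(uid, email, hash_password)
    profiles[uid] = UserProfile(uid, name)
    return uid


def change_email(users, uid, new_email):
    """The email is an ordinary property, so changing it never changes the key."""
    users[uid].email = new_email
```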

In the standard GAE environment, the Users API is available if you want to
rely on Google accounts or OpenID. I use custom user management and
authentication based on webapp2, and other frameworks certainly provide
similar features. I found this much safer and easier than implementing my
own authentication; there is so much that could go wrong.

Well, I hope this helped you a little.

Ani

On Wednesday, April 13, 2016 at 1:22:52 AM UTC+2, Susan Lin wrote:
>
> I am looking how to create an efficient model which will satisfy the 
> requirements I put below. I have tried using gcloud-node but have noticed 
> it has limitations with read consistencies, references, etc. I would prefer 
> to write this in nodejs, but would be open to writing in java or python as
> long as it would improve my model. I am building around the new pricing 
> model which will come July 1st.
>
> My application consists of a closed email system. In essence what happens 
> is users register to the site. These users can make friends. Then they can
> send emails to each other.
>
> *Components of the app:*
>
> Users - Unlimited amount of users can join.
>
> Friends - A User can have 200 confirmed friends and 100 pending friend 
> requests. When a friendlist is retrieved it should show the name of the 
> friend. (I will also need to receive the id of the friends so I can use it 
> on my client side to create emails).
>
> Emails - Users can send emails to their friends and they can receive
> emails from their friends. The user can then view all their sent emails
> independently (sentbox) and all their received emails independently (inbox).
> They can also view the emails sent between themselves and a friend,
> ordered by newest. The emails should show the sender's and receiver's names.
> Once an email is read, it needs to be marked as read.
>
> My model looks something like this, but as you can see there are
> inefficiencies.
>
> *Datastore Kinds:*
>
> USER
> -email (id) //The email doesn't need to be the id, but I need to be able to retrieve users by their email
> -hash_password
> -name
> -account_status
> -created_date
>
> FRIEND
> -id (auto-generated)
> -friend1
> -friend2
> -status
>
> EMAIL
> -id (auto-generated)
> -from
> -to
> -mutual_id
> -message
> -created_date
> -has_seen
>
> *Procedures of the application:*
>
> *Register* - Get operation to see if a user with this email exists. If it
> does not, insert the key.
>
> *Login* - Get operation to get the user based on email. If it exists,
> retrieve the hash_password from the entity and compare it to the user's input.
>
> *Send friend request* - Friend data will be written twice for every 
> relationship. Then using the index on friend1 and index on status I will 
> query all the friends for a user and filter only those which are 'pending'. 
> I will then count these friends and see if they are over X. Again I will do 
> this for the other user. If they are both not over the pending limit, I 
> will insert the friend request. This needs to run in a transaction.
>
> *Accept a friend request* - Friend data will be written twice for every 
> relationship. Then using the index on friend1 and index on status I will 
> query all the friends for a user and filter only those which are pending. I 
> will then count these friends and see if they are over X. Again I will do 
> this for the other user. If they are both not over the pending limit, I 
> will change both entities' status to accepted in a transaction.
>
> *Show confirmed friends* - Friend data will be written twice for every 
> relationship. Then using the index on friend1 and index on status I will 
> query all the friends for a user and filter only those which are accepted. 
> Not sure how I will show the friend's names (e.g what happens if a user 
> changed their name this needs to be reflected in all friend relationships 
> and emails!).
>
> *Show pending friends* - Friend data will be written twice for every 
> relationship. Then using the index on friend1 and index on status I will 
> query all the friends for a user and filter only those which are pending. 
> Not sure how I will show the friend's names (e.g what happens if a user 
> changed their name this needs to be reflected in all friend relationships 
> and emails!).
>
> *View sent emails* - Using the index on the from property I would query 
> to get all the sent emails from a user 5 at a time ordered by created_date 
> (newest first). (e.g what happens if a user changed their name this needs 
> to be reflected in all friend relationships and emails!).
>
> *View received emails* - Using the index on the to property I would query 
> to get all the received emails to a user 5 at a time ordered by 
> created_date (newest first). When an email is seen, it will update that
> entity's has_seen property to true. (e.g what happens if a user changed
> their name this needs to be reflected in all friend relationships and 
> emails!).
>
> *View emails between 2 users* - Using the index on mutual_id which is 
> based on [lower_lexicographic_email]:[higher_lexicographic_email] to query 
> the mutual emails. Ordered by newest, 5 at a time. (e.g what happens if a 
> user changed their name this needs to be reflected in all friend 
> relationships and emails!).
>
> *Create email* - Using the friend1 and status index I will confirm the
> users are friends. If they are friends, I will insert an email.
>

-- 
HATZIS Edelstahlbearbeitung GmbH
Hojen 2
87490 Haldenwang (Allgäu)
Germany

Handelsregister Kempten (Allgäu): HRB 4204
Geschäftsführer: Paulos Hatzis, Charalampos Hatzis
Umsatzsteuer-Identifikationsnummer: DE 128791802
GLN: 42 504331 0000 6

http://www.hatzis.de/

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/1de70db8-b25f-41b6-a007-c97d79ff0ac8%40googlegroups.com.
