Hey Martin, thanks for sharing your code! I will have a read through
it today. I initially wanted to try a very naive and simple approach.
Something like this (I am using Java, btw):
import com.google.appengine.api.datastore.Key;
import javax.jdo.annotations.*;

@PersistenceCapable
class User {
    @PrimaryKey
    private String mUsername;
}

@PersistenceCapable
class Follow {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key mKey;

    @Persistent
    private String mUsernameFollowed;

    @Persistent
    private String mUsernameFollower;
}
So every time a user follows another user, a Follow instance is
created in the datastore. This should be pretty quick.
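As a rough illustration of that write path, here is a minimal in-memory sketch in Python (matching Martin's quoted code rather than the Java above). The dict `follows`, the counter `_next_id`, and the function `follow` are hypothetical stand-ins for the datastore and its generated keys, not App Engine APIs.

```python
import itertools

# Hypothetical stand-ins: a counter in place of the datastore's generated
# keys, and a dict in place of the Follow kind.
_next_id = itertools.count(1)
follows = {}  # generated key -> Follow record


def follow(follower, followed):
    """Store one Follow record (a single write) and return its key."""
    key = next(_next_id)
    follows[key] = {"follower": follower, "followed": followed}
    return key


first = follow("martin", "mark")
second = follow("leo", "mark")
```

Each follow costs a single write. Because the key is generated rather than derived from the two usernames, nothing in this sketch stops the same pair from being stored twice.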
When I query a user, it will involve a few trips to the datastore,
which is probably an awful idea, since reads are going to be frequent
(i.e. viewing a user's profile page and seeing their first 20
followers, just like Twitter does). This would involve:
1) Ask the datastore for the User object given the unique username.
2) Ask the datastore for the first 20 Follow records where
Follow.mUsernameFollowed = username.
3) Do a batch-get of those 20 User objects.
So, three reads here. This is probably going to be sub-optimal: as
users view one another's profiles, my datastore usage will climb
quickly, and the multiple reads will be slow.
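The three steps can be sketched roughly as follows, again in Python with plain dicts and lists standing in for the datastore. Everything here (`users`, `follows`, `load_profile`, the `reads` counter) is a hypothetical illustration under those assumptions, not real App Engine code.

```python
# Hypothetical in-memory stand-ins for the User and Follow kinds.
users = {name: {"username": name} for name in ("mark", "martin", "leo", "ana")}
follows = [  # one record per (follower, followed) pair
    {"follower": "martin", "followed": "mark"},
    {"follower": "leo", "followed": "mark"},
    {"follower": "mark", "followed": "leo"},
]


def load_profile(username, limit=20):
    """Return (user, follower_users, number_of_reads) for a profile page."""
    reads = 0

    # 1) Get the User by its unique username.
    user = users.get(username)
    reads += 1
    if user is None:
        return None, [], reads

    # 2) Query the first `limit` Follow records where followed == username.
    matching = [f for f in follows if f["followed"] == username][:limit]
    reads += 1

    # 3) Batch-get the follower User objects in one call.
    follower_users = [
        users[f["follower"]] for f in matching if f["follower"] in users
    ]
    reads += 1

    return user, follower_users, reads


user, followers, reads = load_profile("mark")
```

The `reads` counter only makes the cost visible: every profile view pays for all three round trips, however few followers come back.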
So, I'll read through yours. This was just the naive first thing that came
to mind, though!
Thanks
On Apr 19, 8:13 pm, Martin Webb <[email protected]> wrote:
> Mark - I'm working on a similar system myself. I agree that using a string in
> the user class is potentially going to cause issues in a large-scale app, as
> it would need sharding? Also, running into 10,000s of followers could get
> heavy. I have made an implementation using a simple relationship model, and
> it can be used for anything: friendships, followers, related images, anything
> you like. I am not 100% sure it is the best model or even built correctly, but
> I was considering posting it for comments. It's still in progress, so the code
> is not 100% tested etc., but it may give you some ideas.
> Any comments on this are much appreciated. As I have said, I have looked at
> list properties, but I can only imagine they would need sharding etc.
> I have not added any memcache support, but this will be added to the finished
> class. I will post my code if anyone is interested.
>
> #!/usr/bin/env python
> #
> # Copyright 2007 Google Inc.
> #
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> # You may obtain a copy of the License at
> #
> # http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> from google.appengine.ext import db
> from google.appengine.api import memcache
> import shards
>
> class collection(db.Model):
>     """
> Stores relationships - This could be achieved using a
> ListProperty(db.ReferenceProperty), but that may
> run into issues, as it is possible that many "sessions" could be appending
> keys at the same time. Also, a list
> property may have a performance issue if there are millions of siblings
> following one entity.
>
> Furthermore, we can memcache our keys for even faster performance when we
> check if keys are siblings of parent urls.
>
> I feel that this approach, simple as
> it may be, may be a better approach for long-term performance.
>
> The collection object stores a domain, url and a key
>
> domain, url, key
>
> Where the domain and url describe the object and the key is the related
> sibling.
>
> domain='friend'
> url='martin'
> key=some instance key
>
> long_url = 'friend/martin'
> stored_key ='friend/martin/[some key]'
>
> We can store friendships by storing one side of the friendship as the url
> and the siblings (friends) as keys.
>
> The query to test if a sibling is a friend of a parent is VERY simple:
> stored_key=make_key(domain, url, key)
> exists = collection.get_by_key_name(stored_key)
>
> This works because when we store a key, we store the key name as
> domain+url+key; this is done using the make_key function.
>
> We can memcache this, again using the stored_key as the cache key, for
> super quick requests. When we are using the collection for, say, friendships,
> user pages will need to know if the current user is a friend of the page
> owner; memcache will respond super fast in this scenario.
>
> Counters.
>
> We use our shard class for keeping count of how many instances are
> related to the url (ie how many friends to an entity)
>
> counter_key=long_url+'_cnt'
> shards.increment(counter_key)
>
> (note the format of our key )
>
> Getting a list of all the keys that are a sibling (so all the friends of
> an entity):
>
> long_url=make_url('friends','martin') # 'friends/martin'
>
> all_siblings = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
> long_url).fetch(limit) #paging removed
>
> We use a simple query that filters on the url and returns the keys. As the
> keys are strings, we don't need to worry about
> the datastore loading 'related instances'. Once we get our keys, we can
> build a short list of keys [] and then make one
> call to the datastore to load the instances by key.
> This in theory should be super quick, as only a few instances (say 10 at a
> time) are loaded, even for queries where potentially thousands of
> siblings may be present.
>
> #Make a list of keys so that we can load our instances in a flash - in the
> #real world this might be 10 at a time (see limit in above query)
> li=[]
> for instance in all_siblings:
>     li.append(instance.sibling)
>
> #get a list of instances for the keys
> return db.get(li)
>
> The next question we could ask the datastore is:
>
> who am I a sibling of (who am I a friend of)?
>
> This can be answered using another simple query:
>
> all_instances = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 '
> 'AND domain = :2', sibling_key, 'friends').fetch(limit)
>
> Note we use 'friends' as our domain, as we want to only return keys where the
> relationship is a friendship.
>
> Another example of use
>
> Let's say I want to create a group of people. This can be modelled by
> creating a model for groups:
>
> class Group(db.Model):
>     name = db.StringProperty()
>
> Now we can do:
>
> martin=user("martin")
> leo=user("leo")
>
> group=Group(key_name="Club1", name="Club1")
>
> links.collection.add_sibling('groups',group.key(),martin.key())
> links.collection.add_sibling('groups',group.key(),leo.key())
>
> Note in this example we identify that the relationship is a group by
> using 'groups' as our domain.
>
> Now we can use the queries above to see "who" is in the group, whether a
> person is a member of a group, and what groups, say, martin is a member of:
>
> all_groups = db.GqlQuery('SELECT * FROM collection WHERE sibling = :1 '
> 'AND domain = :2', martin.key(), 'groups').fetch(limit)
>
> The above lists all the 'groups' martin is a member of.
>
> Friendships:
>
> martin=user("martin") #is the initiator
> leo=user("leo") #is the acceptor
>
> #adds the relationship
> links.collection.add_sibling('friends',leo.key(),martin.key())
> #if the relationship is two-sided we simply reverse it
> links.collection.add_sibling('friends',martin.key(),leo.key()) #this may
> not be the best solution - we could simply create a query to find reversed
> relationships?
>
> the above queries can be used to get the relationships
>
> This class is useful in social apps where many relationships can be made.
>
> TO DO:
>
> 1. test
> 2. add memcache
> 3. add other built-in queries as detailed in header outline (list
> siblings, who am I a sibling of, am I a sibling)
> 4. add remove relationship
>
> """
>
>     """
>
>     THE MODEL OBJECT
>     domain, url, key
>
>     the domain of the instance, for example 'friends'
>     """
>     domain=db.StringProperty()
>     """
>     a url to describe our parent like friends/martin
>     we make the url by using:
>
>     url=make_url(domain,url)
>
>     """
>     url = db.StringProperty()
>     """the key of our sibling instance"""
>     sibling = db.StringProperty()
>
>     @staticmethod
>     def add_sibling(domain='',url=None, key=None):
>         """
>         Add a sibling key to a url's collection.
>         the key should be like numberfile.key()
>         the url should be a long url path like martin/friends but not
>         include the key
>         """
>
>         # make the unique stored key which is the complete path long_url/key
>         stored_key=make_key(domain,url,key)
>         exists = collection.get_by_key_name(stored_key)
>         if exists is not None:
>             # KEY is already added as a LINK
>             return
>
>         #add our new stored key
>         long_url=make_url(domain,url)
>
>         link = collection(key_name=stored_key)
>         link.domain=domain
>         link.url=long_url
>         link.sibling=key
>         link.put()
>         #update our counter for the long_url ie friends/martin
>         shards.increment(long_url +'_cnt')
>
>     @staticmethod
>     def get_siblings(domain,url, limit = 10, offset = None):
>         """
>         get all the sibling keys for a given domain/url identifier
>         """
>
>         li=[]
>         # make the long url, for example friends/martin
>         long_url=make_url(domain,url)
>
>         if offset is None:
>             q = db.GqlQuery('SELECT * FROM collection WHERE url = :1',
>                             long_url)
>         else:
>             # accept a string-encoded key as the paging offset (Python 2)
>             if isinstance(offset, basestring):
>                 offset = db.Key(offset)
>             q = collection.gql('WHERE url = :1 AND __key__ > :2', long_url,
>                                offset)
>
>         #build our list (at most `limit` links per call)
>         for link in q.fetch(limit):
>             li.append(link.sibling)
>
>         # return all the instances for the keys in our list
>         return db.get(li)
>
> def make_url(domain='',url=None):
>     """
>     makes a long url path using the passed url and domain used for our
>     key_names
>     """
>
>     if url is None:
>         return None
>     return domain+'/'+url
>
> def make_key(domain='',url=None,sibling=None):
>     """
>     makes a url using the passed url and key used for our key_names
>
>     keys are made like this
>
>     domain+url+key
>
>     which for an example could be
>
>     friends/martin/[key]
>
>     which defines the key as a sibling of our long url - domain+url
>
>     in our real world app using the key of the instance would be safer as our
>     human readable word may change
>     i.e. a person's name
>     """
>
>     if sibling is None:
>         return None
>
>     #build our long url path
>     long_url=make_url(domain,url)
> #should we exit if long url is None? This
> ...
>