Yesterday, the App Engine team hosted another block of its bimonthly IRC office hours. A transcript of the session and a summary of the topics covered are provided below. The next session will take place on Wednesday, July 15th from 9:00-10:00 a.m. PDT in the #appengine channel on irc.freenode.net.
SUMMARY:
----------------------------------------------------------------------
- Q: What are the performance implications of including large, multi-valued properties for individual entities?
  A: Adding several values to a list property should be fairly efficient, but as the number of values grows larger (e.g. several hundred or more), you may be better off re-thinking your design, since queries on that property will take longer (see the note on IN queries in http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Queries) and exploding indexes may start to become an issue (see http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes). For the hypothetical scenario where there is a many-to-many relationship between users and groups, you may consider using a third model with key properties for users and groups. Each entity therein represents a particular group that a user has joined, and hence it is fairly efficient to query for all groups that a user has joined or all users in a particular group without maintaining large multi-valued properties. [7:04-7:05, 7:23, 7:28, 7:37-7:38, 7:42-7:45, 7:48, 7:50-7:51]
- If you see 'DownloadError: ApplicationError: 5', this generally means your URL Fetch operation did not complete within the expected 5-second window. You can raise the deadline to 10 seconds for slower endpoints. You can also catch this exception and retry, but increase the delay between retries if you receive the exception several times in a row. [7:10-7:16]
- For best practices on debugging a Wave robot, see the Wave developer discussion group, but in general, if your App Engine application doesn't behave as expected or throws a strange error message, check the App Engine logs for your app to see the full stack trace, which may help in debugging the core problem. [7:11-7:12]
- Q: Is there an option to pay for premium, around-the-clock support?
  A: No, not at this time. We're always looking at enhancing our support operations and improving our response time to issues filed in the discussion groups and issue tracker, but there is currently no premium support option. There are solutions providers that support App Engine that may be able to help: http://www.google.com/enterprise/marketplace/search?orderBy=rating&query=app+engine&offset=0 [7:13-7:21]
- Q: Is there a ghs.google.com server located in China? It is currently blocked by the Chinese government.
  A: Although we cannot say where our servers are located, we are aware of ongoing challenges in China, including the lack of a billing option. We are continually considering ways to improve the Chinese developer experience. [7:18, 7:23-7:25]
- Issues with "100 URLMap exception" when attempting to deploy an application with numerous static files and servlets: we have seen several reports of this exception and are looking at http://code.google.com/p/googleappengine/issues/detail?id=1444 in more detail. Expect an update on this soon. If you experience the issue yourself, please attach your web.xml file to the aforementioned issue. [7:20, 7:22-7:23, 7:29]
- Issues with asynchronous URL Fetch and callbacks -- see http://code.google.com/p/googleappengine/issues/detail?id=1766 and http://groups.google.com/group/google-appengine-python/browse_thread/thread/0230d030d8407de7. If you're currently experiencing this issue, please star it. We'll do some testing and update it soon. [7:28, 7:30-7:36]
- Optimizing datastore schema design for App Engine is a large topic. There are various tips scattered in many different places, including the Google I/O videos and the recently published scaling series, and we are working on more comprehensive articles on the subject. [7:47, 7:50-7:51, 7:59-8:00]
- Q: Since App Engine's datastore doesn't natively support aggregate functions like count and average or the traditional SQL GROUP BY clause, how can these be computed?
  A: The new Task Queue API can greatly help with data aggregation for report generation, etc., but the best practice is to perform aggregate calculations at write time rather than read time. For example, if you need a count of a particular resource, you can update a counter every time a new resource is added, so the count is ready on demand. This works for other operations such as averages. The task queue approach does not work well for on-demand results; it is more useful for daily or weekly statistics. [7:54, 7:57, 8:00-8:02]
- The next Chat Time is Wednesday, July 15th, 9:00 a.m.-10:00 a.m. PDT. See you there!

TRANSCRIPT
----------------------------------------------------------------------
[6:58pm] jason_google: Hi Everyone. Welcome to the latest Chat Time for App Engine. A few of us are here from Google, and we'll stick around for the next hour to answer any questions you have.
[7:00pm] objectuser: I've been reading the articles about scalable apps. Is there any information about what the Datanucleus information does "under the covers" given various models? I'm particularly interested in the implications of having a list of keys on an entity, the performance of adding an item to the list and running a query against items in the list. Not really sure if this is a Datanucleus issue or datastore in general though.
[7:02pm] jason_google: objectuser: This would be more of a broader datastore question. The DataNucleus component used by App Engine is specially tailored.
[7:03pm] maxoizo: hi jason. When we can expect new java sdk? I very want taskqueue and localtools...
[7:04pm] objectuser: cool. this was talked about a bit in the "modeling entity relationships" article, but i didn't know if there was an efficient way to handle list ... it seems like lists of keys are kind of "the way" to handle a lot of interrelated items.
[7:05pm] jason_google: In general, I think the performance characteristics of adding to the list and querying against the list are pretty efficient given that this is natively supported by the underlying BigTable. That said, if you have a large number of values for a given multi-value property (several hundred, typically), you will start to see performance degradation, but it's negligible for small numbers of values.
[7:05pm] jason_google: maxoizo: Soon, although the task queue will not be in the next release.
[7:06pm] maxoizo: bad news
[7:06pm] objectuser: is there a better way to handle relationships to thousands (or more) items? if a relationship entity is introduced ... that implies thousands of queries, right?
[7:06pm] jason_google: objectuser: It's certainly a reasonable way to store some interrelated items, but once you pass several hundred, some datastore operations will be slower.
[7:07pm] pablosaraiva: Hello
[7:07pm] objectuser: thanks, jason. so what's the best way to handle things like that?
[7:07pm] jason_google: pablosaraiva: Hi.
[7:07pm] jason_google: objectuser: What kind of use case do you have in mind?
[7:07pm] PabloSaraivaBR: Hi jason_google.
[7:08pm] WholeBean: ji. I was getting HTTP Error 500: Internal Server Error last week
[7:09pm] WholeBean: but it resolved itself though but i want to know how to avoid it in the future
[7:09pm] scudder_google: WholeBean do you see the 500 errors in your logs?
[7:09pm] maxoizo: jason_google, whether there are plans to include bdbdatastore in sdk?
[7:10pm] WholeBean: i didn't look. i got the error when deploying
[7:10pm] jason_google: maxoizo: No, it will be separate for now.
[7:10pm] lstoll: (python) what exactly is an 'DownloadError: ApplicationError: 5 ' ?
[7:10pm] WholeBean: i hope you guys would offer some premium support.
[7:10pm] jason_google: Istoll: Where/when did you see this error?
[7:10pm] kidcudi: lstoll i believe your urlfetch didn't respond within 5 seconds
[7:11pm] lstoll: kidcudi: great, thanks
[7:11pm] kidcudi: you can up the deadline to 10 seconds
[7:11pm] PabloSaraivaBR: Hello. I have a Google Wave Robot hosted at appengine. How can I debug it?
[7:11pm] kidcudi: http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html
[7:11pm] kidcudi: check out deadline
[7:11pm] scudder_google: PabloSaraivaBR: the easiest way might be to add information to the logs
[7:11pm] lstoll: kidcudi: great. I was expecting to see a DeadlineError for that
[7:12pm] lstoll: speaking of which, I see those on mail send's - is it find to just requeue and try again?
[7:12pm] scudder_google: then you can check the admin console to to view them, search through them
[7:12pm] jason_google: PabloSaraivaBR: Good question. That would be good to ask in the Wave group, since I don't have much experiencing developing for the platform. But like Jeff said, if you see the Wave not working correctly, the logs would be the best place to start looking.
[7:12pm] jason_google: er, Jeff = scudder_google
[7:12pm] PabloSaraivaBR: Nice! Thanks!
[7:13pm] maxoizo: jason, thanx. WholeBean: I agree with you, premium support 24/7 is great
[7:13pm] jason_google: Istoll: In general, yes, but I wouldn't keep trying if it fails several times in a row.
[7:13pm] WholeBean: i'm definitely willing to pay! it sucks when we get errors and don't know anybody to turn to immediately.
[7:14pm] WholeBean: i'm hosting my site at google and uptime is important.
[7:14pm] lstoll: fair enough. What if I stick it in a queue to run, say 10 minutes later? Is a DeadlineError purely the app calling into whatever backend sends mail timing out, or could it be indicative of something else wrong?
[7:15pm] jason_google: WholeBean: Well noted. We are looking at ways of making our support even better.
[7:16pm] lstoll: oh, and +1 on the support thing. I think GAE is a great platform, but it's hard to consider for serious applications when one can't get assistance on issues. I understand that costs money, and that's fine.
[7:16pm] jason_google: Istoll: Not 100% sure, but I'd guess that it could be either. But, regardless of the actual error, giving it some time between sending the message gives it more likelihood of it working. At least in my view.
[7:17pm] WholeBean: thanks jason. i was able to get help from Nick Johnson (thanks Nick) but i'm looking for more immediate support.
[7:18pm] juvenn: Hi Jason, is there ghs.google.com server located in China. You may know that ghs.google.com is blocked by GFW
[7:18pm] jason_google: juvenn: I actually didn't know that. But I'm not able to say where the actual servers are located, unfortunately.
[7:19pm] jcgregorio: WholeBean: Another option is to look to some of our solutions providers that support App Engine, http://www.google.com/enterprise/marketplace/search?orderBy=rating&query=app+engine&offset=0
[7:19pm] maxoizo: jason_google: about support 24/7: may be worth to make a ticket-system (with a guaranteed response, as example hour)? Each request will be cost, as example 1$. This system is widespread on hosting companies
[7:20pm] JasmNK: (Java Question, but could also apply to Python) Recently I have been reaching the Maximum of 100 URLMap entities Exception when deploying. It started happening when I defined my static files in appengine-web.xml. My app is not that close to the 1000 File limit max, but I do have quite a few static files and a few servlets defined. What is the reason behind this limitation?
[7:21pm] jason_google: maxoizo: The ticket system is certainly an interesting approach, and it's something that we may eventually consider, but it won't be right away. We are hopefully going to be rolling out other tools to improve our response time since I understand that it can be frustrating when something isn't working.
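The retry advice from 7:10-7:16 (retry after a timeout, wait longer each time, and give up after several failures in a row) can be sketched roughly as below. This is an illustration only, not App Engine code: `FetchTimeout`, `fetch_with_backoff`, and all of its parameters are hypothetical names. In a real Python app the wrapped call would presumably be something like `urlfetch.fetch(url, deadline=10)`, and the exception to catch would be the `DownloadError` quoted above.

```python
import time


class FetchTimeout(Exception):
    """Hypothetical stand-in for the timeout error raised by a slow fetch."""


def fetch_with_backoff(fetch, max_attempts=4, first_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each timeout.

    Gives up and re-raises once max_attempts calls have all timed out.
    """
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except FetchTimeout:
            if attempt == max_attempts:
                raise  # several failures in a row: stop retrying
            sleep(delay)
            delay *= 2  # wait longer before the next attempt
```

The `sleep` argument is injectable so the waiting can be replaced in tests, or swapped for re-queuing the work to run later, as lstoll suggests at 7:14.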
[7:22pm] scudder_google: JasmNK: is each static file listed individually? If so, you could use fewer mappings by using regexes in your app.yaml file
[7:22pm] juvenn: If the ghs.google.com is not guanranteed in Mainland China, it will be the toughest obstacles for Chinese developers to develop on AppEngine. So wouldn't you have any plans to get around this?
[7:23pm] scudder_google: also for python request handlers you can use one handler script with multiple handler classes and use the webapp framework to do further routing inside of the script
[7:23pm] objectuser: jason: So this is totally contrived. Sorry about that. Hope it makes some sense.
[7:23pm] objectuser: Let's say I'm organizing hundreds of thousands of people. I organize by group and each person, based on tons of pointless demographics, are in thousands of groups. And thousands of people can be in each group. I need to find what people are in a particular group (Fred and Martha are in group A) and what groups a person is in (Fred is in A and B). It seems like for this Fred would need a list of keys of groups he's in and then gro
[7:23pm] JasmNK: scudder_google: Each file is not listed individually (in fact all I use are patterns, this is by the way a Java app, so it's appengine-web.xml). In fact when I make the pattern more generic (i.e. **.html to **.htm*), I seem to run into the problem
[7:25pm] scudder_google: JasmNK: ah I see. The simplest workaround then might be to reduce the number of servlets and do some additional request routing in your Java code
[7:25pm] maxoizo: jason_google: I think that the lack of guaranteed support deprives Appengine many corporate clients. I hope that the google team have to fix it:)
[7:25pm] jason_google: juvenn: There are several obstacles in China, including the lack of a billing option. We are aware of some of these, and we have teams looking at various options.
[7:28pm] Codys: Hey guys, I know a couple of people (including myself) have been having some problems with the async urlfetch feature in the latest python sdk. It seems that using a callback to process/store results in the datastore will cause an error. Was async urlfetch meant to be used in this way?
[7:28pm] jason_google: objectuser: My first thought would be to say that you could use a separate table with the person key and group key, then query on an item in that table (where person == ... and group ==). If None is returned, then that person is [not] associated with a group. You could use something similar to find all persons in a particular group and to find all groups that a particular person belongs to. I haven't thought this all the way through, so feel free to point out any flaws in my off-the-cuff design.
[7:29pm] objectuser: understood. that would answer binary questions for sure.
[7:29pm] JasmNK: scudder_google: It also seems to be affected by the URL pattern that I use for servlets too. i.e. (/main/servlet vs. /main/servlet/*). The problem with a request servlet is that I don't think I would get the benefits of the google static file servers. Is there a long-term solution to this issue? Because this is very limiting for someone like me who has to configure it in this manner to stay within the file limits already, and it seem
[7:29pm] JasmNK: s to be an artificial limitation. Could there be a solution that would allow static files not to be under this limitation?
[7:30pm] objectuser: but if you also needed the entities at the same time (all the users or all the groups ...) are you hosed?
[7:30pm] jason_google: Codys: This is actually a feature I haven't experimented with much. Have you filed a bug with a reliable way to reproduce the error?
[7:31pm] jason_google: Codys: I'd like to try this out myself.
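The "separate table with the person key and group key" idea from 7:28 can be sketched in plain Python as follows. This is only an in-memory illustration of the shape of the model, not datastore code: `Membership` and `MembershipStore` are hypothetical names, and in a real app each method would presumably become an equality-filter query on the two key properties of the third model.

```python
from collections import namedtuple

# One record per (person, group) pair, instead of a long list on either entity.
Membership = namedtuple("Membership", ["person_key", "group_key"])


class MembershipStore:
    """In-memory stand-in for a model holding person/group membership records."""

    def __init__(self):
        self._rows = set()

    def join(self, person_key, group_key):
        # Adding a membership writes one small record; no large list is loaded.
        self._rows.add(Membership(person_key, group_key))

    def is_member(self, person_key, group_key):
        # The "binary question": no matching record means not a member.
        return Membership(person_key, group_key) in self._rows

    def groups_of(self, person_key):
        # All groups a particular person belongs to.
        return sorted(m.group_key for m in self._rows if m.person_key == person_key)

    def people_in(self, group_key):
        # All people in a particular group.
        return sorted(m.person_key for m in self._rows if m.group_key == group_key)
```

Using objectuser's example, after `join("Fred", "A")`, `join("Fred", "B")` and `join("Martha", "A")`, `groups_of("Fred")` gives A and B, and `people_in("A")` gives Fred and Martha.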
[7:31pm] Codys: jason_google: A similar issue has been logged here: http://code.google.com/p/googleappengine/issues/detail?id=1766
[7:32pm] Codys: jason_google: However, it does not deal with putting data into the datastore while in the callback.
[7:33pm] jason_google: Codys: Cool, I'll take a look and update the status soon.
[7:36pm] Codys: jason_google: This is specifically what I'm talking about: http://groups.google.com/group/google-appengine-python/browse_thread/thread/0230d030d8407de7 If you'd like, I'd be happy to file a bug for this particular issue.
[7:36pm] jason_google: JasmNK: I'm interested in seeing your web.xml file. Have you posted this in the group?
[7:37pm] objectuser: it just seems like, if you have some many to many relationships with more than a few hundred on each end, and queries need to return the items ... well, querying would be efficient but adding items to the list might be less so ... maybe there's a low-level api that makes adding items to a list efficient (ie, you don't have to load the whole list) or a different model ...
[7:38pm] jason_google: objectuser: If you want to display a grid-like view of all users with all groups, then yes, this might be difficult. But if you had a lot of groups, this wouldn't be a good idea anyway. You'd be better off just showing a list of users and then showing all groups that the user belongs to when the user is clicked, etc. Don't know if that fits in with how you're picturing your UI.
[7:39pm] JasmNK: jason_google: No I have not as of yet. I also have found this bug report related this issue http://code.google.com/p/googleappengine/issues/detail?id=1444
[7:40pm] objectuser: jason_google: i get what you're saying and I've rethought my model in those terms in a couple of places ... bringing back less information until a user drilled down on something.
[7:41pm] jason_google: JasmNK: Thanks for the link. I'll definitely have a look at this soon since it appears to be affecting a number of users. If you can attach your web.xml file to that bug, that would be helpful.
[7:42pm] objectuser: jason_google: just thinking of two separate use cases. maybe a user wants to see all his groups and an analyst wants to see all of a group's users. then you end up with thousands of items in two lists, one on the user entity and one on the group entity.
[7:43pm] objectuser: jason_google: i'm not looking for a cartesian product, to be clear ... each query would just be interested in a list ... but a different list ... in this many-to-many scenario
[7:44pm] jason_google: objectuser: Right, this doesn't seem like a good idea. But my earlier solution would seemingly work here, wouldn't it? It only fails when you want to see a view of all users and all groups together, but viewing info. for a particular user or group would be pretty straightforward if you had this new model.
[7:45pm] objectuser: jason_google: i think these work at query time ... they're efficient (even if there is a transactional issue of adding users to one list and not another in the case of a failure), but, in general, is adding thousands (or more) of items to a list something that can be done efficiently?
[7:45pm] jason_google: Codys: Yes, feel free to file a new bug. If it turns out to be the same issue, it will be marked as a duplicate, but if it could be something separate, adding a new bug is the safest way.
[7:45pm] JasmNK: jason_google: I have attached both my appengine-web.xml and web.xml to bug report
[7:46pm] Codys: jason_google: Alright, will do. Thanks for the help!
[7:47pm] dennis_tw: i was looking at brett slatkin's 2009 google i/o talk on twitter followers and avoiding retrieving the list of followers. got me thinking that i should avoid passing large properties in my queries by putting them in separate entities. is that a reasonable thing to do when doing schema design? same with separating out frequently changing data so the more static data is not passed all the time.
[7:48pm] dennis_tw: is this something google developers do?
[7:48pm] jason_google: objectuser: Adding wouldn't be a problem in and of itself but you need to be careful to avoid exploding indexes which can appear. This is where adding large numbers of values for a given property can begin to cause issues. Since you want to query this property and since you expect a user to have potentially hundreds of groups, you might want to consider an alternative, especially if a given entity has more than one multi-valued property.
[7:50pm] Codys: dennis_tw: Do you have a link for that talk (is it the Scalable, Complex Apps one)?
[7:50pm] jason_google: dennis_tw: This isn't an uncommon pattern. Another use case for this is when trying to store large files in the datastore. Since entities can be at most 1 MB in size, you can split the binary data across multiple entities, storing a reference to the next entity in each, and then piece the data file together in the request.
[7:50pm] jason_google: It's a different use case, but a similar technique.
[7:50pm] objectuser: jason_google: i've seen references to exploding indexes but i've not spent time trying to understand them, so i'll do that.
[7:50pm] dennis_tw: no link right now, but yes, scalable, complex apps. talks about twitter and social graph as egs.
[7:51pm] jason_google: objectuser: Here's a good description: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes
[7:51pm] dennis_tw: ah, good to know! this seems like it will also help with datastore timeouts, right?
[7:51pm] objectuser: jason_google: cool, thanks much!
[7:52pm] objectuser: jason_google: i'll do some more reading now. i want to say i really appreciate your thoughtful answers to my questions.
[7:52pm] jason_google: objectuser: You're very welcome.
[7:53pm] jason_google: dennis_tw: Depending on how you take advantage of it, it can help, yes.
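The large-file technique described at 7:50 (split the data across several entities, each holding a reference to the next, then piece it back together on read) might look roughly like this. It is a plain-Python sketch in which a dict stands in for the datastore; `split_into_chunks`, `reassemble`, and the `CHUNK_SIZE` value (chosen to leave headroom under the 1 MB entity limit) are assumptions made for illustration.

```python
CHUNK_SIZE = 1000 * 1000  # assumed payload size, kept below the 1 MB entity limit


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split bytes into linked chunks.

    Returns (store, first_key), where store maps key -> (payload, next_key)
    and next_key is None on the last chunk.
    """
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    store = {}
    for index, payload in enumerate(pieces):
        next_key = index + 1 if index + 1 < len(pieces) else None
        store[index] = (payload, next_key)
    return store, 0


def reassemble(store, first_key):
    """Follow the next-chunk references and join the payloads back together."""
    parts = []
    key = first_key
    while key is not None:
        payload, key = store[key]
        parts.append(payload)
    return b"".join(parts)
```

Each read of a chunk here corresponds to one entity fetch, so a file of N chunks costs N sequential lookups when following the links.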
[7:54pm] _mattd: im making an app that measures trending entities in some blogs i track. trends are measured by looking at the number of appearances an "entity" makes over a certain amount of time (60 minutes, 24 hours, 7 days, etc). coming to appengine from a normalized layout and rdbms that has SQL's "GROUP BY" as part of its feature set, im a little confused on how to dynamically group and display these listing. should i just be looking at a cron job that grabs all of the
[7:54pm] _mattd: " within a certain date range and stores them in a cache or temp table for display?
[7:56pm] dennis_tw: jason_google: any other sources of datastore design tips? i recall a message saying an article should be written but in the meantime do you know of any pointers. internally at google, are there some sources of tips (assuming google engrs use the same type of interface as gae)
[7:57pm] jason_google: _mattd: You could do this, especially with the Task Queue API available to help you process data in small increments. You could also try to do this aggregation at write time. So if you need to calculate an average, you could just calculate the current average every time a new number is added, and when the average is requested, it's already available. This works well for counts also.
[7:59pm] jason_google: dennis_tw: It's a very large topic and hard to summarize in one or two quick posts. But we are working on collecting these best practices in a series of articles, even though I realize this isn't as satisfactory.
[8:00pm] jason_google: dennis_tw: We do have a few of these tips in the recent scaling series that was published. And others, which we hope to have in written form before too long, are in the Google I/O videos which it sounds like you're already aware of.
[8:00pm] jason_google: dennis_tw: Here's the scaling series: http://code.google.com/appengine/articles/scaling/overview.html
[8:00pm] _mattd: jason_google: w/r/t task queues, are you suggesting that i re-rank entities individually? fwiw, these entities (baseball players and organizations, to be specific) will be rising and falling in terms of popularity, so breaking it down that small seems like it could delay trend detection
[8:00pm] dennis_tw: jason_google: thanks!
[8:02pm] jason_google: _mattd: Determining popularly this way would probably be faster than using the task queue approach. The task queue approach would be better for daily or weekly report generation, but if you want real-time, you would have to calculate the aggregate data at runtime.
[8:03pm] jason_google: OK everyone, we've reached the end of our Chat Time session. We'll be posting a transcript in the discussion group in the next day or so. Thanks for the great questions!
[8:03pm] _mattd: jason_google: thanks!
[8:03pm] maxoizo: bye google team, see you soon!
[8:04pm] Codys: Later!
[8:04pm] jason_google: The next Chat Time will be two Wednesdays from now, July 15th from 9:00-10:00 a.m. PDT
[8:04pm] bthomson: thangs google guys
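The write-time aggregation suggested at 7:57 (keep a running count and average up to date whenever a new value is written, so that reads need no GROUP BY-style scan) can be sketched as below. `RunningStats` is a hypothetical name, and the sketch is plain Python; in a datastore-backed app the update would presumably need to run inside a transaction so that concurrent writes do not lose increments.

```python
class RunningStats:
    """Count and average maintained incrementally at write time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Called on every write; costs O(1) regardless of how much data exists.
        self.count += 1
        self.total += value

    @property
    def average(self):
        # Ready on demand: no scan over the stored values is needed.
        return self.total / self.count if self.count else None
```

For trend windows like those _mattd describes (60 minutes, 24 hours, 7 days), one such record per entity per time bucket would be a natural extension, though that goes beyond what was discussed in the session.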