Yesterday, the App Engine team hosted another session of its bimonthly
IRC office hours. A transcript of the session and a summary of the
topics covered are provided below. The next session will take place on
Wednesday, July 15th from 9:00-10:00 a.m. PDT in the #appengine
channel on irc.freenode.net.

SUMMARY:
----------------------------------------------------------------------

- Q: What are the performance implications of including large,
multi-valued properties for individual entities? A: Adding several
values to a list property should be fairly efficient, but as the
number of values grows larger (e.g. several hundred or more), you may
be better off re-thinking your design, since queries on that property
will take longer (see the note on IN queries in
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Queries)
and exploding indexes may start to become an issue (see
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes).
For the hypothetical scenario where there is a many-to-many
relationship between users and groups, you may consider using a third
model with key properties for users and groups. Each entity in that
model represents one group that one user has joined, so it is fairly
efficient to query for all groups that a user has joined or all users
in a particular group without maintaining large multi-valued
properties. [7:04-7:05, 7:23, 7:28, 7:37-7:38, 7:42-7:45, 7:48,
7:50-7:51]

- If you see 'DownloadError: ApplicationError: 5', this generally
means your URL Fetch operation did not complete within the expected
5-second window. You can raise the deadline to 10 seconds for slower
endpoints. You can also catch this exception and retry, but increase
the delay between retries if you receive the exception several times
in a row. [7:10-7:16]

- For best practices on debugging a Wave robot, see the Wave developer
discussion group. In general, if your App Engine application doesn't
behave as expected or throws a strange error message, check the App
Engine logs for your app for the full stack trace, which may help in
debugging the core problem. [7:11-7:12]

- Q: Is there an option to pay for premium, around-the-clock support?
A: No, not at this time. We're always looking at enhancing our support
operations and improving our response time to issues filed in the
discussion groups and issue tracker, but there is currently no premium
support option. There are solutions providers that support App Engine
that may be able to help:
http://www.google.com/enterprise/marketplace/search?orderBy=rating&query=app+engine&offset=0
[7:13-7:21]

- Q: Is there a ghs.google.com server located in China? It is
currently blocked by the Chinese government. A: Although we cannot say
where our servers are located, we are aware of ongoing challenges in
China, including the lack of a billing option. We are continually
considering ways to improve the Chinese developer experience. [7:18,
7:23-7:25]

- Issues with "100 URLMap exception" when attempting to deploy an
application with numerous static files and servlets; we have seen
several reports of this exception and are looking at
http://code.google.com/p/googleappengine/issues/detail?id=1444 in more
detail. Expect an update on this soon. If you experience the issue
yourselves, please attach your web.xml file to the aforementioned
issue. [7:20, 7:22-7:23, 7:29]

- Issues with asynchronous URL Fetch and callbacks -- see
http://code.google.com/p/googleappengine/issues/detail?id=1766 and
http://groups.google.com/group/google-appengine-python/browse_thread/thread/0230d030d8407de7.
If you're currently experiencing this issue, please star it. We'll do
some testing and update it soon. [7:28, 7:30-7:36]

- Optimizing datastore schema design for App Engine is a large topic.
There are various tips scattered in many different places including
the Google I/O videos and the recently published scaling series, and
we are working on more comprehensive articles on the subject. [7:47,
7:50-7:51, 7:59-8:00]

- Q: Since App Engine's datastore doesn't natively support aggregate
functions like count and average or the traditional SQL GROUP BY
clause, how can this be done? A: The new Task Queue API can greatly
help with data aggregation for tasks such as report generation, but
the best practice is to perform aggregate calculations at write time
rather than read time. For example, if you need a count of a
particular resource, you can update a counter every time a new
resource is added, so the count is ready on demand. The same approach
works for other aggregates such as averages. The task queue approach
does not work well for on-demand results; it is more useful for daily
or weekly statistics. [7:54, 7:57, 8:00-8:02]

- The next Chat Time is Wednesday, July 15th, 9:00 a.m.-10:00 a.m.
PDT. See you there!
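
The third-model approach from the first summary item can be sketched
without the SDK. This is only an illustration under stated
assumptions: plain Python stands in for the datastore, and
Membership, MembershipStore and their methods are hypothetical names,
not App Engine APIs. In a real app each Membership would presumably be
a datastore entity with two key properties, and each lookup a single
equality query on one of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:
    """One record per (user, group) pair, instead of a long list on either side."""
    user_key: str
    group_key: str


class MembershipStore:
    """In-memory stand-in (hypothetical) for a kind indexed on both key properties."""

    def __init__(self):
        self._by_user = defaultdict(set)
        self._by_group = defaultdict(set)

    def join(self, user_key, group_key):
        # Joining writes one small record; no large list is loaded or rewritten.
        membership = Membership(user_key, group_key)
        self._by_user[user_key].add(membership)
        self._by_group[group_key].add(membership)
        return membership

    def groups_for_user(self, user_key):
        # Plays the role of an equality filter on the user key property.
        return sorted(m.group_key for m in self._by_user.get(user_key, ()))

    def users_in_group(self, group_key):
        # Plays the role of an equality filter on the group key property.
        return sorted(m.user_key for m in self._by_group.get(group_key, ()))

    def is_member(self, user_key, group_key):
        return Membership(user_key, group_key) in self._by_user.get(user_key, ())


store = MembershipStore()
store.join("fred", "A")
store.join("fred", "B")
store.join("martha", "A")
print(store.groups_for_user("fred"))
print(store.users_in_group("A"))
```

Neither the user nor the group carries a multi-valued property here,
which is the point of the design: both directions of the relationship
are answered from the membership records alone.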

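The retry advice for 'DownloadError: ApplicationError: 5' in the
summary above might look roughly like the following. This is a sketch,
not the SDK's behavior: the DownloadError class below is a local
stand-in, and fetch_with_backoff, flaky_fetch and the doubling delay
schedule are assumptions made for illustration. In the Python SDK the
fetch callable would wrap the URL Fetch call, with the deadline raised
as described in the fetch function documentation linked in the
transcript.

```python
import time


class DownloadError(Exception):
    """Local stand-in for the exception raised by a timed-out fetch (assumption)."""


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on DownloadError wait base_delay, then 2x, 4x, ... and retry."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except DownloadError:
            if attempt == max_attempts - 1:
                raise  # stop retrying after several failures in a row
            sleep(base_delay * (2 ** attempt))


# Usage with a fake fetcher that times out twice and then succeeds.
calls = []
delays = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise DownloadError("ApplicationError: 5")
    return "ok"


# Delays are recorded instead of slept so the example runs instantly.
result = fetch_with_backoff(flaky_fetch, "http://example.com/slow", sleep=delays.append)
print(result, delays)
```

Capping the number of attempts matters as much as the growing delay:
a request handler has its own time limit, so longer waits are better
handed off to a later retry than spent inside the request.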

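The write-time aggregation described in the summary above can be
illustrated in plain Python. RunningStats is a hypothetical name and
the object lives in memory; in a real app the count and total would
presumably be stored on a datastore entity and updated inside a
transaction each time a new value is written.

```python
class RunningStats:
    """Keeps count and total current on every write so reads need no scan."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Done at write time: one small update per new value.
        self.count += 1
        self.total += value

    @property
    def average(self):
        # Done at read time: no query over the underlying values.
        return self.total / self.count if self.count else None


stats = RunningStats()
for value in (4, 8, 6):
    stats.add(value)
print(stats.count, stats.average)
```

Storing the total rather than the average itself keeps each update
exact and lets both count and average be read from the same record.
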
TRANSCRIPT
----------------------------------------------------------------------

[6:58pm] jason_google: Hi Everyone. Welcome to the latest Chat Time
for App Engine. A few of us are here from Google, and we'll stick
around for the next hour to answer any questions you have.
[7:00pm] objectuser: I've been reading the articles about scalable
apps.  Is there any information about what the Datanucleus information
does "under the covers" given various models?  I'm particularly
interested in the implications of having a list of keys on an entity,
the performance of adding an item to the list and running a query
against items in the list.  Not really sure if this is a Datanucleus
issue or datastore in general  though.
[7:02pm] jason_google: objectuser: This would be more of a broader
datastore question. The DataNucleus component used by App Engine is
specially tailored.
[7:03pm] maxoizo: hi jason. When we can expect new java sdk? I very
want taskqueue and localtools...
[7:04pm] objectuser: cool.  this was talked about a bit in the
"modeling entity relationships" article, but i didn't know if there
was an efficient way to handle list ... it seems like lists of keys
are kind of "the way" to handle a lot of interrelated items.
[7:05pm] jason_google: In general, I think the performance
characteristics of adding to the list and querying against the list
are pretty efficient given that this is natively supported by the
underlying BigTable. That said, if you have a large number of values
for a given multi-value property (several hundred, typically), you
will start to see performance degradation, but it's negligible for
small numbers of values.
[7:05pm] jason_google: maxoizo: Soon, although the task queue will not
be in the next release.
[7:06pm] maxoizo: bad news
[7:06pm] objectuser: is there a better way to handle relationships to
thousands (or more) items?  if a relationship entity is introduced ...
that implies thousands of queries, right?
[7:06pm] jason_google: objectuser: It's certainly a reasonable way to
store some interrelated items, but once you pass several hundred, some
datastore operations will be slower.
[7:07pm] pablosaraiva: Hello
[7:07pm] objectuser: thanks, jason.  so what's the best way to handle
things like that?
[7:07pm] jason_google: pablosaraiva: Hi.
[7:07pm] jason_google: objectuser: What kind of use case do you have
in mind?
[7:07pm] PabloSaraivaBR: Hi jason_google.
[7:08pm] WholeBean: ji. I was getting HTTP Error 500: Internal Server
Error last week
[7:09pm] WholeBean: but it resolved itself though but i want to know
how to avoid it in the future
[7:09pm] scudder_google: WholeBean do you see the 500 errors in your
logs?
[7:09pm] maxoizo: jason_google, whether there are plans to include
bdbdatastore in sdk?
[7:10pm] WholeBean: i didn't look. i got the error when deploying
[7:10pm] jason_google: maxoizo: No, it will be separate for now.
[7:10pm] lstoll: (python) what exactly is an 'DownloadError:
ApplicationError: 5 ' ?
[7:10pm] WholeBean: i hope you guys would offer some premium support.
[7:10pm] jason_google: Istoll: Where/when did you see this error?
[7:10pm] kidcudi: lstoll i believe your urlfetch didn't respond within
5 seconds
[7:11pm] lstoll: kidcudi: great, thanks
[7:11pm] kidcudi: you can up the deadline to 10 seconds
[7:11pm] PabloSaraivaBR: Hello. I have a Google Wave Robot hosted at
appengine. How can I debug it?
[7:11pm] kidcudi: 
http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html
[7:11pm] kidcudi: check out deadline
[7:11pm] scudder_google: PabloSaraivaBR: the easiest way might be to
add information to the logs
[7:11pm] lstoll: kidcudi: great. I was expecting to see a
DeadlineError for that
[7:12pm] lstoll: speaking of which, I see those on mail send's - is it
find to just requeue and try again?
[7:12pm] scudder_google: then you can check the admin console to to
view them, search through them
[7:12pm] jason_google: PabloSaraivaBR: Good question. That would be
good to ask in the Wave group, since I don't have much experiencing
developing for the platform. But like Jeff said, if you see the Wave
not working correctly, the logs would be the best place to start
looking.
[7:12pm] jason_google: er, Jeff = scudder_google
[7:12pm] PabloSaraivaBR: Nice! Thanks!
[7:13pm] maxoizo: jason, thanx. WholeBean: I agree with you, premium
support 24/7 is great
[7:13pm] jason_google: Istoll: In general, yes, but I wouldn't keep
trying if it fails several times in a row.
[7:13pm] WholeBean: i'm definitely willing to pay! it sucks when we
get errors and don't know anybody to turn to immediately.
[7:14pm] WholeBean: i'm hosting my site at google and uptime is
important.
[7:14pm] lstoll: fair enough. What if I stick it in a queue to run,
say 10 minutes later? Is a DeadlineError purely the app calling into
whatever backend sends mail timing out, or could it be indicative of
something else wrong?
[7:15pm] jason_google: WholeBean: Well noted. We are looking at ways
of making our support even better.
[7:16pm] lstoll: oh, and +1 on the support thing. I think GAE is a
great platform, but it's hard to consider for serious applications
when one can't get assistance on issues. I understand that costs
money, and that's fine.
[7:16pm] jason_google: Istoll: Not 100% sure, but I'd guess that it
could be either. But, regardless of the actual error, giving it some
time between sending the message gives it more likelihood of it
working. At least in my view.
[7:17pm] WholeBean: thanks jason. i was able to get help from Nick
Johnson (thanks Nick) but i'm looking for more immediate support.
[7:18pm] juvenn: Hi Jason, is there ghs.google.com server located in
China. You may know that ghs.google.com is blocked by GFW
[7:18pm] jason_google: juvenn: I actually didn't know that. But I'm
not able to say where the actual servers are located, unfortunately.
[7:19pm] jcgregorio: WholeBean: Another option is to look to some of
our solutions providers that support App Engine,
http://www.google.com/enterprise/marketplace/search?orderBy=rating&query=app+engine&offset=0
[7:19pm] maxoizo: jason_google: about support 24/7: may be worth to
make a ticket-system (with a guaranteed response, as example hour)?
Each request will be cost, as example 1$. This system is widespread on
hosting companies
[7:20pm] JasmNK: (Java Question, but could also apply to Python)
Recently I have been reaching the Maximum of 100 URLMap entities
Exception when deploying.  It started happening when I defined my
static files in appengine-web.xml.  My app is not that close to the
1000 File limit max, but I do have quite a few static files and a few
servlets defined.  What is the reason behind this limitation?
[7:21pm] jason_google: maxoizo: The ticket system is certainly an
interesting approach, and it's something that we may eventually
consider, but it won't be right away. We are hopefully going to be
rolling out other tools to improve our response time since I
understand that it can be frustrating when something isn't working.
[7:22pm] scudder_google: JasmNK: is each static file listed
individually? If so, you could use fewer mappings by using regexes in
your app.yaml file
[7:22pm] juvenn: If the ghs.google.com is not guanranteed in Mainland
China, it will be the toughest obstacles for Chinese developers to
develop on AppEngine. So wouldn't you have any plans to get around
this?
[7:23pm] scudder_google: also for python request handlers you can use
one handler script with multiple handler classes and use the webapp
framework to do further routing inside of the script
[7:23pm] objectuser: jason: So this is totally contrived.  Sorry about
that.  Hope it makes some sense.
[7:23pm] objectuser: Let's say I'm organizing hundreds of thousands of
people.  I organize by group and each person, based on tons of
pointless demographics, are in thousands of groups.  And thousands of
people can be in each group.  I need to find what people are in a
particular group (Fred and Martha are in group A) and what groups a
person is in (Fred is in A and B).  It seems like for this Fred would
need a list of keys of groups he's in and then gro
[7:23pm] JasmNK: scudder_google: Each file is not listed individually
(in fact all I use are patterns, this is by the way a Java app, so
it's appengine-web.xml).  In fact when I make the pattern more generic
(i.e. **.html to **.htm*), I seem to run into the problem
[7:25pm] scudder_google: JasmNK: ah I see. The simplest workaround
then might be to reduce the number of servlets and do some additional
request routing in your Java code
[7:25pm] maxoizo: jason_google: I think that the lack of guaranteed
support deprives Appengine many corporate clients. I hope that the
google team have to fix it:)
[7:25pm] jason_google: juvenn: There are several obstacles in China,
including the lack of a billing option. We are aware of some of these,
and we have teams looking at various options.
[7:28pm] Codys: Hey guys, I know a couple of people (including myself)
have been having some problems with the async urlfetch feature in the
latest python sdk. It seems that using a callback to process/store
results in the datastore will cause an error.  Was async urlfetch
meant to be used in this way?
[7:28pm] jason_google: objectuser: My first thought would be to say
that you could use a separate table with the person key and group key,
then query on an item in that table (where person == ... and group
==). If None is returned, then that person is [not] associated with a group.
You could use something similar to find all persons in a particular
group and to find all groups that a particular person belongs to. I
haven't thought this all the way through, so feel free to point out
any flaws in my off-the-cuff design.
[7:29pm] objectuser: understood.  that would answer binary questions
for sure.
[7:29pm] JasmNK: scudder_google: It also seems to be affected by the
URL pattern that I use for servlets too.  i.e. (/main/servlet vs. /
main/servlet/*).  The problem with a request servlet is that I don't
think I would get the benefits of the google static file servers.  Is
there a long-term solution to this issue?  Because this is very
limiting for someone like me who has to configure it in this manner to
stay within the file limits already, and it seem
[7:29pm] JasmNK: s to be an artificial limitation.  Could there be a
solution that would allow static files not to be under this
limitation?
[7:30pm] objectuser: but if you also needed the entities at the same
time (all the users or all the groups ...) are you hosed?
[7:30pm] jason_google: Codys: This is actually a feature I haven't
experimented with much. Have you filed a bug with a reliable way to
reproduce the error?
[7:31pm] jason_google: Codys: I'd like to try this out myself.
[7:31pm] Codys: jason_google: A similar issue has been logged here:
http://code.google.com/p/googleappengine/issues/detail?id=1766
[7:32pm] Codys: jason_google: However, it does not deal with putting
data into the datastore while in the callback.
[7:33pm] jason_google: Codys: Cool, I'll take a look and update the
status soon.
[7:36pm] Codys: jason_google: This is specifically what I'm talking
about: 
http://groups.google.com/group/google-appengine-python/browse_thread/thread/0230d030d8407de7
If you'd like, I'd be happy to file a bug for this particular issue.
[7:36pm] jason_google: JasmNK: I'm interested in seeing your web.xml
file. Have you posted this in the group?
[7:37pm] objectuser: it just seems like, if you have some many to many
relationships with more than a few hundred  on each end, and queries
need to return the items ... well, querying would be efficient but
adding items to the list might be less so ... maybe there's a low-
level api that makes adding items to a list efficient (ie, you don't
have to load the whole list) or a different model ...
[7:38pm] jason_google: objectuser: If you want to display a grid-like
view of all users with all groups, then yes, this might be difficult.
But if you had a lot of groups, this wouldn't be a good idea anyway.
You'd be better off just showing a list of users and then showing all
groups that the user belongs to when the user is clicked, etc. Don't
know if that fits in with how you're picturing your UI.
[7:39pm] JasmNK: jason_google: No I have not as of yet.  I also have
found this bug report related this issue
http://code.google.com/p/googleappengine/issues/detail?id=1444
[7:40pm] objectuser:  jason_google: i get what you're saying and I've
rethought my model in those terms in a couple of places ... bringing
back less information until a user drilled down on something.
[7:41pm] jason_google: JasmNK: Thanks for the link. I'll definitely
have a look at this soon since it appears to be affecting a number of
users. If you can attach your web.xml file to that bug, that would be
helpful.
[7:42pm] objectuser: jason_google: just thinking of two separate use
cases.  maybe a user wants to see all his groups and an analyst wants
to see all of a group's users.  then you end up with thousands of
items in two lists, one on the user entity and one on the group
entity.
[7:43pm] objectuser: jason_google: i'm not looking for a cartesian
product, to be clear ... each query would just be interested in a
list ... but a different list ... in this many-to-many scenario
[7:44pm] jason_google: objectuser: Right, this doesn't seem like a
good idea. But my earlier solution would seemingly work here, wouldn't
it? It only fails when you want to see a view of all users and all
groups together, but viewing info. for a particular user or group
would be pretty straightforward if you had this new model.
[7:45pm] objectuser: jason_google: i think these work at query
time ... they're efficient (even if there is a transactional issue of
adding users to one list and not another in the case of a failure),
but, in general, is adding thousands (or more) of items to a list
something that can be done efficiently?
[7:45pm] jason_google: Codys: Yes, feel free to file a new bug. If it
turns out to be the same issue, it will be marked as a duplicate, but
if it could be something separate, adding a new bug is the safest way.
[7:45pm] JasmNK: jason_google: I have attached both my appengine-
web.xml and web.xml to bug report
[7:46pm] Codys: jason_google: Alright, will do. Thanks for the help!
[7:47pm] dennis_tw: i was looking at brett slatkin's 2009 google i/o
talk on twitter followers and avoiding retrieving the list of
followers.  got me thinking that i should avoid passing large
properties in my queries by putting them in separate entities.  is
that a reasonable thing to do when doing schema design?  same with
separating out frequently changing data so the more static data is not
passed all the time.
[7:48pm] dennis_tw: is this something google developers do?
[7:48pm] jason_google: objectuser: Adding wouldn't be a problem in and
of itself but you need to be careful to avoid exploding indexes which
can appear. This is where adding large numbers of values for a given
property can begin to cause issues. Since you want to query this
property and since you expect a user to have potentially hundreds of
groups, you might want to consider an alternative, especially if a
given entity has more than one multi-valued property.
[7:50pm] Codys: dennis_tw: Do you have a link for that talk (is it the
Scalable, Complex Apps one)?
[7:50pm] jason_google: dennis_tw: This isn't an uncommon pattern.
Another use case for this is when trying to store large files in the
datastore. Since entities can be at most 1 MB in size, you can split
the binary data across multiple entities, storing a reference to the
next entity in each, and then piece the data file together in the
request.
[7:50pm] jason_google: It's a different use case, but a similar
technique.
[7:50pm] objectuser: jason_google: i've seen references to exploding
indexes but i've not spent time trying to understand them, so i'll do
that.
[7:50pm] dennis_tw: no link right now, but yes, scalable, complex
apps.  talks about twitter and social graph as egs.
[7:51pm] jason_google: objectuser: Here's a good description:
http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Big_Entities_and_Exploding_Indexes
[7:51pm] dennis_tw: ah, good to know!  this seems like it will also
help with datastore timeouts, right?
[7:51pm] objectuser: jason_google: cool, thanks much!
[7:52pm] objectuser: jason_google: i'll do some more reading now.  i
want to say i really appreciate your thoughtful answers to my
questions.
[7:52pm] jason_google: objectuser: You're very welcome.
[7:53pm] jason_google: dennis_tw: Depending on how you take advantage
of it, it can help, yes.
[7:54pm] _mattd: im making an app that measures trending entities in
some blogs i track. trends are measured by looking at the number of
appearances an "entity" makes over a certain amount of time (60
minutes, 24 hours, 7 days, etc). coming to appengine from a normalized
layout and rdbms that has SQL's "GROUP BY" as part of its feature set,
im a little confused on how to dynamically group and display these
listing. should i just be looking at a cron job that grabs all of the
[7:54pm] _mattd:  " within a certain date range and stores them in a
cache or temp table for display?
[7:56pm] dennis_tw: jason_google: any other sources of datastore
design tips?  i recall a message saying an article should be written
but in the meantime do you know of any pointers.  internally at
google, are there some sources of tips (assuming google engrs use the
same type of interface as gae)
[7:57pm] jason_google: _mattd: You could do this, especially with the
Task Queue API available to help you process data in small increments.
You could also try to do this aggregation at write time. So if you
need to calculate an average, you could just calculate the current
average every time a new number is added, and when the average is
requested, it's already available. This works well for counts also.
[7:59pm] jason_google: dennis_tw: It's a very large topic and hard to
summarize in one or two quick posts. But we are working on collecting
these best practices in a series of articles, even though I realize
this isn't as satisfactory.
[8:00pm] jason_google: dennis_tw: We do have a few of these tips in
the recent scaling series that was published. And others, which we
hope to have in written form before too long, are in the Google I/O
videos which it sounds like you're already aware of.
[8:00pm] jason_google: dennis_tw: Here's the scaling series:
http://code.google.com/appengine/articles/scaling/overview.html
[8:00pm] _mattd: jason_google: w/r/t task queues, are you suggesting
that i re-rank entities individually? fwiw, these entities (baseball
players and organizations, to be specific) will be rising and falling
in terms of popularity, so breaking it down that small seems like it
could delay trend detection
[8:00pm] dennis_tw: jason_google: thanks!
[8:02pm] jason_google: _mattd: Determining popularly this way would
probably be faster than using the task queue approach. The task queue
approach would be better for daily or weekly report generation, but if
you want real-time, you would have to calculate the aggregate data at
runtime.
[8:03pm] jason_google: OK everyone, we've reached the end of our Chat
Time session. We'll be posting a transcript in the discussion group in
the next day or so. Thanks for the great questions!
[8:03pm] _mattd: jason_google: thanks!
[8:03pm] maxoizo: bye google team, see you soon!
[8:04pm] Codys: Later!
[8:04pm] jason_google: The next Chat Time will be two Wednesdays from
now, July 15th from 9:00-10:00 a.m. PDT
[8:04pm] bthomson: thangs google guys