Yesterday, the App Engine team hosted another session of its bimonthly IRC office hours. A transcript of the session and a summary of the topics covered are provided below. The next session will take place on Wednesday, June 17th from 9:00-10:00 a.m. PDT in the #appengine channel on irc.freenode.net.
------------------------------------------------------------------------------------------
- We may add support for additional payment platforms/systems down the road, but other projects are a higher priority at the moment. If you want to see support added for a particular payment system, please file a new request in the issue tracker or star an existing request. [7:02, 7:07]
- Some users are seeing 500 errors returned for a small percentage of requests, especially under heavy load or after deployment. These are not reported in the error logs but may appear in the request-only logs. If these errors reappear, please check your request logs and file a new report in the issue tracker with any relevant details, such as your app's load at the time and the frequency of the error. A screenshot of the error page might help as well. [7:04, 7:16, 7:21]
- Q: Any plans to publish an interface/documentation for the development server, e.g. allowing the Jetty server to be started and stopped from within another programming environment? A: There are plans to open source the Java SDK soon, which may help with this. [7:09 - 7:18, 7:25 - 7:33]
- Q: Is it normal that posting (saving or updating) 20 root entities with no children takes about 2 seconds? A: Rather than adding entities serially, add them in a single batch, which reduces the number of round trips and shaves off at least half of the time. [7:19 - 7:22]
- Q: How can one estimate storage space requirements? A: It's not straightforward right now -- in addition to raw data, metadata and indexes have to be stored, which adds to the total storage needed by your application. With the latest release, you can disable indexes for single properties that you don't plan to query, which will save storage and make writes slightly faster. We plan to provide more specific information about how data storage size is calculated going forward. [7:33, 7:42 - 7:45]
- Discussion on how best to go about querying for points within a bounding box in a geo-based application [7:36, 7:39 - 7:42]
- SQL-like LIKE behavior can be approximated in App Engine by filtering on string prefix; see http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes [7:45, 7:48]
- Session videos from last week's Google I/O will be posted shortly. Included among these is a presentation on App Engine's new task queue system, which will be rolled out in the not-too-distant future. We'll post to the blog when these are available. [7:59]
------------------------------------------------------------------------------------------
[7:00pm] Jason_Google: Hi Everyone. Welcome to the latest App Engine Chat Time! I and a few other Googlers will be here for the next hour to chat about all things App Engine. Fire away!
[7:00pm] knowtheory: hope their having a good time at the conf though
[7:00pm] dan_google: You can still ask Java questions, but Toby, Don and Max won't be here tonight because they're at JavaOne.
[7:01pm] alexrudnick: dan_google: The Eclipse plugin team is represented!
[7:01pm] dan_google: yay
[7:01pm] knowtheory: yeh, well i'm a rubyist primarily, and only getting into the java stuff secondarily via JRuby
[7:01pm] maxoizo: Hi google team. I have a question not only to the technical department, but to the financial department too. Please take a look at this link: http://code.google.com/p/googleappengine/issues/detail?id=1650, and say, is it possible in the near future or not?
[7:01pm] knowtheory: 404?
[7:02pm] knowtheory: oh comma fail
[7:02pm] Jason_Google: maxoizo: Probably not the near future, although we do hope to expand the number of supported payment systems eventually.
[7:02pm] dw: my only question would be about the status of xmpp
[7:03pm] Jason_Google: dw: It's coming along nicely. No ETA just yet, however.
[7:03pm] dw: excellent. thanks
[7:03pm] Jason_Google: No problem.
[7:04pm] bthomson: sometimes i get a 500 error from infrastructure that is not reported in the console under heavy load
[7:04pm] bthomson: is it caused by not enough interpreters spun up or something?
[7:05pm] maxoizo: Jason_Google: very very bad. And one more question - we want to transfer our big project to AppEngine, which possible
[7:06pm] Jason_Google: bthomson: How often do you get these 500s? Do you see them only when your application hasn't gotten many requests for a period or randomly?
[7:06pm] lidaobing: [java] the cron system is very unstable, for example: issue 1333 and issue 1252, can we have a better cron system soon?
[7:07pm] Jason_Google: maxoizo: Right now, Google Checkout is the only supported payment platform, unfortunately. Please file an issue in the issue in the tracker for your payment platform of choice: http://code.google.com/p/googleappengine/issues/list
[7:08pm] bthomson: Jason_Google: i don't have any percentage for you, but during load testing (ie, many requests are coming in) some small % of requests come back with this 500 error which is different from a python 500 error caused by the application
[7:08pm] Jason_Google: lidaobing: These are both known bugs, yes. We don't have anyone here from the cron team but I know they're working on getting these issues addressed.
[7:09pm] lidaobing: Jason_Google, thanks
[7:09pm] knowtheory: [java] are you guys ever gonna publish the details/documentation for the dev server and/or app config stuff?
[7:09pm] bthomson: it appears to be just one request that gets the error, not a cluster of requests in a particular timeframe
[7:09pm] knowtheory: (i should watch that, i haven't checked recently if there have been doc updates)
[7:09pm] knowtheory: doesn't look like it though
[7:10pm] maxoizo: Jason_Google: we want to transfer our big project to AppEngine, witch possible will require more than 500 requests per second. Now project have cluster with 15 servers (tematics: ads - like AdSence/AdWords). We have some questions not only to technical side, but also TOS. Can i mail you in near time, cos i wantn't talk about this project in icq
[7:10pm] Jason_Google: bthomson: Interesting. And you don't see any log messages indicating a datastore timeout or other error?
[7:10pm] maxoizo: * in irc ^)
[7:10pm] alexrudnick: knowtheory: What sorts of details would you like? Are there specific parts of the doc you'd like expanded?
[7:11pm] knowtheory: i've been manually decompiling and digging through the dev server and appconfig code
[7:11pm] knowtheory: I've been building a lib that provides a rubyesque interface to all that gear via JRuby
[7:11pm] bthomson: Jason_Google, no, it's definitely not reported in the error console... the text screen is different from the text of a normal 500 (syntax error or Timeout or whatever) with tracebacks disabled as well
[7:12pm] nickjohnson: knowtheory: Any reason you can't use the Python appcfg etc as reference, in that case?
[7:12pm] lidaobing: Jason_Google, bthomson I also experience this problem when two request send at the same time
[7:12pm] lidaobing: one of them will get a 500 very quickly
[7:13pm] knowtheory: nickjohnson: i mean the actual details and classes of how they're structured
[7:13pm] knowtheory: so the python docs probably serve as a good guide for the user facing stuff
[7:13pm] nickjohnson: knowtheory: Are you trying to duplicate the tools in JRuby, or just ue them?
[7:13pm] nickjohnson: er, use
[7:13pm] bthomson: it's not a huge problem for ajax apps because I can just trap the 500 and rerequest, but it's kindof annoying
[7:13pm] bthomson: lidaobing, glad to see i didn't imagine it
[7:13pm] knowtheory: but i've actually got a setup which invokes and starts the Jetty server up via JRuby
[7:14pm] knowtheory: rather than just shelling out to the scripts that are provided w/ the appengine java sdk
[7:14pm] nickjohnson: bthomson: Have you tried checking with filter level "requests only"? It's possible they show up as a 500 with no stack trace, in which case they won't be logged as errors
[7:14pm] nickjohnson: knowtheory: So what exactly are you trying to accomplish that requires more docs or source?
[7:14pm] dan_google: knowtheory: If you're asking for the source of the SDK, we plan to release that soon.
[7:14pm] dan_google: (the Java SDK I mean)
[7:14pm] knowtheory: dan_google: yep that's what i wanted to know
[7:14pm] knowtheory: cool
[7:14pm] bthomson: nickjohnson, thank you I will check, I did not know that condition was possible
[7:15pm] knowtheory: nickjohnson: so the goal is being able to do stuff with the server besides just run it.
[7:16pm] nickjohnson: knowtheory: Can you be more specific than 'stuff'?
[7:16pm] knowtheory: one really rudimentary example is being able to stat the file system locally to keep track of changes to a ruby app, and then restart the server to pick up on the changes
[7:16pm] nickjohnson: Ah\
[7:16pm] dw: bthomson: up until a few months ago, i'd regularly see 500s around the time of a new deployment
[7:16pm] knowtheory: (so yes, i can be more specific )
[7:16pm] nickjohnson: I'm not sure I understand why that requires insight into the server, though
[7:16pm] dw: the 'no logs' kind which i think you're referring to
[7:16pm] nickjohnson: Do you want to extend the server itself to do that?
[7:17pm] knowtheory: nickjohnson: well no i can wrap the behavior around the server
[7:17pm] bthomson: dw: it's kinda scary because you don't know how many users are getting errors and there's no record
[7:17pm] knowtheory: this was more a question for future changes to the servers and other config gear
[7:17pm] knowtheory: so that i can keep track of the changes more easily between sdk versions
[7:18pm] nickjohnson: bthomson: That sort of behaviour is (obviously) a bug. Any information you can give us to help reproduce it is appreciated.
[7:18pm] knowtheory: i can do the file system checking in ruby (and am happy to)
[7:18pm] Jason_Google: bthomson: Did you check the request logs?
[7:18pm] nickjohnson: knowtheory: Fair enough. As dan_google said, we do plan to release the source.
[7:18pm] knowtheory: cool
[7:19pm] maximity: Is this normal that posting (saving or updating) 20 root entities with no children takes about 2 seconds? Does this depend on number of the entity's properties?
[7:19pm] bthomson: i will have to do some work to check the request logs because the application spews a ton of logging data and the problem only shows up under load
[7:20pm] bthomson: if I make a thread with "bug report" you guys will see it right?
[7:20pm] nickjohnson: maximity: Are you putting them serially, or in one batch?
[7:20pm] Jason_Google: bthomson: Yes, definitely.
[7:20pm] nickjohnson: The round-trip time is a substantial component of how long it takes to perform a datastore operation
[7:20pm] maximity: ds.put(entity) in the for loop
[7:20pm] nickjohnson: bthomson: You can also file a bug.
[7:21pm] nickjohnson: maximity: Instead, accumulate entities to be updated and do a db.put() on the whole list at the end of the loop
[7:21pm] nickjohnson: 1 round trip instead of 20.
[7:21pm] bthomson: haha, my bad
[7:21pm] maximity: thanks
[7:21pm] Jason_Google: bthomson: Like Nick said, any info. you can provide in that report will help. If you only see it under heavy load (provide the rough load estimate), whether you only see if after you deploy. A screenshot of the error screen would help too even if it is generic.
[7:21pm] dan_google: maximity: A batch put not only does only 1 round trip, but can update the different entity groups in parallel.
[7:21pm] nickjohnson: Which will cut off about 50*19=0.95 seconds from your execution time
[7:22pm] dan_google: maximity: If multiple entities in the same group and put in a batch, it'll be one save (one change to the entity group).
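Editor's note: the batching advice given between 7:20 and 7:22 can be illustrated with a small sketch. It does not call the App Engine datastore; `FakeDatastore` is a hypothetical in-memory stand-in that only counts round trips, and the 50 ms per round trip is an assumption taken from the "50*19" estimate above. In the Python SDK, the batched call would be `db.put()` on the whole list, as nickjohnson describes.

```python
class FakeDatastore:
    """Hypothetical stand-in for a remote datastore; NOT the App Engine API.

    It performs no network I/O and only counts how many round trips
    a real remote store would have needed.
    """

    def __init__(self, round_trip_seconds=0.05):  # 50 ms is an assumed figure
        self.round_trip_seconds = round_trip_seconds
        self.round_trips = 0
        self.stored = []

    def put(self, entities):
        """Accept one entity or a list; each call costs one round trip."""
        if not isinstance(entities, list):
            entities = [entities]
        self.round_trips += 1
        self.stored.extend(entities)

    def simulated_latency(self):
        """Total time spent on round trips, in seconds."""
        return self.round_trips * self.round_trip_seconds


entities = [{"id": i} for i in range(20)]

# Serial: one put() per entity inside the loop, so 20 round trips.
serial = FakeDatastore()
for entity in entities:
    serial.put(entity)

# Batched: accumulate the entities, then a single put() on the whole list.
batched = FakeDatastore()
batched.put(entities)

saved = serial.simulated_latency() - batched.simulated_latency()
print(serial.round_trips, batched.round_trips, round(saved, 2))
```

Under these assumptions the batch saves 19 round trips, or about 0.95 seconds. The real saving depends on actual round-trip latency and on how the entities are spread across entity groups.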
[7:22pm] bthomson: Jason_Google: thanks, sure np, I will post it as a new bug if I see it happening during next load test
[7:22pm] dw: do unsaved entities with no parent set go to the same entity group during a batch put? or are they considered individually
[7:23pm] dw: oh, stupid question i guess. the group depends on their key
[7:23pm] nickjohnson: I imagine the source IP of the 500 and a timestamp would help, too - submitted privately if you're concerned about privacy
[7:23pm] Jason_Google: dw: I believe they're put as root entities.
[7:23pm] dan_google: dw: unsaved entities with no parent set are each created in a new entity group. However, done in a batch the creates can occur in parallel.
[7:23pm] nickjohnson: dw: Any entity with no parent is in its own entity group
[7:23pm] nickjohnson: Heh.
[7:25pm] knowtheory: nickjohnson: i just realized i did a terrible job of answering your question
[7:25pm] knowtheory: So the goal is to be able to send signals to the server from within a jruby script
[7:26pm] knowtheory: so that i can control stuff there, which requires some documentation/reverse engineering of how the server actuallyw orks
[7:26pm] knowtheory: and what the methods to start/restart/stop it are and the like
[7:26pm] nickjohnson: 'signals' other than what Java apps can already send to the server?
[7:26pm] nickjohnson: Oh, right, you mean from an _external_ jruby script
[7:26pm] knowtheory: well i mean, i pull the relevant jars and classes in to muck about with
[7:26pm] nickjohnson: I think the standard sysv operations are your best bet there - sigterm, etc
[7:27pm] knowtheory: but the interface provided by default is just via the shell script w/ the sdk
[7:27pm] nickjohnson: But I can certainly understand the goal of extending the server to natively support monitoring the freshness of your ruby code
[7:27pm] knowtheory: cool
[7:28pm] knowtheory: yeah and i don't know, if there's ever additional stuff that gets included with the server, it'd be nice to be able to interrogate some of that stuff from ruby potentially (but that's again fairly vague at this point )
[7:28pm] knowtheory: I'm still exploring the world of java libraries, so i'm not an expert on what i can mix and match for interesting purposes
[7:29pm] knowtheory: But the key really was the fact that the Jetty server and the classes provided with the SDK aren't something that are easily duplicated in ruby
[7:29pm] knowtheory: partially just because of the fact that AppEngine behaves differently from the expectation of other ruby frameworks in a variety of ways
[7:30pm] knowtheory: so trying to map the rubyist way of doing things to the appengine way of doing things just requires reading on my part and the like
[7:30pm] knowtheory: so anything you guys can do to make that reading easier is much appreciated!
[7:33pm] knowtheory: okay guys, i'm being invited to go crash some castles thanks for the chat!
[7:33pm] maximity: How can one estimate the storage space requirements?
[7:33pm] maximity: I recently uploaded a data from the text file which had original size about 1.5 MB (text ascii with delimiters) and the amount of stored data increased by roughly 70 MB
[7:33pm] maximity: The file had about 7500 records converted stored in a single root Entity/Model with 7500 instances
[7:34pm] bthomson: i think you can turn off single property indexes now, that might help
[7:35pm] maximity: I have not configured any indexes yet
[7:35pm] nickjohnson: maximity: Did your app have other activity during that period? Bear in mind that datastore usage is only updated authoritatively once a day
[7:35pm] maximity: 99% not
[7:35pm] Jason_Google: knowtheory: Have fun.
[7:35pm] maximity: no requests
[7:35pm] nickjohnson: So what you saw beforehand could be a (low) estimate, and what you saw after could be the updated figure for the entire day
[7:35pm] maximity: I checked logs
[7:36pm] maximity: hmm, it is not likely if I have not seen any requests in logs
[7:36pm] cwvh: I'm currently looking at porting a PostGIS-based map app to GAE and currently hung up on what I'm going to do about querying for points within a bounding box. I've glossed over some clever tricks to get around the lack of GIS operators such as geohashing and list properties, but do any of the GAE gurus have any advice?
[7:36pm] maximity: is there any way to check a size allocated to Entity?
[7:36pm] Jason_Google: maximity: Single-property indexes are automatically created. You can disable these for individual properties that you don't plan to query to save some space.
[7:36pm] nickjohnson: Not currently, no
[7:36pm] dw: maximity: how many fields did each 'line' have, and did you have "indexed=False" in your model defs?
[7:37pm] maximity: I used defaults
[7:37pm] dw: (didn't i read somewhere we pay quota for single prop indexes?)
[7:37pm] maximity: I would say about 40 fields
[7:37pm] bthomson: ^^^ and also does disabling single prop indexes make puts faster?
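Editor's note: the figures maximity reports above (about 1.5 MB of raw text, roughly 70 MB of added storage, about 7500 records, about 40 fields) can be turned into rough per-record numbers. This is only back-of-the-envelope arithmetic on those approximate values, assuming 1 MB = 10**6 bytes and, purely for illustration, that storage is spread evenly across properties. It is not a description of how the datastore actually accounts for storage.

```python
# Approximate figures quoted in the chat; all values are rough.
RAW_BYTES = 1.5e6      # "about 1.5 MB" source text file (assumes 1 MB = 10**6 bytes)
STORED_BYTES = 70e6    # "roughly 70 MB" increase in stored data
RECORDS = 7500         # "about 7500 records"
PROPERTIES = 40        # "about 40 fields" per record

raw_per_record = RAW_BYTES / RECORDS           # raw text bytes per record
stored_per_record = STORED_BYTES / RECORDS     # stored bytes per record
# Even split across properties is an illustrative assumption only.
stored_per_property = stored_per_record / PROPERTIES
growth = STORED_BYTES / RAW_BYTES              # stored size relative to raw size

print("raw bytes per record:      %.0f" % raw_per_record)
print("stored bytes per record:   %.0f" % stored_per_record)
print("stored bytes per property: %.0f" % stored_per_property)
print("growth factor:             %.1fx" % growth)
```

On these numbers each record goes from about 200 bytes of raw text to a little over 9 KB stored, or a few hundred bytes per property once names, overhead, and automatically created single-property indexes are included.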
[7:37pm] maximity: most of them Double
[7:38pm] dw: bthomson: it apparently does, i asked this a while back
[7:38pm] bthomson: wow, that could be very useful
[7:38pm] dw: test.. it might only be marginal
[7:39pm] Jason_Google: cwvh: I've seen some impressive geo-based applications recently (I think one is being turned into a sample) but it's using techniques like geohashing. Because you can't have inequality filters on more than one property, that makes geo data somewhat hard to work with in GAE, but geohashing does a decent job.
[7:39pm] nickjohnson: maximity: 40 fields is a reasonable number. The length of your field names can also have an impact, actually.
[7:39pm] nickjohnson: cwvh: There are various GIS options available for App Engine currently, but none are as mature as something like PostGIS, currently
[7:40pm] Jason_Google: bthomson: Yes, writes will be faster since not as many indexes need to be updated.
[7:40pm] nickjohnson: Geohashing/Hilbert curve based approaches suit bigtable much better than tree based indexing, because they avoid seeks
[7:40pm] nickjohnson: Or in the case of Bigtable, avoid multiple queries/lookups
[7:40pm] cwvh: Jason_Google: I've been really intrigued by a technique of using successively less accurate lat/lon pairs in a property list and then using membership testing as means of quick (and not particularly accurate) culling.. is membership testing via property lists considered "dangerously" quick?
[7:40pm] bthomson: thanks Jason_Google
[7:41pm] cwvh: e.g., [(100.0135, x), (100.013, x), (100.01, x)]
[7:41pm] nickjohnson: cwvh: That's one extant approach - Brett's Geobox library does this
[7:41pm] nickjohnson: The other approach is to store a single number encoding both lat and long (hilbert curve / geohashing) and use a range query to retrieve the contents of a bounding box
[7:42pm] Jason_Google: cwvh: It depends on how many items are in the list. If you have any other list properties in your kind, particularly ones that you need to use in your queries, indexes could start to be a problem.
[7:42pm] maximity: as I said most of the fields became numbers stord as Double, a few Stings, but nothing crazy about 200 charactes per record i.e. split across these 40 properties
[7:42pm] bthomson: if maximity had 40 properties and there are 40 indexes, then increase of size 70x seems not unreasonable
[7:42pm] nickjohnson: The former uses only equality queries, but requires more index entries; the latter only one index entry, but requires you to use your one inequality filter
[7:43pm] nickjohnson: maximity: To store a single property in an entity requires the length of the property (8 bytes in the case of a double), plus the length of its name (for every entity), plus some bytes of overhead
[7:43pm] dw: im a little surprised we're charged for field name storage on a per entity basis?
[7:43pm] dw: would have thought a number was used internally, or something
[7:43pm] Jason_Google: maximity: We are planning to add some more documentation on how to estimate datastore storage requirements since I think a lot of developers would appreciate this. This is on my plate, actually.
[7:44pm] nickjohnson: dw: The datastore is schemaless; there's no other way to allow arbitrary field names for arbitrary columns
[7:44pm] nickjohnson: The lower level API works more or less like a dictionary
[7:44pm] nickjohnson: (For each entity, that is)
[7:45pm] maximity: thanks guys it will help us a lot
[7:45pm] maximity: Can you offer any advice for implementing a query similar to SQL -> LIKE ‘A%’
[7:45pm] maxoizo: Some stupid question: your roadmap to be fulfill by the end of June? Or is it a rough time?
[7:45pm] dw: nickjohnson: it makes sense, i guess. i assumed that / somewhere/ where was an authoritative list of whats in use for an app
[7:45pm] nickjohnson: maximity: See the docs: http://code.google.com/appengine/docs/python/datastore/queriesandindexes.html#Introducing_Indexes
[7:45pm] nickjohnson: The note in that section describes how to implement a prefix query with inequalities
[7:46pm] Jason_Google: maxoizo: It's rough. There might be a few things that slip a bit, but everything on that list is being actively worked on now.
[7:46pm] nickjohnson: dw: The protocol buffers we use for data storage are open-source - you can see exactly how we store it.
[7:46pm] maxoizo: Jason_Google: thanks
[7:47pm] dw: i assumed you'd be assigning field tags for each unique name, kind of like an atom table (thats perhaps a windows-specific term)
[7:47pm] maximity: I think I read this doc before, I believe App Engine supports only basic operators unless I missed somehting
[7:47pm] maximity: The filter operator can be any of the following:
[7:47pm] maximity: < less than
[7:47pm] maximity: <= less than or equal to
[7:47pm] maximity: = equal to
[7:47pm] maximity: > greater than
[7:48pm] maximity: >= greater than or equal to
[7:48pm] maximity: != not equal to
[7:48pm] maximity: IN equal to any of the values in the provided list
[7:48pm] nickjohnson: Please don't paste large blocks of text in here
[7:48pm] maximity: sorry
[7:48pm] nickjohnson: There's a 'Tip' section at the end of the section I linked you to, that describes how to implement a filter for a string prefix using > and <
[7:49pm] nickjohnson: By making use of the order in which strings are sorted
[7:55pm] dan_google: Anyone here playing with Wave?
[7:55pm] dw: give us xmpp and we might
[7:55pm] nickjohnson: dw: You can write Wave bots without needing XMPP
[7:55pm] dan_google: An XMPP->Wave robot?
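Editor's note: the prefix-filter idea nickjohnson describes at 7:48 (using inequalities and the sort order of strings to approximate SQL's LIKE 'A%') can be sketched in plain Python. This does not use the datastore API; a sorted list stands in for an index, and the helper names `prefix_bounds` and `prefix_filter` are hypothetical. Using U+FFFD as the upper-bound sentinel is an assumption for illustration; any character that sorts after every character expected in the data would serve.

```python
import bisect


def prefix_bounds(prefix):
    """Return (lower, upper) such that lower <= s < upper holds for strings
    starting with prefix, assuming no following character sorts at or
    above U+FFFD."""
    return prefix, prefix + u"\ufffd"


def prefix_filter(sorted_names, prefix):
    """Emulate `name >= lower AND name < upper` over a sorted list."""
    lower, upper = prefix_bounds(prefix)
    start = bisect.bisect_left(sorted_names, lower)
    end = bisect.bisect_left(sorted_names, upper)
    return sorted_names[start:end]


# The sorted list plays the role of a single-property index.
names = sorted(["Aaron", "Abby", "Bob", "alice", "Zed", "A"])
print(prefix_filter(names, "A"))
```

The match is case-sensitive ("alice" is not returned for the prefix "A"), and only prefix matching is covered; patterns such as LIKE '%A' cannot be expressed as a single range this way.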
[7:55pm] dw: i'd love an account on the demo system, but i guess they're as scarse as hen's teeth
[7:55pm] maximity: hmm, I have not received account info yet
[7:55pm] nickjohnson: dan_google: I think he's referring to the fact that Wave uses XMPP
[7:56pm] dan_google: nickjohnson: Wave uses XMPP?
[7:56pm] dw: *scarce
[7:56pm] cwvh: have the sandbox accounts started to trickle out?
[7:56pm] dw: server<->server is based on XMPP, AFAIK
[7:56pm] dan_google: To I/O attendees for now, yes
[7:56pm] nickjohnson: dw: Right
[7:56pm] dan_google: oh that bit
[7:56pm] dan_google: right
[7:56pm] Jason_Google: dw: Google I/O attendees will be the first to get accounts, but many of them are still waiting for credentials. I think it will be a bit longer before the external developer community at large gets an account, but we'll see.
[7:56pm] nickjohnson: Running out of official dev chat time - if you have a question, ask now!
[7:56pm] maximity: hmm, I went to I/O and sent request but have not received anything back
[7:56pm] dw: with some very strange looking non-uri identifiers, but it's a preview, and i see sam ruby noticed this fact as well
[7:57pm] Jason_Google: maximity: In that case, you should hear back soon.
[7:57pm] Jason_Google: They're still working on it, AFAIK.
[7:57pm] dan_google: dw: Wave<->App Engine already works, which is why I mentioned it here.
[7:58pm] maxoizo: To java team: Will we expect in the near reliase to support backgroundtask? Or only after? I know that python will support this on next+ week
[7:58pm] dan_google: maxoizo: Do you mean the Task Queue API?
[7:58pm] dw: ah.. very quick question.. i saw mention of 'better than cron' background tasks. does this mean an appengine mapreduce-alike?
[7:59pm] dan_google: dw: Not quite. We're about to launch an (experimental) task queue service.
[7:59pm] dw: aha. thanks.
[7:59pm] nickjohnson: dw: Look out for the video of the I/O talk on said queues, out soon.
[7:59pm] bthomson: sounds exciting
[7:59pm] Jason_Google: By end of week. There was a presentation on it.
[7:59pm] Jason_Google: bthomson: Oh, it is.
[8:00pm] maxoizo: dan_google: yes
[8:00pm] Jason_Google: OK, we've reached the end of Chat Time. Thanks for joining in! The next one will be in two weeks, June 17th, 9-10 a.m. PDT.
[8:01pm] cwvh: thanks for the Q&A session guys ~
[8:01pm] bthomson: thanks for chat!
[8:01pm] dw: thanks all
[8:01pm] Jason_Google: You're very welcome. Have a very good evening (or morning depending).
[8:01pm] maxoizo: Thanks appengine team!
