On Feb 12, 2010, at 11:35 AM, Andrew Senin wrote:

> I worked with Igor on the GDS framework (although Igor knows more tech
> details than me). Let me put my two cents to the discussion.

Thanks!

> 1. It looks like the main benefits of using the Google App Engine --
> specifically for MTT -- is that we can use the GDS and/or we can host an
> application on their web servers. Is that correct?
>
> I think yes. Also GDS should work faster than a relational DB on large
> amounts of data.

Cool. The speed is also a good/important point for us -- our current SQL
server is kinda creaking under the load. Josh spent quite a bit of time
optimizing the database that we have now (you should have seen how slow it
used to be!), so moving to a faster platform is desirable.

> 2. In reading through the Google Appengine docs, the GDS stuff looks like
> we mainly can access the data through GQL. I don't see any mention of doing
> map/reduce kinds of computations (Ethan and I were talking on the phone
> today about MTT Appengine possibilities). I'm new to all this stuff, so
> it's quite possible that a) I missed it, or b) I just don't understand what
> I'm seeing/reading yet. Or does GQL do map/reduce on the back end to do its
> magic? Is GQL the main/only way we have to access GDS?
>
> As far as I and Igor know there are no way of doing Map/Reduce with GDS. And
> GQL (or filters which is practically synonym) is the main and only way to
> access GDS data.

Ok, good. Just wanted to make sure we understood that point properly and
weren't missing anything.

> 3. Is there a reason that MTTGDS.pm doesn't use the python API to directly
> talk to GDS? I.e., what is the rationale for using a web app on appengine?
> Is the web app doing stuff that we can't do at the client? Ditto for
> bquery.pl and breport.pl. (these questions are partially fueled by my
> curiosity and concern about why we're using so much CPU at Google)
>
> There are a few reasons of doing it. The first is speed. When we post new
> data we firstly try to find if there is a copy of corresponding MpiInfo,
> ClustreInfo and other *Info classes.
> If we did it directly from client
> scripts the delays would be higher (depending on Internet connection speed).
> Price of it is additional CPU cycles on google servers.

FWIW, I don't think I'm concerned about the speed of submitting. MTT runs
can go for hours. If it takes 2 seconds to submit or 20, I'm not concerned
about it -- a few round-trip latencies + some GQL lookups are still a very
small fraction of the overall MTT run time.

If CPU is going to be an issue, I wouldn't mind doing some of these lookups
from the client (and potentially even caching some of the IDs on the client
-- like we do on the SQL submission reporter), and then just submitting
those IDs in the "main submit".

> The second and more
> important is that when we have such logic on server we (instead of GDS
> clients) are responsible for maintaining correct structure of links between
> objects. If such logic was implemented on client side user could (by mistake
> or on purpose) break links between objects.

Ah yes, this is a very good reason. I would also imagine that without the
web interface, we would be limited to talking to the GDS under a single
username/password (i.e., the owner of the appspot), which is also
undesirable.

Thanks for the info!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
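
P.S. To make the client-side caching idea above a bit more concrete, here is a rough Python sketch. It is not how MTTGDS.pm or the appspot web app actually work; `FakeServer`, `CachingClient`, `find_or_create`, `build_submit`, the field names, and the "ClusterInfo" spelling are all hypothetical, and the "server" is just an in-memory dict standing in for the real find-or-create lookup. The point is only that each distinct *Info record costs one server lookup, after which the cached ID goes straight into the main submit.

```python
import hashlib
import json


def info_key(kind, fields):
    """Stable cache key for an *Info record: its kind plus its sorted fields."""
    blob = json.dumps(fields, sort_keys=True)
    return kind + ":" + hashlib.sha1(blob.encode("utf-8")).hexdigest()


class FakeServer:
    """Hypothetical stand-in for the web app's find-or-create logic."""

    def __init__(self):
        self.ids = {}      # info_key -> numeric ID
        self.lookups = 0   # how many times the client had to ask

    def find_or_create(self, kind, fields):
        self.lookups += 1
        key = info_key(kind, fields)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1
        return self.ids[key]


class CachingClient:
    """Hypothetical client that remembers IDs it has already resolved."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # info_key -> ID returned by the server

    def info_id(self, kind, fields):
        key = info_key(kind, fields)
        if key not in self.cache:
            self.cache[key] = self.server.find_or_create(kind, fields)
        return self.cache[key]

    def build_submit(self, infos, result):
        """Build the "main submit" payload carrying IDs instead of full records."""
        payload = {"result": result}
        for kind, fields in infos.items():
            payload[kind + "_id"] = self.info_id(kind, fields)
        return payload


server = FakeServer()
client = CachingClient(server)
infos = {
    "MpiInfo": {"name": "ompi", "version": "1.4"},
    "ClusterInfo": {"name": "example-cluster"},
}
first = client.build_submit(infos, {"passed": 10})
second = client.build_submit(infos, {"passed": 12})
```

The second submit reuses both cached IDs, so the server sees two lookups in total rather than four. This only addresses the CPU side; the server would still have to validate any IDs it is handed, since the link-integrity concern Andrew raised applies to client-supplied IDs too.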