On Feb 12, 2010, at 11:35 AM, Andrew Senin wrote:

> I worked with Igor on the GDS framework (although Igor knows more of the
> technical details than I do). Let me add my two cents to the discussion.

Thanks!

> > 1. It looks like the main benefits of using the Google App Engine --
> > specifically for MTT -- are that we can use the GDS and/or we can host an
> > application on their web servers.  Is that correct?
> 
> I think so. Also, GDS should be faster than a relational DB on large
> amounts of data.

Cool.  The speed is also a good/important point for us -- our current SQL 
server is kinda creaking under the load.  Josh spent quite a bit of time 
optimizing the database that we have now (you should have seen how slow it used 
to be!), so moving to a faster platform is desirable.

> > 2. In reading through the Google App Engine docs, it looks like we can
> > mainly access GDS data through GQL.  I don't see any mention of doing
> > map/reduce kinds of computations (Ethan and I were talking on the phone
> > today about MTT App Engine possibilities).  I'm new to all this stuff, so
> > it's quite possible that a) I missed it, or b) I just don't understand what
> > I'm seeing/reading yet.  Or does GQL do map/reduce on the back end to do its
> > magic?  Is GQL the main/only way we have to access GDS?
> 
> As far as Igor and I know, there is no way of doing Map/Reduce with GDS. And
> GQL (or filters, which are practically a synonym) is the main and only way to
> access GDS data.

Ok, good.  Just wanted to make sure we understood that point properly and 
weren't missing anything.

> > 3. Is there a reason that MTTGDS.pm doesn't use the Python API to directly
> > talk to GDS?  I.e., what is the rationale for using a web app on App Engine?
> > Is the web app doing stuff that we can't do at the client?  Ditto for
> > bquery.pl and breport.pl.  (these questions are partially fueled by my
> > curiosity and concern about why we're using so much CPU at Google)
> 
> There are a few reasons for doing it. The first is speed. When we post new
> data, we first check whether a copy of the corresponding MpiInfo,
> ClusterInfo, and other *Info objects already exists. If we did this directly
> from the client scripts, the delays would be higher (depending on Internet
> connection speed). The price is additional CPU cycles on Google's servers.

FWIW, I don't think I'm concerned about the speed of submitting.  MTT runs can 
go for hours.  If it takes 2 seconds to submit or 20, I'm not concerned about 
it -- a few round-trip latencies + some GQL lookups are still a very small 
fraction of the overall MTT run time.  If CPU is going to be an issue, I 
wouldn't mind doing some of these lookups from the client (and potentially even 
caching some of the IDs on the client -- like we do on the SQL submission 
reporter), and then just submitting those IDs in the "main submit".
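
As a rough sketch of the client-side caching idea above (in Python, purely for
illustration): the cache remembers the ID for each *Info record it has already
looked up, so repeated submits only pay the round trip once. The names
`InfoIdCache`, `lookup_id`, and `build_submit` are hypothetical and do not come
from MTTGDS.pm or the existing web app; `lookup_id` stands in for whatever
remote call would resolve a record to its ID.

```python
import hashlib
import json


class InfoIdCache:
    """Client-side cache mapping *Info record contents to datastore IDs."""

    def __init__(self, lookup_id):
        # lookup_id(kind, fields) -> id is a hypothetical remote lookup,
        # e.g. one HTTP request to the web app. It is an assumption here.
        self._lookup_id = lookup_id
        self._ids = {}

    @staticmethod
    def _key(kind, fields):
        # Serialize with sorted keys so field order does not affect the key.
        blob = json.dumps(fields, sort_keys=True).encode("utf-8")
        return kind, hashlib.sha1(blob).hexdigest()

    def get_id(self, kind, fields):
        key = self._key(kind, fields)
        if key not in self._ids:
            # Only pay the network round trip on a cache miss.
            self._ids[key] = self._lookup_id(kind, fields)
        return self._ids[key]


def build_submit(cache, infos, results):
    """Build the "main submit" payload carrying IDs instead of full records.

    infos maps a kind name (e.g. "MpiInfo") to that record's fields.
    """
    info_ids = {kind: cache.get_id(kind, fields)
                for kind, fields in infos.items()}
    return {"info_ids": info_ids, "results": results}
```

With this shape, a client would call `build_submit(cache, {"MpiInfo": {...}},
results)` once per submission; whether the saved CPU on the server is worth the
extra client logic would need to be measured.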

> The second and more
> important reason is that when we have such logic on the server, we (instead
> of the GDS clients) are responsible for maintaining the correct structure of
> links between objects. If such logic were implemented on the client side, a
> user could (by mistake or on purpose) break links between objects.

Ah yes, this is a very good reason.
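
To make the link-integrity point concrete, here is a minimal sketch (Python,
with an in-memory dictionary standing in for GDS) of a server-side submit
handler that either resolves the MpiInfo record itself or refuses an ID that
does not exist. `InMemoryStore`, `handle_submit`, `LinkError`, and the
"TestResult" kind are all hypothetical names for illustration; this is not how
the actual web app is written.

```python
class LinkError(ValueError):
    """Raised when a submit refers to an object that does not exist."""


class InMemoryStore:
    """Stand-in for the datastore: kind -> {id: fields}."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def add(self, kind, fields):
        row_id = self._next_id
        self._next_id += 1
        self._rows.setdefault(kind, {})[row_id] = dict(fields)
        return row_id

    def get_or_create(self, kind, fields):
        # Reuse an existing identical record rather than storing a copy.
        for row_id, existing in self._rows.get(kind, {}).items():
            if existing == fields:
                return row_id
        return self.add(kind, fields)

    def exists(self, kind, row_id):
        return row_id in self._rows.get(kind, {})

    def count(self, kind):
        return len(self._rows.get(kind, {}))


def handle_submit(store, payload):
    """Store one result, guaranteeing it links to a real MpiInfo record."""
    if "mpi_info_id" in payload:
        # Client sent a cached ID: the server still verifies it.
        mpi_id = payload["mpi_info_id"]
        if not store.exists("MpiInfo", mpi_id):
            raise LinkError("unknown MpiInfo id: %r" % (mpi_id,))
    else:
        # Client sent the full record: the server resolves the link itself.
        mpi_id = store.get_or_create("MpiInfo", payload["mpi_info"])
    result = dict(payload["result"])
    result["mpi_info_id"] = mpi_id
    return store.add("TestResult", result)
```

Because the server is the only code that writes the `mpi_info_id` link, a buggy
or malicious client can at worst have its submit rejected.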

I would also imagine that without the web interface, we would be limited to 
talking to the GDS under a single username/password (i.e., the owner of the 
appspot), which is also undesirable.

Thanks for the info!

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

