Well, for one, you are using datastore.Query() instead of db.Query(). Almost all of the documentation on working with the datastore says to use db from google.appengine.ext rather than datastore from google.appengine.api.
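
One way to check whether the two APIs really differ is to time the same fetch through both. The following is only a minimal sketch: `time_fetch` and `fake_fetch` are hypothetical helpers that are not part of any SDK, and the App Engine calls appear only in comments, where `Person` is assumed to be a db.Model subclass for the same kind.

```python
import time


def time_fetch(fetch, repeats=5):
    """Call fetch() `repeats` times.

    Returns (entity count, best time in ms, average time in ms).
    """
    timings_ms = []
    count = 0
    for _ in range(repeats):
        start = time.time()
        # list() forces the whole result set to be materialised,
        # so lazy iterators are fully consumed inside the timed region.
        results = list(fetch())
        timings_ms.append((time.time() - start) * 1000.0)
        count = len(results)
    return count, min(timings_ms), sum(timings_ms) / len(timings_ms)


# On App Engine the two callables might look roughly like this (not run here;
# Person is a hypothetical db.Model subclass):
#   from google.appengine.api import datastore
#   from google.appengine.ext import db
#   low_level = lambda: datastore.Query("Person").Get(1000)
#   model_api = lambda: db.Query(Person).fetch(1000)


def fake_fetch():
    """Stand-in for a datastore fetch so the sketch runs anywhere."""
    return [{"firstname": "x" * 8, "lastname": "y" * 8} for _ in range(1000)]


count, best_ms, avg_ms = time_fetch(fake_fetch)
print("%d entities, best %.1f ms, avg %.1f ms" % (count, best_ms, avg_ms))
```

Running both callables in the same request handler, several times each, would at least show whether the choice of API accounts for any of the gap.
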
Maybe there is a difference in how they perform in this context? Also, are you running these tests on App Engine or in the dev_appserver? (I'm presuming you're running them on live App Engine, but just to be sure.)

On Mon, Oct 11, 2010 at 9:53 AM, Darshan Shaligram <[email protected]> wrote:

> This is a followup query to my question on stackoverflow:
>
> http://stackoverflow.com/questions/3886341/is-appengine-python-datastore-query-much-3x-slower-than-java
>
> I've been evaluating the appengine to choose between Python and Java
> and I noticed a large performance difference in datastore queries:
> large queries are much slower in Python (by a factor of >3x) than in
> Java. I'd like to confirm that this performance difference is known
> behaviour, and not some mistake I'm making in my Python code.
>
> My test entity looks like this:
>
> Person
> ======
> firstname (length 8)
> lastname (length 8)
> address (20)
> city (10)
> state (2)
> zip (5)
>
> I populate the datastore with 2000 Person records, with each field
> exactly the length noted here, all filled with random data and with no
> fields indexed (just so the inserts go faster).
>
> I then query 1k Person records from Python (no filters, no ordering):
>
>     q = datastore.Query("Person")
>     objects = list(q.Get(1000))
>
> And 1k Person records from Java (likewise no filters, no ordering):
>
>     DatastoreService ds =
>         DatastoreServiceFactory.getDatastoreService();
>     Query q = new Query("Person");
>     PreparedQuery pq = ds.prepare(q);
>     // Force the query to run and return objects so we can be sure
>     // we've timed a full query.
>     List<Entity> entityList = new
>         ArrayList<Entity>(pq.asList(withLimit(1000)));
>
> With this code, the Java code returns results in ~200ms; the Python
> code takes much longer, averaging >700ms. Both apps are on the same
> app id (with different versions), so they use the same datastore and
> should be on a level playing field.
>
> I repeated the same test with much smaller fetches (fetch size 10-30)
> and the small fetches show essentially the same performance for both
> Python and Java, so the Python slowness affects only large fetches.
>
> All my code is available here, in case I've missed any details:
> http://github.com/greensnark/appenginedatastoretest
>
> I also instrumented the sample apps with appstats (as suggested on
> stackoverflow), and reran the tests (1k record fetch). Appstats
> reports times like this "datastore_v3.RunQuery real=122ms api=9179ms"
> for Java and times like "datastore_v3.RunQuery real=377ms api=9179ms"
> for Python. I'm not entirely clear on how to read the appstats times.
>
> From my examination of the Python code in
> google.appengine.api.datastore, it looks like most of the extra
> slowdown in the Python code involves decoding the queried entities
> from their protocol buffers, but I haven't benchmarked this to be
> sure.
>
> Could anyone confirm if large datastore queries are just slower in
> Python because Python is intrinsically slower than Java, or that my
> code is broken in some way that's screwing with the performance in the
> Python version?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
