This is a follow-up to my question on Stack Overflow:
http://stackoverflow.com/questions/3886341/is-appengine-python-datastore-query-much-3x-slower-than-java
I've been evaluating App Engine to choose between Python and Java,
and I noticed a large performance difference in datastore queries:
large queries are much slower in Python (by a factor of more than 3x)
than in Java. I'd like to confirm that this performance difference is
known behaviour, and not a mistake in my Python code.
My test entity looks like this:
Person
======
firstname (length 8)
lastname  (length 8)
address   (length 20)
city      (length 10)
state     (length 2)
zip       (length 5)
I populate the datastore with 2000 Person records, with each field
exactly the length noted above, all filled with random data, and with
no fields indexed (only to make the inserts faster).
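For anyone trying to reproduce the setup, here is a minimal sketch of generating the random field data, assuming plain Python dicts stand in for datastore entities. `make_person`, `random_text` and `PERSON_FIELD_LENGTHS` are hypothetical names, not taken from the linked repository, and writing the records to the datastore is deliberately left out.

```python
import random
import string

# Field lengths as listed for the Person entity above.
PERSON_FIELD_LENGTHS = {
    "firstname": 8,
    "lastname": 8,
    "address": 20,
    "city": 10,
    "state": 2,
    "zip": 5,
}


def random_text(length, rng):
    """Return a random string of ASCII letters with exactly `length` characters."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def make_person(rng):
    """Build one Person record as a plain dict (hypothetical helper)."""
    return {field: random_text(n, rng) for field, n in PERSON_FIELD_LENGTHS.items()}


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the generated data set is reproducible
    people = [make_person(rng) for _ in range(2000)]
    print(len(people), people[0])
```

Seeding the generator keeps the data set identical across runs, which makes repeated timing comparisons less noisy.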
I then query 1k Person records from Python (no filters, no ordering):
    from google.appengine.api import datastore

    q = datastore.Query("Person")
    objects = list(q.Get(1000))
And 1k Person records from Java (likewise no filters, no ordering):
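    import static com.google.appengine.api.datastore.FetchOptions.Builder.withLimit;

    import com.google.appengine.api.datastore.*;
    import java.util.ArrayList;
    import java.util.List;

    // Inside the request handler:
    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    Query q = new Query("Person");
    PreparedQuery pq = ds.prepare(q);
    // Force the query to run and return objects so we can be sure
    // we've timed a full query.
    List<Entity> entityList = new ArrayList<Entity>(pq.asList(withLimit(1000)));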
With this code, the Java version returns results in ~200 ms; the
Python version takes much longer, averaging over 700 ms. Both apps run
under the same app id (as different versions), so they use the same
datastore and should be on a level playing field.
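The figures above are averages over repeated runs. A minimal sketch of such an averaging harness, assuming Python 3's `time.perf_counter` (the original apps predate it) and a stand-in workload in place of the real fetch; `average_ms` is a hypothetical helper name.

```python
import time


def average_ms(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in milliseconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total * 1000.0 / runs


if __name__ == "__main__":
    # Stand-in workload; in the real test this would be the 1k-entity fetch,
    # e.g. lambda: list(q.Get(1000)).
    print("%.2f ms" % average_ms(lambda: sum(range(100000))))
```

Wrapping the fetch in `list(...)` inside the timed callable matters for the same reason as in the Java code: it forces every result to be retrieved and materialised before the clock stops.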
I repeated the test with much smaller fetches (10 to 30 entities), and
these showed essentially the same performance in Python and Java, so
the Python slowdown affects only large fetches.
All my code is available here, in case I've missed any details:
http://github.com/greensnark/appenginedatastoretest
I also instrumented the sample apps with Appstats (as suggested on
Stack Overflow) and reran the tests (1k-record fetch). Appstats
reports times like "datastore_v3.RunQuery real=122ms api=9179ms" for
Java and "datastore_v3.RunQuery real=377ms api=9179ms" for Python.
I'm not sure how to interpret these Appstats times.
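Taking the two quoted Appstats lines purely as text, the "real" values differ by about the same factor as the wall-clock measurements, while the "api" values are identical. A small sketch that parses lines in exactly the quoted format and computes that ratio; `parse_appstats_line` is a hypothetical helper, and it says nothing about what Appstats itself means by each field.

```python
import re

# Matches lines of the form quoted above, e.g.
# "datastore_v3.RunQuery real=377ms api=9179ms".
APPSTATS_RE = re.compile(r"^(?P<call>\S+) real=(?P<real>\d+)ms api=(?P<api>\d+)ms$")


def parse_appstats_line(line):
    """Split one Appstats RPC summary line into (call name, real ms, api ms)."""
    m = APPSTATS_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognised Appstats line: %r" % line)
    return m.group("call"), int(m.group("real")), int(m.group("api"))


java = parse_appstats_line("datastore_v3.RunQuery real=122ms api=9179ms")
python = parse_appstats_line("datastore_v3.RunQuery real=377ms api=9179ms")

print("real ratio: %.2f" % (python[1] / java[1]))  # prints "real ratio: 3.09"
print("api equal:", python[2] == java[2])          # prints "api equal: True"
```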
From my reading of the Python code in
google.appengine.api.datastore, it looks like most of the extra
slowdown in the Python version comes from decoding the queried
entities from their protocol buffers, but I haven't benchmarked this
to be sure.
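One way to check where the time goes is Python's standard `cProfile` module. A minimal sketch, assuming a stand-in `decode_entity` (hypothetical, parsing `k=v;...` text rather than protocol buffers) in place of the SDK's real decoding path; in the real app the profiled call would be `list(q.Get(1000))`, and the report would show how much cumulative time the SDK's decoding functions account for.

```python
import cProfile
import io
import pstats


def decode_entity(raw):
    """Stand-in for protocol-buffer decoding (hypothetical): parse 'k=v;...' text."""
    return dict(pair.split("=", 1) for pair in raw.split(";"))


def fetch_and_decode(raw_rows):
    """Stand-in for a query; 'fetching' is free here, so all time is decoding."""
    return [decode_entity(r) for r in raw_rows]


def profile_decode(raw_rows, top=5):
    """Profile fetch_and_decode; return (results, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    results = fetch_and_decode(raw_rows)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return results, out.getvalue()


if __name__ == "__main__":
    rows = ["firstname=abcdefgh;lastname=ijklmnop;zip=12345"] * 1000
    results, report = profile_decode(rows)
    print(len(results))
    print(report)
```

Sorting by cumulative time puts the outermost calls first, so the share attributable to decoding can be read off by comparing its cumulative time with that of the whole fetch.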
Could anyone confirm whether large datastore queries are simply slower
in Python because Python is intrinsically slower than Java, or whether
my code is broken in some way that hurts performance in the Python
version?
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.