We use Cloud Storage to store large Elasticsearch results (from aggregations, so scan+scroll won't work here).
To process these large aggregations in parallel, we store them as multiline JSON dumps written from a managed VM. For *parallel processing*, many *App Engine* instances open this file at once and consequently *hit the URLFetch rate limit*, because of this documented limitation:

> The calls count against your URL fetch quota, as the library uses the URL Fetch service to interact with Cloud Storage.

https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/

*Here's the resulting exception:*
<https://lh3.googleusercontent.com/-WbU1UiwCB2s/VbkWhCRDMjI/AAAAAAAAAH0/Ta3WBGEC0n0/s1600/Screenshot%2B2015-07-28%2B17.07.40.png>

*Here's the code that opens the file:*

    import cloudstorage as gcs

    def open_file(path, mode, **kwargs):
        f = gcs.open(path, mode=mode, **kwargs)
        if not f:
            raise Exception("File could not be opened: %s" % path)
        return f

We need a way of communicating with Cloud Storage that bypasses the URLFetch quotas and rate limits; otherwise it is impossible for us to effectively execute parallel processing. *Is there a method of reading GCS files from App Engine that does not route through URLFetch*, much as the Datastore API does not incur URLFetch rate limits?

I've detailed this question on Stack Overflow as well: http://stackoverflow.com/questions/31707961/urlfetch-rate-limits-with-google-cloud-storage
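This doesn't bypass URLFetch, but as a stopgap the burst of concurrent opens can be smoothed with client-side exponential backoff, so fewer instances fail outright when the quota is momentarily exhausted. A minimal sketch, with illustrative names and parameters (`with_backoff`, the delays, and the exception tuple are assumptions, not part of the cloudstorage API; the real over-quota exception would be something like App Engine's `apiproxy_errors.OverQuotaError`):

```python
import random
import time


def with_backoff(func, max_attempts=5, base_delay=0.5, retryable=(Exception,)):
    """Wrap func so retryable errors trigger an exponential-backoff retry.

    Note: this only spreads out retries across instances; it does not
    raise the underlying URLFetch quota.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except retryable:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the original error
                # Full-jitter backoff: sleep 0..base_delay * 2^attempt seconds
                # so concurrent instances don't retry in lockstep.
                time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return wrapper
```

Usage would be wrapping the `open_file` helper above, e.g. `open_file = with_backoff(open_file, retryable=(SomeOverQuotaError,))`, with the exception class chosen to match what the screenshot shows. The cloudstorage library also ships its own `RetryParams` for transient-error retries, which may be worth tuning first.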
