Hey Josh,

It seems you got some pretty good answers in the Stack Overflow thread. I'll add my thoughts:
- You can make a feature request in the public issue tracker <https://code.google.com/p/googleappengine/issues/list> with an explanation of your use-case if you'd like to see something implemented.
- You can also look into using Datastore <https://cloud.google.com/datastore/docs/concepts/overview?hl=en> to store the temporary results of your process, since it has better rate-limiting quotas than Cloud Storage, which isn't really meant for rapid writes like this. You could also look into Bigtable <https://cloud.google.com/bigtable/docs/>, or any number of distributed stores such as memcached <http://memcached.org/>, to resolve your issue of temporary file storage.

I hope this has helped. Feel free to ask any questions you may have, or to go ahead and create a feature request / quota increase request in the public issue tracker.

Best wishes,
Nick

On Wednesday, July 29, 2015 at 2:09:04 PM UTC-4, Josh Whelchel (Loudr) wrote:
>
> We use Cloud Storage to store large Elasticsearch results (from
> aggregations - so scan+scroll isn't going to work here).
>
> To handle these large aggregations in parallel, we store them as multiline
> JSON dumps sourced from a managed VM.
>
> As a result, to perform *parallel processing*, many *App Engine* instances
> will open this file at once and *hit the URLFetch rate limit* because of
> this documented limitation:
>
>> and the calls count against your URL fetch quota, as the library uses the
>> URL Fetch service to interact with Cloud Storage.
>
> - https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
>
> *Here's the resulting exception:*
>
> <https://lh3.googleusercontent.com/-WbU1UiwCB2s/VbkWhCRDMjI/AAAAAAAAAH0/Ta3WBGEC0n0/s1600/Screenshot%2B2015-07-28%2B17.07.40.png>
>
> *Here's the code that opens the file:*
>
> import cloudstorage as gcs
>
> def open_file(path, mode, **kwargs):
>     f = gcs.open(path, mode=mode, **kwargs)
>     if not f:
>         raise Exception("File could not be opened: %s" % path)
>     return f
>
> --
>
> We need a method of communicating with Cloud Storage that bypasses the
> URLFetch quotas and rate limits, or it becomes impossible for us to
> effectively execute parallel processing.
>
> *Is there a method of reading GCS files from App Engine that does not
> route through URLFetch*, much like the Datastore API does not incur URL
> fetch rate limits?
>
> I've detailed this question on Stack Overflow as well:
> http://stackoverflow.com/questions/31707961/urlfetch-rate-limits-with-google-cloud-storage
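For anyone hitting the same limit while a quota increase or feature request is pending, one stopgap (it does not fix the underlying quota) is to retry rate-limited opens with jittered exponential backoff; the cloudstorage library also ships its own RetryParams mechanism, but a generic wrapper looks roughly like the sketch below. The retry predicate, delays, and attempt counts here are illustrative assumptions, not part of any Google API:

```python
import random
import time


def retry_with_backoff(fn, is_retryable, max_attempts=5, base_delay=0.2):
    """Call fn(), retrying retryable failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Sleep base_delay * 2^attempt plus jitter, to de-synchronize
            # parallel instances that are all hammering the same file.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Each parallel worker would then call something like retry_with_backoff(lambda: gcs.open(path), is_quota_error), where is_quota_error is a hypothetical predicate matching the exception type shown in the screenshot above.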
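Another angle on Josh's setup: rather than every App Engine instance opening the same multiline JSON dump (where each buffered read translates into URLFetch calls), the managed VM could write one smaller object per worker, so each instance only reads its own shard. A minimal sketch of the shard-assignment side, assuming a hypothetical naming scheme like results-00000.json:

```python
def shard_lines(lines, num_shards):
    """Split a list of JSON lines round-robin into num_shards payloads,
    one per parallel worker, so each worker reads a smaller object."""
    shards = [[] for _ in range(num_shards)]
    for i, line in enumerate(lines):
        shards[i % num_shards].append(line)
    return ["\n".join(s) for s in shards]


def shard_object_name(prefix, shard_index):
    # Hypothetical naming scheme; any unique per-shard object name works.
    return "%s-%05d.json" % (prefix, shard_index)
```

The VM would write each payload to its shard_object_name, and worker k would open only its own object, cutting the per-instance URLFetch traffic roughly by the shard count.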
