We use Cloud Storage to store large Elasticsearch results (from
aggregations, so scan+scroll isn't going to work here).

To handle these large aggregation results in parallel, we store them as
multiline JSON dumps that are sourced from a Managed VM.
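One way to split a multiline (newline-delimited) JSON dump across parallel workers, without every instance reading the whole file, is to give each worker a byte range and let it parse only the lines that *start* inside that range. The sketch below assumes one JSON record per line and a file object opened in binary mode that supports `seek()`, `tell()` and `readline()`, as local files and `io.BytesIO` do; whether the handle returned by `gcs.open` behaves the same way is an assumption here. `iter_byte_range` is a hypothetical helper, not part of the cloudstorage library.

```python
import io
import json


def iter_byte_range(fileobj, start, end):
    """Yield the JSON records whose lines start in the byte range [start, end).

    Hypothetical helper. Assumes newline-delimited JSON (one record per line)
    in a seekable file object opened in binary mode.
    """
    if start > 0:
        # Step back one byte and finish the current line. If that byte is a
        # newline, we land exactly on `start`; otherwise the straddling line
        # began in the previous range and is skipped here.
        fileobj.seek(start - 1)
        fileobj.readline()
    else:
        fileobj.seek(0)
    # Every line that *begins* before `end` belongs to this range, even if it
    # ends past `end`.
    while fileobj.tell() < end:
        line = fileobj.readline()
        if not line:
            break  # end of file
        if line.strip():  # ignore blank lines
            yield json.loads(line)


if __name__ == "__main__":
    data = b'{"a": 1}\n{"a": 2}\n{"a": 3}\n'
    chunk = 10
    for start in range(0, len(data), chunk):
        end = min(start + chunk, len(data))
        print(start, end, list(iter_byte_range(io.BytesIO(data), start, end)))
```

Each line is handled by exactly one worker: the one whose range contains the line's first byte. With a 100 MB dump and 10 workers, worker k would read roughly the range [k × 10 MB, (k + 1) × 10 MB) plus at most one extra line, instead of the whole file.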

To do this *parallel processing*, many *App Engine* instances open this
file at once, and they *hit the URLFetch rate limit* because of this
documented limitation:

> and the calls count against your URL fetch quota, as the library uses the
> URL Fetch service to interact with Cloud Storage.


- https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
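A common mitigation when many instances hit a rate limit at the same moment is to retry with exponential backoff and random jitter, so the retries do not arrive in lockstep. This does not reduce the total number of URL Fetch calls; it only spreads them out over time. `call_with_backoff` and `RateLimitedError` are hypothetical names, and which exception the over-quota condition actually raises (see the screenshot) is an assumption left to the caller.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the exception raised when the quota is hit."""


def call_with_backoff(func, retryable=(RateLimitedError,), max_attempts=5,
                      base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying `retryable` errors with jittered exponential backoff.

    The delay before retry n is drawn uniformly from [0, min(max_delay,
    base_delay * 2**n)] ("full jitter"), so instances that failed together
    do not all retry together.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

For example, `call_with_backoff(lambda: open_file(path, 'r'), retryable=(SomeOverQuotaError,))`, where `SomeOverQuotaError` stands for whatever exception class the rate-limit failure raises. Passing `sleep` as a parameter keeps the function testable without real delays.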


*Here's the resulting exception:*

<https://lh3.googleusercontent.com/-WbU1UiwCB2s/VbkWhCRDMjI/AAAAAAAAAH0/Ta3WBGEC0n0/s1600/Screenshot%2B2015-07-28%2B17.07.40.png>


*Here's the code that opens the file:*

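    import cloudstorage as gcs


    def open_file(path, mode, **kwargs):
        # path has the form '/bucket/object'. gcs.open raises an exception
        # (e.g. gcs.NotFoundError) rather than returning a falsy handle,
        # so catch the exception instead of testing the return value.
        try:
            return gcs.open(path, mode=mode, **kwargs)
        except gcs.NotFoundError:
            raise IOError("File could not be opened: %s" % path)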

--

We need a way of communicating with Cloud Storage that bypasses the
URLFetch quotas and rate limits; otherwise we cannot effectively process
these files in parallel.

*Is there a way to read GCS files from App Engine that does not route
through URLFetch*, in the same way that the Datastore API does not incur
URL Fetch rate limits?

I've also posted this question, with more detail, on Stack Overflow:
http://stackoverflow.com/questions/31707961/urlfetch-rate-limits-with-google-cloud-storage

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/ffbec1ff-ed10-490d-b908-797bc6364398%40googlegroups.com.