Good to hear that you've found a solution.  Generally speaking, bulk tasks 
are often better broken down into smaller tasks as this tends to lend 
itself quite well to horizontal scaling.  Happy coding!
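The splitting approach could be sketched in plain Python like this (a minimal illustration; the batch size and the task-enqueue call in the comment are hypothetical, not taken from Mike's code):

```python
# Hypothetical sketch: split one bulk job into fixed-size batches so
# each batch can be handled by its own short-lived task.
def make_batches(num_records, batch_size):
    """Yield (start, end) index pairs covering num_records."""
    for start in range(0, num_records, batch_size):
        yield (start, min(start + batch_size, num_records))

batches = list(make_batches(24000, 100))
# Each pair could then be enqueued as its own task, e.g. on App Engine:
#   taskqueue.add(url='/process', params={'start': s, 'end': e})
```

Each resulting task stays well under the request deadline, and the tasks can run in parallel across instances.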

On Tuesday, December 13, 2016 at 12:51:17 PM UTC-5, Mike Lucente wrote:
>
> I've changed the app to read from a flat file instead of a DBF and am now 
> processing 100 insert/updates at a time vs MySQL. All good now. 
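A batched-insert pattern like the one Mike describes might look like the following sketch (table and column names in the comment are hypothetical):

```python
# Hypothetical sketch: group rows into chunks so the database sees one
# statement per chunk instead of one per row.
def batched(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# With a DB-API cursor this could drive one executemany per chunk:
#   for chunk in batched(rows, 100):
#       cursor.executemany("INSERT INTO ratings VALUES (%s, %s)", chunk)
#       db.commit()
```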
>
> On Mon, Dec 12, 2016 at 2:30 PM, 'Nicholas (Google Cloud Support)' via 
> Google App Engine <[email protected]> wrote:
>
>> Every call to FetchData is issuing a small network request to the 
>> blobstore.  This will most certainly take time when scaling to 24,000 
>> fetches.  The reason this is not encountered when testing in the 
>> development environment (dev_appserver.py) is that the dev blobstore there 
>> is hosted locally on your machine.  As such, the fetch takes about as 
>> much time as a local file read, not as long as a proper network request. 
>> This latency is therefore entirely reasonable and to be expected.
>>
>> To solve this issue, you would need to either reduce the latency of these 
>> network calls or reduce the volume of them per request to this module. 
>> 56 ms for these internal Blobstore calls is entirely expected, so you 
>> won't really be able to cut that down.  Thus, we are left with reducing 
>> how many of these calls are made per request.
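One common way to cut the per-record fetches, sketched here in plain Python (the `BlobReader` usage in the comment is an assumption about your handler), is to read the blob once and let the parser iterate over an in-memory buffer:

```python
import io

# Hypothetical sketch: read the blob in one call, then hand the DBF
# parser a seekable in-memory buffer instead of a reader that issues
# a network fetch per record access.
def buffer_reader(reader):
    """Return a seekable in-memory copy of a file-like reader."""
    return io.BytesIO(reader.read())

# In the handler this might become:
#   blob_reader = blobstore.BlobReader(blobkey)
#   dbf_in = dbf.Dbf(buffer_reader(blob_reader), True)
```

This trades memory for network round-trips, so it only suits blobs that fit comfortably in the instance's RAM.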
>>
>>    - What is this task queue-issued request doing during its lifespan 
>>    that affects thousands of records?
>>    - Is it possible for this request to instead be broken up into 
>>    multiple requests affecting fewer records?
>>    - If not, why must all records be processed in a single request?
>>
>> To provide some more practical advice, I would need to know more about 
>> your implementation, per the questions above.
>>
>> On Thursday, December 8, 2016 at 11:12:44 AM UTC-5, Mike Lucente wrote:
>>>
>>> Yes, Stackdriver shows that the blobstore file is being accessed for 
>>> every record (/blobstore.FetchData (56 ms), for example). But it works 
>>> fine when run locally(?) I would have expected the blobstore to 
>>> function just as a regular file would, where opening a file and reading 
>>> records is a non-issue. Why is this so painfully slow when accessing the 
>>> blobstore? Do I have to slurp the entire file into memory and parse it?
>>>
>>> On Wed, Dec 7, 2016 at 4:57 PM, 'Nicholas (Google Cloud Support)' via 
>>> Google App Engine <[email protected]> wrote:
>>>
>>>> Hey Mike,
>>>>
>>>> I'm not familiar with dbfpy <https://pypi.python.org/pypi/dbfpy/2.3.1> 
>>>> or how it implements iteration, but if no other point in your example 
>>>> consumes much time, it seems iterating through *dbf_in* might be the 
>>>> issue.  As it implements __getitem__ 
>>>> <http://dbfpy.bzr.sourceforge.net/bzr/dbfpy/annotate/head%3A/dbfpy/dbf.py#L258> 
>>>> to serve as a stream, it's possible that this is what costs cycles by 
>>>> issuing many requests to the blob reader.  I would strongly recommend 
>>>> using Stackdriver Trace <https://cloud.google.com/trace/docs/trace-overview> 
>>>> to see the life of a request and where it spends the bulk of its time. 
>>>> Let me know what you find.
>>>>
>>>> Nicholas
>>>>
>>>> On Tuesday, December 6, 2016 at 1:45:09 PM UTC-5, Mike Lucente wrote:
>>>>>
>>>>> I'm using dbfpy to read records from a blobstore entry and am unable 
>>>>> to read 24K records before hitting the 10 minute wall (my process is in a 
>>>>> task queue). Here's my code:
>>>>>
>>>>>     def get(self):
>>>>>         count = 0
>>>>>         cols = ['R_MEM_NAME','R_MEM_ID','R_EXP_DATE','R_STATE','R_RATING1','R_RATING2']
>>>>>
>>>>>         blobkey = self.request.get('blobkey')
>>>>>         blob_reader = blobstore.BlobReader(blobkey)
>>>>>
>>>>>         dbf_in = dbf.Dbf(blob_reader, True)
>>>>>
>>>>>         try:
>>>>>             if dbf_in.fieldNames[0] == 'R_MEM_NAME':
>>>>>                 pass
>>>>>         except:
>>>>>             logging.info("Invalid record type: %s", dbf_in.fieldNames[0])
>>>>>             return
>>>>>
>>>>>         mysql = mysqlConnect.connect('ratings')
>>>>>         db = mysql.db
>>>>>         cursor = db.cursor()
>>>>>
>>>>>         for rec in dbf_in:
>>>>>             count = count + 1
>>>>>             if count == 1:
>>>>>                 continue
>>>>>
>>>>>             continue
>>>>>
>>>>> ---
>>>>> This simple loop should finish in seconds. Instead it gets through a 
>>>>> few thousand records and then hits the wall.
>>>>>
>>>>> Note the last "continue" that I added to bypass the mysql inserts 
>>>>> (that I previously thought were the culprit).
>>>>>
>>>>> I'm stumped and stuck.
>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to a topic in the 
>>>> Google Groups "Google App Engine" group.
>>>> To unsubscribe from this topic, visit 
>>>> https://groups.google.com/d/topic/google-appengine/L-qePUVWekU/unsubscribe
>>>> .
>>>> To unsubscribe from this group and all its topics, send an email to 
>>>> [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/google-appengine.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/google-appengine/ba5b7252-e820-403d-9e60-df3ad1e02cbb%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/google-appengine/ba5b7252-e820-403d-9e60-df3ad1e02cbb%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>
