Every call to FetchData issues a small network request to the 
blobstore.  This will most certainly take time when scaled to 24,000 
fetches.  The reason you don't encounter this when testing in the 
development environment (dev_appserver.py) is that the dev blobstore there 
is hosted locally on your machine.  As such, each fetch takes about as long 
as a local file read, not as long as a proper network request.  The 
latency you're seeing in production is therefore reasonable and to be expected.

To solve this issue, you would need to either reduce the latency of these 
network calls or reduce how many of them are made per request to this module. 
56ms for these Blobstore internal calls is entirely expected, so you won't 
really be able to cut that down.  At that rate, 24,000 fetches alone amount 
to roughly 22 minutes, already well past the 10-minute task queue deadline. 
Thus, we are left with reducing how many of these calls are made with each 
request.
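One way to cut the call count is to read the blob into memory once and let dbfpy's many small reads hit a local buffer instead of the network.  Below is a minimal sketch of that idea; CountingReader is a hypothetical stand-in for blobstore.BlobReader used only to show the effect (in real code you could also pass a larger buffer_size when constructing BlobReader):

```python
import io

class CountingReader(object):
    """Stand-in for blobstore.BlobReader that counts read() calls."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.reads = 0

    def read(self, size=-1):
        self.reads += 1
        return self._buf.read(size)

payload = b"x" * 1024
remote = CountingReader(payload)

# One bulk read up front; all later parsing hits local memory only.
local_copy = io.BytesIO(remote.read())

for _ in range(1000):  # simulate dbfpy's many small record reads
    local_copy.read(32)

assert remote.reads == 1  # a single "network" fetch instead of thousands
```

The same pattern applies to your handler: wrap the slurped bytes in a file-like object and hand that to dbf.Dbf instead of the raw BlobReader.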

   - What is this task queue-issued request doing during its lifespan that 
   affects thousands of records?
   - Is it possible for this request to instead be broken up into multiple 
   requests affecting fewer records?
   - If not, why must all records be processed in a single request?
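If the records can be split up, the usual pattern is to fan the job out into smaller tasks, each handling a slice of the records.  Here's a rough sketch of that batching logic; BATCH_SIZE is an arbitrary choice and enqueue() is a placeholder for something like taskqueue.add(url=..., params=...):

```python
BATCH_SIZE = 2000  # illustrative; tune to fit well within the deadline

def make_batches(total, batch_size):
    """Yield (start, stop) index ranges covering records 0..total."""
    for start in range(0, total, batch_size):
        yield (start, min(start + batch_size, total))

queued = []

def enqueue(start, stop):
    """Placeholder for taskqueue.add(); each task would process one range."""
    queued.append((start, stop))

for start, stop in make_batches(24000, BATCH_SIZE):
    enqueue(start, stop)

assert len(queued) == 12
assert queued[-1] == (22000, 24000)
```

Each task would then seek to its start offset and process only its own slice, keeping every individual request comfortably under the 10-minute limit.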

To provide some more practical advice, I would need to know more about your 
implementation, along with answers to the questions above.

On Thursday, December 8, 2016 at 11:12:44 AM UTC-5, Mike Lucente wrote:
>
> Yes, stackdriver shows that the blobstore file is being accessed for every 
> record (/blobstore.FetchData (56 ms) for example). But it works fine when 
> run locally(?) I would have expected that the blobstore would function just 
> as a regular file would where opening a file and reading records is a 
> non-issue. Why is this so painfully slow when accessing blobstore? Do I 
> have to slurp the entire file into memory and parse it??
>
>
>
>
> On Wed, Dec 7, 2016 at 4:57 PM, 'Nicholas (Google Cloud Support)' via 
> Google App Engine <[email protected]> wrote:
>
>> Hey Mike,
>>
>> I'm not familiar with dbfpy <https://pypi.python.org/pypi/dbfpy/2.3.1> 
>> or how it implements iteration but if no other point in your example 
>> consumes much time, it seems iterating through *dbf_in* might be the 
>> issue.  As it implements __getitem__ 
>> <http://dbfpy.bzr.sourceforge.net/bzr/dbfpy/annotate/head%3A/dbfpy/dbf.py#L258>
>>  
>> to serve as a stream, it's possible that this is what costs cycles by 
>> issuing many requests to the blob reader.  I would strongly recommend using 
>> Stackdriver 
>> Trace <https://cloud.google.com/trace/docs/trace-overview> to see the 
>> life of a request and where it spends the bulk of its time.  Let me know 
>> what you find.
>>
>> Nicholas
>>
>> On Tuesday, December 6, 2016 at 1:45:09 PM UTC-5, Mike Lucente wrote:
>>>
>>> I'm using dbfpy to read records from a blobstore entry and am unable to 
>>> read 24K records before hitting the 10 minute wall (my process is in a task 
>>> queue). Here's my code:
>>>
>>>     def get(self):
>>>         count = 0
>>>         cols = ['R_MEM_NAME','R_MEM_ID','R_EXP_DATE','R_STATE','R_RATING1','R_RATING2']
>>>
>>>         blobkey = self.request.get('blobkey')
>>>         blob_reader = blobstore.BlobReader(blobkey)
>>>
>>>         dbf_in = dbf.Dbf(blob_reader, True)
>>>
>>>         try:
>>>             if dbf_in.fieldNames[0] == 'R_MEM_NAME':
>>>                 pass
>>>         except:
>>>             logging.info("Invalid record type: %s", dbf_in.fieldNames[0])
>>>             return
>>>
>>>         mysql = mysqlConnect.connect('ratings')
>>>         db = mysql.db
>>>         cursor = db.cursor()
>>>
>>>         for rec in dbf_in:
>>>             count = count + 1
>>>             if count == 1:
>>>                 continue
>>>
>>>             continue
>>>
>>> ---
>>> This simple loop should finish in seconds. Instead it gets through a few 
>>> thousand records and then hits the wall.
>>>
>>> Note the last "continue" that I added to bypass the mysql inserts (that 
>>> I previously thought were the culprit).
>>>
>>> I'm stumped and stuck.
>>>
>>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Google App Engine" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/google-appengine/L-qePUVWekU/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/google-appengine.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/google-appengine/ba5b7252-e820-403d-9e60-df3ad1e02cbb%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/google-appengine/ba5b7252-e820-403d-9e60-df3ad1e02cbb%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
