Hi Sarfaraz,

If you need to preserve existing properties, you should write your own
transaction function and do all the work in there. get_or_insert is simply
a convenience wrapper for a transaction like this:

# run inside a transaction
entity = cls.get_by_key_name(key_name)
if entity is None:
  entity = cls(key_name=key_name, **kwargs)
  entity.put()
return entity
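
If the goal is to merge the CSV values into whatever is already stored, the
heart of that transaction function is a merge step: start from the existing
properties and overwrite only the fields the CSV supplies. Here is a minimal
sketch of that merge, using a plain dict to stand in for the entity (the
'Sector' field is hypothetical, only there to show an untouched property
surviving):

```python
def merge_row(existing, cells):
    """Overlay one CSV row onto an existing record, keeping any
    properties the CSV does not supply."""
    last = float(cells[5])
    change = float(cells[6])
    prev_close = last - change  # guard against prev_close == 0 in real data
    updated = dict(existing)    # keep every property the CSV does not supply
    updated.update({
        'PriceDate': cells[1],
        'Open': float(cells[2]),
        'High': float(cells[3]),
        'Low': float(cells[4]),
        'Last': last,
        'Change': change,
        'PrevClose': prev_close,
        'PerChange': round(change * 100 / prev_close, 2),
        'Vol': int(cells[7]),
        'Val': int(cells[8]),
    })
    return updated

row = "1010,4/6/2011 3:32:01 PM,26.00,26.20,26.00,26.20,0.10,207359,5423720"
merged = merge_row({'Sector': 'Banking'}, row.split(','))  # 'Sector' is hypothetical
# merged['Sector'] is still 'Banking'; only the CSV-supplied fields change.
```

Inside db.run_in_transaction you would apply the same merge to the entity
fetched with get_by_key_name. Independently of transactions, the batch forms,
Quotes.get_by_key_name(list_of_key_names) and db.put(entities), replace 147
get/put pairs with a single call each, which should cut the CPU usage
noticeably.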

-Nick Johnson

On Fri, Apr 8, 2011 at 9:43 AM, Sarfaraz <[email protected]> wrote:

> Thank you so much for the help, Nick!
>
> I still have a problem: I am not overwriting all the properties; there are
> some properties which are not changed by the CSV.
> If I follow the sample you gave me:
>
> entities = []
> for row in rows:
>   q = Quotes(key_name=cells[0])
>   # Fill in q, or pass the values as arguments to the constructor above
>   entities.append(q)
> db.put(entities)
>
> The other properties become None.
>
> If I use
> q = Quotes.get_by_key_name(cells[0])
>
> Instead of
>
> q = Quotes(key_name=cells[0])
>
> The other properties are not affected; only those I wish to update from the
> CSV are updated. This works for me, but when I run the cron every 1 minute
> it is still utilizing 2% of CPU / hour.
>
> Can I improve on this ?
>
> Regards
>
> Warm Regards
> Sarfaraz Farooqui
> --
> Strong and bitter words indicate a weak cause
>
>
>
>
> On Thu, Apr 7, 2011 at 7:33 AM, Nick Johnson (Google) <
> [email protected]> wrote:
>
>> Hi Sarfaraz,
>>
>> Which sort of deadline exceeded error are you getting? If it's from the
>> URLFetch call, you can increase the deadline; on offline requests (task
>> queue and cron job) that can be up to 10 minutes.
>>
>> You can generally ignore the 'too much CPU' warning; there are no longer
>> per-handler or per-page quotas, and offline requests may use as much CPU as
>> they need (providing you have the quota for it!). You can improve the
>> efficiency of your code, though: Instead of calling .put() in each iteration
>> of the loop, batch the results up and store them with a single put call:
>>
>> entities = []
>> for row in rows:
>>   q = Quotes(key_name=cells[0])
>>   # Fill in q, or pass the values as arguments to the constructor above
>>   entities.append(q)
>> db.put(entities)
>>
>> The other modification, above, is to simply call the constructor instead
>> of get_or_insert. get_or_insert executes a transaction to look for an
>> existing entity, and insert one if it doesn't exist. You're simply
>> overwriting all the values of the entity, though, so there's no need to
>> fetch the old one or do it transactionally - you can simply create a new
>> entity, overwriting any old data.
>>
>> -Nick Johnson
>>
>>
>> On Thu, Apr 7, 2011 at 7:00 AM, Sarfaraz <[email protected]> wrote:
>>
>>> Hi,
>>> I am running a cron every 5 minutes to get CSV data (stock prices) from
>>> a URL using urllib2. After splitting the data by newline and comma, I am
>>> storing the data in the datastore.
>>>
>>> I am getting many DeadlineExceeded errors, and warning messages that the
>>> cron "uri uses a high amount of cpu and may soon exceed its quota".
>>>
>>> I am attaching a few snapshots from my dashboard, and also pasting my
>>> code below for review: is this normal, or am I doing something wrong?
>>>
>>>
>>> *Model*
>>>
>>>
>>> class Quotes(db.Model):            # Symbol is stored as the key_name
>>>  PriceDate = db.StringProperty()   # a string: this value is only
>>> displayed, never calculated with or sorted on
>>>  Open = db.FloatProperty()
>>>  High = db.FloatProperty()
>>>  Low = db.FloatProperty()
>>>  Last = db.FloatProperty()
>>>  Change = db.FloatProperty()
>>>  PerChange = db.FloatProperty()
>>>  PrevClose = db.FloatProperty()
>>>  Vol = db.IntegerProperty()
>>>  Val = db.IntegerProperty()
>>>  UpdatedOn = db.DateTimeProperty(auto_now=True)
>>>
>>> *Code for URL fetch, parsing and storing into datastore:*
>>>
>>> I get the CSV data first, which consists of 147 rows and 9 columns
>>> (comma-separated values); the number of rows may increase in future.
>>> First I split by the newline character to get all the rows (approx 147).
>>> Then I loop through each row and split it by comma to get the column
>>> values (approx 9 columns).
>>> While looping over the rows I fetch an entity using get_or_insert(
>>> key_name), then I update all the properties and call put().
>>>
>>> Please check the detailed code below ( I have removed error handling and
>>> logging stuff for brevity)
>>>
>>> These are a few lines from the CSV data I receive from the URL.
>>> ***********************************************************************
>>> 1010,4/6/2011 3:32:01 PM,26.00,26.20,26.00,26.20,0.10,207359,5423720
>>> 1020,4/6/2011 3:32:01 PM,19.25,19.60,19.10,19.60,0.35,739595,14399067
>>> 1030,4/6/2011 3:32:01 PM,20.20,20.30,20.15,20.30,0.10,31833,643936
>>> 1040,4/6/2011 3:32:01 PM,30.40,30.80,30.40,30.80,0.00,10621,325830
>>> 1050,4/6/2011 3:32:01 PM,49.30,50.50,49.00,50.00,1.20,126361,6326252
>>>
>>> ***********************************************************************
>>>
>>> HERE IS THE CODE
>>>
>>> url = "http://www.example.com/somefile.ashx"
>>>
>>> result = urllib2.urlopen(url)
>>> result = result.read()
>>> rows = result.split("\n")
>>> for row in rows:
>>>    cells = row.split(",")
>>>    q = Quotes.get_or_insert(str(cells[0]).strip())  # cells[0] contains
>>> my key_name
>>>    q.PriceDate = cells[1]
>>>    q.Open = float(cells[2])   #opening price
>>>    q.High = float(cells[3])   #days high price
>>>    q.Low = float(cells[4])    #days low price
>>>    q.Last = float(cells[5])
>>>    q.Change = float(cells[6])
>>>    q.PrevClose = float(q.Last - q.Change)
>>>    q.PerChange =  round( (q.Change * 100 ) / q.PrevClose , 2)
>>>    q.Vol = int(cells[7])
>>>    q.Val = int(cells[8])
>>>    q.put()
>>>
>>> *Is this using too much CPU and exceeding the deadline because I am
>>> fetching one entity at a time and calling put() for each of the 147
>>> entities? And also because I am doing some calculations, such as
>>> q.PrevClose = (q.Last - q.Change), directly on the properties instead of
>>> calculating them in a separate variable and then assigning to the
>>> property? Are these the reasons? Please help me get this right.*
>>>
>>> *Attached: dashboard screenshots*
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine?hl=en.
>>>
>>
>>
>>
>> --
>> Nick Johnson, Developer Programs Engineer, App Engine
>>
>>
>>
>


-- 
Nick Johnson, Developer Programs Engineer, App Engine
