Are you doing anything other than parsing the XML? Also, how big is the Atom
feed? > 10MB?

Also, are you using a DOM parser or a stream-based one? A stream-based
parser will be a bit faster.
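
For example, a minimal sketch of stream-based parsing using the standard
library's iterparse, as an alternative to minidom + BeautifulSoup. The
two-entry feed here is a hypothetical sample, not Sérgio's actual feed:

```python
# Stream-based Atom parsing with xml.etree's iterparse.
# Entries are processed and freed as they are read, rather than
# building the whole DOM tree first.
import xml.etree.ElementTree as ET
from io import BytesIO

ATOM_NS = '{http://www.w3.org/2005/Atom}'

# Hypothetical two-entry Atom feed, standing in for the urlfetch result.
feed = b"""<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

titles = []
for event, elem in ET.iterparse(BytesIO(feed), events=('end',)):
    if elem.tag == ATOM_NS + 'entry':
        titles.append(elem.find(ATOM_NS + 'title').text)
        elem.clear()  # release the entry's subtree -- the point of streaming

print(titles)  # -> ['First', 'Second']
```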

Finally, how long does it take you to parse, say, 100 Atom entries using
your code? I would expect the time to be on the order of <1ms.
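
One rough way to check that estimate: build a synthetic 100-entry feed and
time the parse alone, with no urlfetch involved. Absolute numbers will vary
by runtime, but it should be milliseconds, not seconds:

```python
# Time the parse of a synthetic 100-entry Atom feed.
import time
import xml.etree.ElementTree as ET

entries = ''.join('<entry><title>e%d</title></entry>' % i
                  for i in range(100))
feed = '<feed xmlns="http://www.w3.org/2005/Atom">%s</feed>' % entries

start = time.time()
root = ET.fromstring(feed)
count = len(root.findall('{http://www.w3.org/2005/Atom}entry'))
elapsed = time.time() - start

print('%d entries parsed in %.2f ms' % (count, elapsed * 1000.0))
```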

> -----Original Message-----
> From: [email protected] [mailto:google-
> [email protected]] On Behalf Of Sérgio Nunes
> Sent: Tuesday, February 24, 2009 6:04 AM
> To: Google App Engine
> Subject: [google-appengine] Advice on dealing with high CPU consumption
> in fetch + parse script
> 
> 
> Hi,
> 
> I would like to have some advice on how to deal with a CPU consuming
> script.
> The script simply fetches an Atom XML file (using urlfetch) and then
> parses each item using both minidom and BeautifulSoup. The Atom file
> typically has 50 entries.
> 
> It seems that spawning a process for each batch of N entries would be
> the best option. However, I think this is not possible on GAE.
> 
> The Atom file is being retrieved every hour. I could reduce the number
> of entries to be parsed by increasing the frequency of urlfetch calls.
> The trade-off seems to be between more calls to urlfetch with fewer
> items to parse each, or fewer calls to urlfetch with more items to parse.
> 
> Any other option I am missing?
> In a nutshell, what is the best (optimized and scalable) way to
> periodically fetch and parse an Atom feed?
> 
> Thanks in advance for any comments,
> --
> Sérgio Nunes
> 


