Thanks for the feedback.

The Atom file is typically small, around 100 KB.
I am using both minidom (for extracting entities) and BeautifulSoup
(to extract properties from entries).

Which XML parser do you recommend? I'm just starting with Python.
What is the fastest setup to fetch and parse Atom feeds (simple field
extraction)?
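
To make the question concrete, here is a minimal sketch of the stream-based
approach mentioned in the quoted reply, using only the standard library's
xml.etree.ElementTree.iterparse. The sample feed, the iter_entries helper, and
the choice of fields (id, title, updated) are assumptions for illustration,
not the actual script; on App Engine the bytes would come from urlfetch
instead of an in-memory buffer.

```python
import io
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed standing in for the fetched response body.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>urn:example:1</id>
    <title>First entry</title>
    <updated>2009-02-24T06:04:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Second entry</title>
    <updated>2009-02-25T12:51:00Z</updated>
  </entry>
</feed>
"""


def iter_entries(source):
    """Yield one dict per Atom entry, reading the document incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ATOM_NS + "entry":
            yield {
                "id": elem.findtext(ATOM_NS + "id"),
                "title": elem.findtext(ATOM_NS + "title"),
                "updated": elem.findtext(ATOM_NS + "updated"),
            }
            # Drop the entry's children once the fields are extracted.
            elem.clear()


entries = list(iter_entries(io.BytesIO(SAMPLE_FEED)))
for entry in entries:
    print(entry["id"], entry["title"], entry["updated"])
```

Whether this is actually faster than minidom plus BeautifulSoup for a feed of
this size is something only a measurement can confirm.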

Finally, what is the best way to measure how long my code takes on
each task?
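
One possible way to do this, sketched under assumptions: wrap each task in a
small timing context manager and accumulate wall-clock time per label. The
timed helper, the timings dict, and the "parse"/"extract" labels are
hypothetical names, and the workloads are placeholders. time.perf_counter
needs Python 3.3 or later; on older versions time.time() could stand in.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per task label.
timings = {}


@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Placeholder workloads standing in for the real parse and extract steps.
with timed("parse"):
    total = sum(i * i for i in range(100000))

with timed("extract"):
    words = "id title updated".split()

for label, seconds in sorted(timings.items()):
    print("%-10s %.6f s" % (label, seconds))
```

For a function-level breakdown rather than per-task totals, the standard
library's cProfile module is another option.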

Thanks again,
--
Sérgio Nunes

On Feb 25, 12:51 pm, "Scott Seely" <[email protected]> wrote:
> Are you doing anything other than parsing the XML? Also, how big is the Atom
> feed? > 10MB?
>
> Also, are you using a DOM parser or stream based? Stream based will be a bit
> faster.
>
> Finally, how long does it take you to parse, say, 100 Atom entries using
> your code? I would expect the time to be on the order of <1ms.
>
> > -----Original Message-----
> > From: [email protected] [mailto:google-
> > [email protected]] On Behalf Of Sérgio Nunes
> > Sent: Tuesday, February 24, 2009 6:04 AM
> > To: Google App Engine
> > Subject: [google-appengine] Advice on dealing with high CPU consumption
> > in fetch + parse script
>
> > Hi,
>
> > I would like to have some advice on how to deal with a CPU consuming
> > script.
> > The script simply fetches an Atom XML file (using urlfetch) and then
> > parses each item using both minidom and BeautifulSoup. The Atom file
> > typically has 50 entries.
>
> > It seems that spawning a process for each N entries to be parsed would
> > be the best option. However I think that this is not possible with
> > GAE.
>
> > The Atom file is being retrieved every hour. I could reduce the number
> > of entries to be parsed by increasing the frequency of urlfetch calls.
> > The trade off seems to be between more calls to urlfetch with fewer
> > items to parse, or fewer calls to urlfetch with more items to parse.
>
> > Any other option I am missing?
> > In a nutshell, what is the best (optimized and scalable) way to
> > periodically fetch and parse an Atom feed?
>
> > Thanks in advance for any comments,
> > --
> > Sérgio Nunes