Hello again!
While hunting down a bug today I found some code that was slowing down
gPodder's loading time :D
The original bug was that I was still seeing numeric entities like &#8217; in
the episode descriptions. This is because the Python codepoint2name dict
doesn't include all of the possible Unicode characters. So I replaced the old
code with a regex that converts the codepoint numbers directly to Unicode
characters. In a quick benchmark, the old code accounted for 3.28 sec of load
time, whereas the new code takes < 0.1 sec :)
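If you want to get a feel for the difference, something along these lines
reproduces it (Python 2; the sample string, the loop count and the old_way /
new_way names are made up for illustration -- the 3.28 sec figure above came
from real feed data in gPodder):

    # Rough timing sketch (Python 2). Sample string and loop count are arbitrary.
    import re
    import time
    import htmlentitydefs

    sample = 'It&#8217;s a &#8220;quoted&#8221; description. ' * 200

    def old_way(text):
        # old approach: walk the whole codepoint2name dict for every description
        d = htmlentitydefs.codepoint2name
        for key in d.keys():
            text = text.replace('&#' + str(key) + ';',
                                '&' + unicode(d[key], 'iso-8859-1') + ';')
        return text

    unicode_ent_re = re.compile(r'&#(\d{2,4});')

    def new_way(text):
        # new approach: convert the codepoint number straight to a character
        return unicode_ent_re.sub(lambda m: unichr(int(m.group(1))), text)

    for func in (old_way, new_way):
        start = time.time()
        for _ in xrange(20):
            func(sample)
        print '%s: %.3f sec' % (func.__name__, time.time() - start)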
Here are some examples of feeds that include these kinds of entities:
- http://feeds.feedburner.com/doctorow_podcast
- http://feeds.feedburner.com/nlo
Now what's really cool is that when I launch gPodder, it's ready to go in less
than 2 seconds! (on an Intel E6300)
Let me know what you guys think,
nick
--- gpodder-r615/src/gpodder/util.py 2008-03-19 23:15:38.000000000 -0400
+++ gpodder-r615-dev/src/gpodder/util.py 2008-03-19 23:28:32.000000000 -0400
@@ -309,14 +309,13 @@
# strips html from a string (fix for <description> tags containing html)
rexp = re.compile( "<[^>]*>")
stripstr = rexp.sub( '', html)
- # replaces numeric entities with entity names
- dict = htmlentitydefs.codepoint2name
- for key in dict.keys():
- stripstr = stripstr.replace( '&#'+str(key)+';', '&'+unicode( dict[key], 'iso-8859-1')+';')
+ # replace unicode entities with the characters they represent
+ unicode_ent_re = re.compile( '&#(\d{2,4});' )
+ stripstr = unicode_ent_re.sub( lambda x: unichr(int(x.group(1))), stripstr )
# strips html entities
dict = htmlentitydefs.entitydefs
- for key in dict.keys():
- stripstr = stripstr.replace( '&'+unicode(key,'iso-8859-1')+';', unicode(dict[key], 'iso-8859-1'))
+ html_ent_re = re.compile( '&(.{2,8});' )
+ stripstr = html_ent_re.sub( lambda x: unicode(dict.get(x.group(1),''), 'iso-8859-1'), stripstr )
return stripstr
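P.S. For anyone who wants to see what the patched substitutions do to a
description, here is a small standalone version of just the two entity passes
(Python 2; strip_entities and the sample string are only for demonstration --
the real code lives in util.py and also strips the HTML tags first):

    # Standalone illustration of the two entity substitutions in the patch.
    import re
    import htmlentitydefs

    def strip_entities(stripstr):
        # numeric entities: convert the codepoint number directly to a character
        stripstr = re.sub(r'&#(\d{2,4});',
                          lambda m: unichr(int(m.group(1))), stripstr)
        # named entities: look them up in entitydefs, drop unknown ones
        ents = htmlentitydefs.entitydefs
        stripstr = re.sub(r'&(.{2,8});',
                          lambda m: unicode(ents.get(m.group(1), ''), 'iso-8859-1'),
                          stripstr)
        return stripstr

    print repr(strip_entities(u'Cory&#8217;s &quot;free&quot; &amp; open podcast'))
    # u'Cory\u2019s "free" & open podcast'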