On 07/10/2008 07:35 PM, Davide Alberani wrote:
> Woah! All I have kept, from my glorious XSLT days, are two books
> and nightmares! ;-)
> No, let's keep the parser as simple as possible.
>

I was joking :-) I don't know the first thing about XSLT and I don't
have the time or energy to put into learning it.

> In the next days I'll be painfully busy, after that I'll write
> some more parsers (they help to see if we're missing something).
> Another thing I want to do is to use the _ModuleProxy architecture
> to let old and new parsers selectable at runtime: it would be very
> useful for the transition phase.
>

Yes, that would be nice.

> When you have time, the main thing to do is to write other parser,
> and see if parse_dom/extractors can be better defined/structured.
>

OK, I'll do that.

> PS: the new parser raise an exception, if used with bsoup.
>

I've fixed it and I've also made some improvements. Due to an xpath
feature that the current bsoup interpreter does not support, I've had
to rewrite the ratings parser a bit. Let me explain:

The 'demographic voters' attribute had the xpath ".//td[1]//text()".
To handle this, the current implementation recursively finds all 'td'
elements and selects the first one and then recursively gets all text
nodes under it. But the xpath specification says that this expression
should actually mean "recursively get all td elements that are the
first td child of their parents and then recursively get all text
nodes under them".

The xpath you've written is fine but supporting it is a bit difficult
because after finding all 'td' elements, it should go up to the parent
element and check whether this 'td' is the first 'td' in that element.
It can be done but it could seriously slow down the interpreter. So
I've changed the xpath to: "tr/td[1]//text()" that escapes the problem.

What do you think, should I conform to the xpath specification or
delay this until we can't do without this feature?

Turgut

>
> Thanks,
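
To make the two readings of ".//td[1]" described above concrete, here is a
minimal sketch. It uses Python's standard xml.etree.ElementTree, not the
bsoup interpreter under discussion, and the table markup and variable names
are made up for illustration. ElementTree has no text() step, so itertext()
stands in for "//text()".

```python
import xml.etree.ElementTree as ET

# Made-up markup, loosely shaped like a ratings table (an assumption,
# not the real IMDb page).
TABLE = (
    "<table>"
    "<tr><td>Males</td><td>1200</td></tr>"
    "<tr><td>Females</td><td>800</td></tr>"
    "</table>"
)
root = ET.fromstring(TABLE)


def texts(cells):
    """Join all text nodes under each cell (stand-in for //text())."""
    return ["".join(cell.itertext()) for cell in cells]


# Reading 1: collect every td in the tree, then keep the first one found.
first_overall = texts(root.findall(".//td")[:1])

# Reading 2 (XPath spec): every td that is the first td child of its parent.
per_parent = texts(root.findall(".//td[1]"))

# The rewritten path: name the parent step, then take its first td.
via_tr = texts(root.findall("tr/td[1]"))

print(first_overall)  # ['Males']
print(per_parent)     # ['Males', 'Females']
print(via_tr)         # ['Males', 'Females']
```

On this sample the first reading returns a single cell, while the
specification's reading and the "tr/td[1]" form return one cell per row.
The two only agree when every wanted 'td' sits directly under a 'tr' that
is a child of the context node.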
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel