On 07/10/2008 07:35 PM, Davide Alberani wrote:
> Woah!  All I have kept, from my glorious XSLT days, are two books
> and nightmares! ;-)
> No, let's keep the parser as simple as possible.
> 

I was joking :-) I don't know the first thing about XSLT and I don't 
have the time or energy to put into learning it.

> In the next few days I'll be painfully busy; after that I'll write
> some more parsers (they help to see if we're missing something).
> Another thing I want to do is to use the _ModuleProxy architecture
> to make old and new parsers selectable at runtime: it would be very
> useful for the transition phase.
> 

Yes, that would be nice.

> When you have time, the main thing to do is to write other parsers,
> and see if parse_dom/extractors can be better defined/structured.
> 

OK, I'll do that.

> PS: the new parser raises an exception if used with bsoup.
> 

I've fixed it and made some improvements as well. Because of an XPath
feature that the current bsoup interpreter does not support, I had to
rewrite the ratings parser a bit. Let me explain:

The 'demographic voters' attribute had the XPath ".//td[1]//text()". To
handle this, the current implementation recursively finds all 'td'
elements, selects the first one, and then recursively gets all text
nodes under it. But according to the XPath specification, this
expression actually means "recursively get all 'td' elements that are
the first 'td' child of their parents, and then recursively get all
text nodes under them". The XPath you've written is fine, but supporting
it is a bit difficult: after finding all 'td' elements, the interpreter
would have to go up to each one's parent and check whether that 'td' is
the first 'td' in it. It can be done, but it could seriously slow down
the interpreter. So I've changed the XPath to "tr/td[1]//text()", which
sidesteps the problem.
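
To make the difference concrete, here is a small sketch. It is not our
bsoup interpreter: it uses the standard library's xml.etree.ElementTree,
which applies the positional predicate per parent as the specification
describes. ElementTree has no text() step, so a hypothetical helper
named texts() stands in for "//text()", and the table markup is made up
for illustration rather than taken from the real ratings page.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only; not the real ratings page.
TABLE = """
<table>
  <tr><td>Males</td><td>100</td></tr>
  <tr><td>Females</td><td>80</td></tr>
</table>
"""

root = ET.fromstring(TABLE)


def texts(elements):
    """Collect the non-blank text nodes under each element (like //text())."""
    return [t.strip() for el in elements for t in el.itertext() if t.strip()]


# Reading per the XPath specification: every 'td' that is the first
# 'td' child of its own parent.
spec_reading = texts(root.findall(".//td[1]"))

# What the current interpreter does: find all 'td' elements in the
# document, then keep only the very first one.
first_only = texts(root.findall(".//td")[:1])

# The rewritten expression: anchored at the rows, so there is no need
# to walk back up to a parent to check the position.
anchored = texts(root.findall("tr/td[1]"))

print(spec_reading)  # ['Males', 'Females']
print(first_only)    # ['Males']
print(anchored)      # ['Males', 'Females']
```

With this input, the rewritten expression gives the same result as the
specification's reading of the original one, while the "first match
only" reading drops the second row.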

What do you think: should I conform to the XPath specification now, or
put this off until we can't do without the feature?

Turgut

> 
> Thanks,

