Davide Alberani wrote:
> On Jul 02, "H. Turgut Uyar" <[EMAIL PROTECTED]> wrote:
>
> Seen - very useful for such a generic parser. For other parsers used
> for multiple pages (like persons/characters), maybe we can write
> two separate parsers: after all the DOM approach spares so many lines
> of code... :-)
>
Generally, separate parsers are better, as in the search movie, person, etc. parsers. But in this case (official sites, external reviews, ...) all of these parsers would have exactly the same extractors and attributes, which would result in repeated code.

> I've committed support for names/titles references (mostly untested).

Seen it. I still have to figure out how references are used. Can you tell me where to find an example?

> As you can see from the GatherRefs class I still have some problems
> with DOM/XPath: I'm almost sure there is a cleaner way to obtain the
> same result.
>

I think the ones you've written are fine. One suggestion though: I think we should always write path expression strings in double quotes, because single quotes can be part of the expression itself.

> Speaking of that: I was thinking about a parser for the movie's quotes
> page, and I had some real trouble: the data is not in a <ul> list,
> but just separated by <hr> and I can't find an easy way to express - with
> XPath - the portion of document I need. Can you write me an example,
> for a parser for: http://akas.imdb.com/title/tt0133093/quotes ?
>

Yes, this one's tricky. Playing with XPather I see that the following path gives me all the 'b' elements that contain character names for quotes (could be the extractor):

    //b/a[starts-with(@href, '/name/nm')]/..

After that the attributes could be:

    character link: a/@href
    character name: a/text()
    quote:          following-sibling::text()
    section:        preceding-sibling::a[1]/@name

The section specification would group quotes using the name attribute of the preceding 'a' element. Two problems here:

- I still have to handle the notes in italics.
- My bsoup interpreter does not support preceding-sibling yet, but it should be easy to add.

Turgut

> Thanks!
>
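To make the extractor/attribute idea above concrete, here is a minimal sketch of the same lookups in plain Python. It is not IMDbPY's parser API: it uses the standard library's xml.etree.ElementTree, which has no starts-with() and no sibling axes, so those parts of the path expressions are emulated by hand. The HTML fragment, the names SAMPLE and extract_quotes, and the character names are all made up for illustration; the real quotes page will differ. Note also that the element's tail is only the text immediately after the 'b' element, which approximates following-sibling::text().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment loosely shaped like the quotes page:
# an anchor names each quote section and <hr/> separates the sections.
SAMPLE = """<div>
<a name="qt0000001"/>
<b><a href="/name/nm0000001/">Alice</a></b>: First line.<br/>
<b><a href="/name/nm0000002/">Bob</a></b>: Reply.<br/>
<hr/>
<a name="qt0000002"/>
<b><a href="/name/nm0000001/">Alice</a></b>: Another quote.<br/>
<hr/>
</div>"""


def extract_quotes(root):
    """Emulate //b/a[starts-with(@href, '/name/nm')]/.. and its attributes."""
    quotes = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag != "b":
                continue
            # Emulates the predicate a[starts-with(@href, '/name/nm')].
            links = [a for a in child.findall("a")
                     if a.get("href", "").startswith("/name/nm")]
            if not links:
                continue
            link = links[0]
            # Emulates preceding-sibling::a[1]/@name: nearest earlier 'a' sibling.
            section = None
            for previous in reversed(children[:index]):
                if previous.tag == "a":
                    section = previous.get("name")
                    break
            quotes.append({
                "character link": link.get("href"),
                "character name": link.text,
                # Text right after </b>; drop the leading ": " separator.
                "quote": (child.tail or "").lstrip(": \n").rstrip(),
                "section": section,
            })
    return quotes


quotes = extract_quotes(ET.fromstring(SAMPLE))
for quote in quotes:
    print(quote)
```

Grouping the resulting dictionaries by their "section" value would then give one entry per quote block, which is what the section specification is meant to do.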
_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel