On 07/09/2008 09:48 AM, Davide Alberani wrote:
> Trying to write a replacement for movieParser.HTMLRatingsParser,
> I started with the code below (incomplete, and the data is not yet in
> the required format). Besides the fact that bsoup raises
> an IndexError exception, the problem here is that the current parse_dom
> method is unable to map the structure of the expected output,
> which is something like:
> {'arithmetic mean': 8.5, 'rating': 8.5, 'votes': 274730, 'median': 9,
>  'number of votes': {1: 7128, 2: 1958, 3: 2456, 4: 3007, 5: 5052,
>                      6: 9531, 7: 21999, 8: 45371, 9: 69489, 10: 108739},
>  'demographic': {u'aged 45+': (11724, 7.7),
>                  u'imdb staff': (36, 8.8),
>                  u'aged 30-44': (60029, 8.5),
>                  u'females': (28663, 8.3),
>                  u'females aged 30-44': (7489, 8.3),
>                  'all votes': (274730, 8.5),
>                  u'females aged 45+': (2020, 7.4),
>                  u'males': (189097, 8.6),
>                  u'males aged 18-29': (120639, 8.8),
>                  u'males under 18': (5191, 8.9),
>                  u'aged 18-29': (139003, 8.8),
>                  u'males aged 30-44': (51737, 8.5),
>                  u'non-us users': (135562, 8.6),
>                  u'females aged 18-29': (17468, 8.4),
>                  u'us users': (81423, 8.5),
>                  u'females under 18': (1192, 7.5),
>                  u'aged under 18': (6392, 8.8),
>                  u'top 1000 voters': (786, 7.4),
>                  u'males aged 45+': (9541, 7.7)},
>  'top 250 rank': 32}
> 

I doubt it's realistic for parse_dom to handle such a complicated 
data structure conversion and stay generic at the same time.

> 1st stage: tell the XPaths the data to be fetched, maybe with minor
> features like "return None if empty", and store this data in an
> intermediate generic format (a dictionary with lists of strings
> as its values?)

I agree that XPath extraction should basically focus on raw data and 
should not go much further.
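
A minimal sketch of what such a first stage might look like, assuming
a small well-formed stand-in document and the standard library's
ElementTree with its limited XPath subset. The markup, the PATHS table
and the extract_raw name are all hypothetical; this is not the current
parse_dom code, and the real IMDb pages look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a ratings page (not real IMDb markup).
SAMPLE = """
<html><body>
  <div id="rating"><b>8.5</b></div>
  <table id="votes">
    <tr><td>108739</td><td>10</td></tr>
    <tr><td>69489</td><td>9</td></tr>
  </table>
  <div id="rank"></div>
</body></html>
"""

# Field name -> XPath expression (only the subset ElementTree understands).
PATHS = {
    "rating": ".//div[@id='rating']/b",
    "counts": ".//table[@id='votes']/tr/td[1]",
    "ordinals": ".//table[@id='votes']/tr/td[2]",
    "rank": ".//div[@id='rank']/b",
}


def extract_raw(document, paths):
    """Return {field: list of stripped strings}, or None when nothing matched."""
    root = ET.fromstring(document)
    raw = {}
    for name, path in paths.items():
        texts = [(element.text or "").strip() for element in root.findall(path)]
        texts = [text for text in texts if text]
        # "return None if empty": an empty match is stored as None.
        raw[name] = texts or None
    return raw


raw = extract_raw(SAMPLE, PATHS)
```

Nothing is converted here: every value stays a list of strings, so
all type and structure decisions are left to the second stage.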

> 2nd stage: write rules to transform the data from this intermediate
> format to the one we need.
> 
> The problem is: how to express that?
> 

XSLT? :-) I think we can start by separating these two stages in the 
current code and implementing them as code in order to explore the 
territory.
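
As a starting point for that exploration, here is a rough sketch of
the second stage written as plain Python rules. It assumes the
intermediate format is a dictionary of string lists (or None when
nothing was found); the RAW sample, the field names and the RULES
table are made up for illustration and are not existing IMDbPY code.

```python
# Hypothetical intermediate data, as a first stage might have produced it.
RAW = {
    "rating": ["8.5"],
    "votes": ["274,730"],
    "ordinals": ["10", "9"],
    "counts": ["108739", "69489"],
    "rank": None,  # nothing was found for this field
}


def parse_int(text):
    """Convert a string such as '274,730' to an integer."""
    return int(text.replace(",", ""))


# Each rule: (output key, input fields it needs, conversion function).
RULES = [
    ("rating", ("rating",), lambda r: float(r[0])),
    ("votes", ("votes",), lambda v: parse_int(v[0])),
    ("number of votes", ("ordinals", "counts"),
     lambda o, c: {parse_int(k): parse_int(n) for k, n in zip(o, c)}),
    ("top 250 rank", ("rank",), lambda r: parse_int(r[0])),
]


def transform(raw, rules):
    """Apply every rule whose inputs are all present; skip the others."""
    result = {}
    for key, fields, convert in rules:
        inputs = [raw.get(field) for field in fields]
        if any(value is None for value in inputs):
            continue
        result[key] = convert(*inputs)
    return result


data = transform(RAW, RULES)
```

With rules kept as data like this, a parser would only have to supply
its own table, while transform itself could stay generic.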

Turgut


_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
