On 07/09/2008 09:48 AM, Davide Alberani wrote:
> Trying to write a replacement for movieParser.HTMLRatingsParser,
> I started with the code below (incomplete, and the data is not yet in
> the required format). Besides the fact that bsoup raises
> an IndexError exception, the problem is that the current parse_dom
> method is unable to map the structure of the expected output,
> which is something like:
> {'arithmetic mean': 8.5, 'rating': 8.5, 'votes': 274730, 'median': 9,
>  'number of votes': {1: 7128, 2: 1958, 3: 2456, 4: 3007, 5: 5052,
>                      6: 9531, 7: 21999, 8: 45371, 9: 69489, 10: 108739},
>  'demographic': {u'aged 45+': (11724, 7.7),
>                  u'imdb staff': (36, 8.8),
>                  u'aged 30-44': (60029, 8.5),
>                  u'females': (28663, 8.3),
>                  u'females aged 30-44': (7489, 8.3),
>                  'all votes': (274730, 8.5),
>                  u'females aged 45+': (2020, 7.4),
>                  u'males': (189097, 8.6),
>                  u'males aged 18-29': (120639, 8.8),
>                  u'males under 18': (5191, 8.9),
>                  u'aged 18-29': (139003, 8.8),
>                  u'males aged 30-44': (51737, 8.5),
>                  u'non-us users': (135562, 8.6),
>                  u'females aged 18-29': (17468, 8.4),
>                  u'us users': (81423, 8.5),
>                  u'females under 18': (1192, 7.5),
>                  u'aged under 18': (6392, 8.8),
>                  u'top 1000 voters': (786, 7.4),
>                  u'males aged 45+': (9541, 7.7)},
>  'top 250 rank': 32}
>
I doubt it's realistic for parse_dom to handle such a complicated
data structure conversion and remain generic at the same time.
> 1st stage: tell the XPaths which data to fetch, maybe with minor
> features like "return None if empty", and store this data in an
> intermediate generic format (a dictionary with lists of strings
> as its values?)
I agree that XPath extraction should basically focus on raw data and
should not go much further.
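
To make the idea concrete, here is a rough sketch of a first stage that
only pulls raw strings out of the tree. Everything in it is an assumption
for illustration: the sample markup, the RULES table, and the extract_raw
name are hypothetical, and it uses the standard library's ElementTree
(with its limited XPath subset) in place of whatever parser we end up with.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a ratings page; the real markup
# differs and would need an HTML-tolerant parser.
SAMPLE = """
<html><body>
  <table id="votes">
    <tr><td class="count">7128</td><td class="score">1</td></tr>
    <tr><td class="count">1958</td><td class="score">2</td></tr>
  </table>
  <p id="median">9</p>
</body></html>
"""

# Hypothetical rule table: output key -> XPath expression
# (restricted to the subset ElementTree understands).
RULES = {
    "vote counts": ".//table[@id='votes']/tr/td[@class='count']",
    "vote scores": ".//table[@id='votes']/tr/td[@class='score']",
    "median": ".//p[@id='median']",
    "top 250 rank": ".//p[@id='rank']",  # deliberately absent from SAMPLE
}


def extract_raw(document, rules):
    """Return {key: list of stripped strings, or None if nothing matched}."""
    root = ET.fromstring(document)
    raw = {}
    for key, path in rules.items():
        values = [(el.text or "").strip() for el in root.findall(path)]
        # "return None if empty": no match gives None, not an empty list.
        raw[key] = values or None
    return raw


raw = extract_raw(SAMPLE, RULES)
```

Note that this stage does no type conversion and no restructuring at all;
the values stay strings, which keeps the extraction rules generic.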
> 2nd stage: write rules to transform the data from this intermediate
> format to the one we need.
>
> The problem is: how to express that?
>
XSLT? :-) I think we can start by separating these two stages in the
current code and implementing them as code in order to explore the
territory.
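
As a starting point for that exploration, the second stage written as
plain code might look something like the sketch below. It assumes the
intermediate format is a dictionary whose values are lists of strings (or
None when nothing was found). The key names, the RAW sample, and the
first_int and build_ratings helpers are all hypothetical.

```python
# Hypothetical intermediate data, in the "dictionary with lists of
# strings as its values" shape (None meaning "nothing found").
RAW = {
    "vote counts": ["7128", "1958", "2456"],
    "vote scores": ["1", "2", "3"],
    "median": ["9"],
    "top 250 rank": None,
}


def first_int(values):
    """Convert the first string of a list to int; None if there is none."""
    return int(values[0]) if values else None


def build_ratings(raw):
    """Turn the raw string lists into the nested structure we want."""
    result = {}
    scores = raw.get("vote scores") or []
    counts = raw.get("vote counts") or []
    # Pair each score (1..10) with its vote count.
    result["number of votes"] = {
        int(score): int(count) for score, count in zip(scores, counts)
    }
    # Scalar fields: include them only when the page provided a value.
    for key in ("median", "top 250 rank"):
        value = first_int(raw.get(key))
        if value is not None:
            result[key] = value
    return result


ratings = build_ratings(RAW)
```

Once a few parsers are written this way, whatever repeats across them
(pairing two lists, taking the first item, converting to int) would be
the natural candidates for a declarative rule format.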
Turgut
_______________________________________________
Imdbpy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel