On 07/09/2008 09:48 AM, Davide Alberani wrote:
> Trying to write a replacement for movieParser.HTMLRatingsParser
> I started with the below code (incomplete and the data is not in
> what will be the required format); beside the fact that bsoup raise
> an IndexError exception, the problem here is the current parse_dom
> method is unable to map the structure of the expected output,
> which is something like:
> {'arithmetic mean': 8.5, 'rating': 8.5, 'votes': 274730, 'median': 9,
>  'number of votes': {1: 7128, 2: 1958, 3: 2456, 4: 3007, 5: 5052,
>                      6: 9531, 7: 21999, 8: 45371, 9: 69489, 10: 108739},
>  'demographic': {u'aged 45+': (11724, 7.7),
>                  u'imdb staff': (36, 8.8),
>                  u'aged 30-44': (60029, 8.5),
>                  u'females': (28663, 8.3),
>                  u'females aged 30-44': (7489, 8.3),
>                  'all votes': (274730, 8.5),
>                  u'females aged 45+': (2020, 7.4),
>                  u'males': (189097, 8.6),
>                  u'males aged 18-29': (120639, 8.8),
>                  u'males under 18': (5191, 8.9),
>                  u'aged 18-29': (139003, 8.8),
>                  u'males aged 30-44': (51737, 8.5),
>                  u'non-us users': (135562, 8.6),
>                  u'females aged 18-29': (17468, 8.4),
>                  u'us users': (81423, 8.5),
>                  u'females under 18': (1192, 7.5),
>                  u'aged under 18': (6392, 8.8),
>                  u'top 1000 voters': (786, 7.4),
>                  u'males aged 45+': (9541, 7.7)},
>  'top 250 rank': 32}
>
I doubt it's realistic for parse_dom to handle such a complicated
data structure conversion and still be generic.

> 1st stage: tell the XPaths the data to be fetched, maybe with minor
> features like "return None if empty", and store this data in an
> intermediate generic format (a dictionary with lists of strings
> as its values?)

I agree that XPath extraction should focus on the raw data and
should not go much further.

> 2nd stage: write rules to transform the data from this intermediate
> format to the one we need.
>
> The problem is: how to express that?
> XSLT? :-)

I think we can start by separating these two stages in the current
code and implementing them as code in order to explore the territory.

Turgut
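
A minimal sketch of what separating the two stages might look like,
just to make the idea concrete. Everything in it is an assumption:
the names extract, transform and EXTRACT_RULES are hypothetical and
not part of IMDbPY, the sample markup is made up, and it uses the
standard library's ElementTree (with its limited XPath subset)
instead of the parsers discussed above. Stage 1 only collects lists
of strings; stage 2 is ordinary Python that builds the final dict.

```python
# Hypothetical sketch: none of these names exist in IMDbPY.
import xml.etree.ElementTree as ET

# Made-up markup standing in for a ratings page.
SAMPLE = """
<html><body>
  <table id="votes">
    <tr><td class="n">108739</td><td class="r">10</td></tr>
    <tr><td class="n">69489</td><td class="r">9</td></tr>
  </table>
  <p id="median">9</p>
</body></html>
"""

# Stage 1 rules: key -> XPath (ElementTree's limited subset).
EXTRACT_RULES = {
    'vote counts': ".//table[@id='votes']/tr/td[@class='n']",
    'vote ratings': ".//table[@id='votes']/tr/td[@class='r']",
    'median': ".//p[@id='median']",
    'top 250 rank': ".//p[@id='rank']",  # deliberately absent from SAMPLE
}


def extract(root, rules):
    """Stage 1: fetch raw text only; every value is a list of strings."""
    return {key: [(el.text or '').strip() for el in root.findall(path)]
            for key, path in rules.items()}


def transform(raw):
    """Stage 2: plain Python rules that build the required structure."""
    data = {}
    if raw['vote counts']:
        data['number of votes'] = {
            int(rating): int(count)
            for rating, count in zip(raw['vote ratings'], raw['vote counts'])
        }
    if raw['median']:
        data['median'] = int(raw['median'][0])
    if raw['top 250 rank']:
        data['top 250 rank'] = int(raw['top 250 rank'][0])
    return data


root = ET.fromstring(SAMPLE)
raw = extract(root, EXTRACT_RULES)   # intermediate generic format
result = transform(raw)              # final, page-specific format
```

Here a missing element simply yields an empty list in the
intermediate dictionary, and stage 2 decides whether that means
"skip the key" or "store None"; the extraction rules stay generic
either way.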