Re: [Imdbpy-devel] First patch for DOM parser

H. Turgut Uyar Tue, 01 Jul 2008 07:33:04 -0700

Davide Alberani wrote:
> I'm committing my changes.  Basically I've moved your _paths structure
> to "extractors", a list/tuple of Extractor instances, which in turn
> contains a list of Attribute instances.


I've noticed a problem here while working on the search movie parser.
The method that calls the parser expects the result to be a dictionary
with a 'data' key holding a list of tuples. In each tuple, the first
element is the imdb id of the movie and the second is the dictionary
created by the analyze_title function.
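
To make the expected shape concrete, here is a small sketch. The ids and
the contents of the title dictionaries are placeholders I made up for
illustration; the real dictionaries come from analyze_title and may carry
different keys.

```python
# Illustrative shape only: the values below are placeholders, not real
# parser output. The caller expects a dict with a 'data' key whose value
# is a list of (imdb_id, title_info) tuples.
expected = {
    'data': [
        ('0000001', {'title': 'Some Movie', 'year': '1999'}),
        ('0000002', {'title': 'Another Movie', 'year': '2003'}),
    ]
}

# The calling code can then unpack every element as a pair.
titles = []
for movie_id, title_info in expected['data']:
    titles.append((movie_id, title_info['title']))
```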

In my old implementation a single attribute extracted both the imdb id
and the title info, and its postprocessor made a tuple out of them. In
the new implementation these are two different attributes, and therefore
end up as two separate elements of the resulting list.

We should either go back to a single attribute, or use the
postprocess_data method to adapt the result. I've committed a patch that
implements the first option, but the second would probably be better.
Can you take a look at it?
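
A minimal sketch of the postprocess_data option, written as a standalone
function. It assumes the two attributes end up alternating in the 'data'
list (id, title info, id, title info, ...); that layout, the function
name and the sample values are my assumptions, not what the committed
code does.

```python
def pair_ids_with_titles(result):
    """Hypothetical helper: regroup a flat [id, info, id, info, ...] list
    under 'data' into the [(id, info), ...] list the caller expects."""
    flat = result.get('data', [])
    ids = flat[0::2]      # elements at even positions: imdb ids
    infos = flat[1::2]    # elements at odd positions: title dictionaries
    # zip stops at the shorter sequence, so a trailing unpaired id is dropped.
    return {'data': list(zip(ids, infos))}


# Placeholder input mimicking two separate attributes per movie.
flat_result = {
    'data': [
        '0000001', {'title': 'Some Movie'},
        '0000002', {'title': 'Another Movie'},
    ]
}
paired = pair_ids_with_titles(flat_result)
```

A postprocess_data override in the search parser could do the same
regrouping on the parse_dom output before returning it.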

Turgut

> The design is very close to yours and may be a bit more verbose, but
> in the long term it can be more readable - I hope.
> I've slightly modified the parse_dom method, adding minor features (they
> are absolutely untested - some are still unused by the code!)
> 
> I hope you won't find it a complete mess. :-)
> 
> Basically, now, the parse method calls a set of other methods (including
> parse_dom), so that subclasses can modify the output where they need.
> 
> If something is not clear, ask (I wrote the code in very little time).
> Every name/structure can still be changed: if you have other ideas
> and/or better names for classes and methods, it's time to do these
> changes.
> Many things are not handled, like name/title references (but the
> add_refs method is there).
> 
> I've removed the "result" parameter: it was too prone to side-effects;
> now parse_dom always returns a dictionary; later - other methods -
> can return whatever they want.
> 
> In general, I'm amazed by the amount of code saved by this
> approach.  Just incredible. :-)
> Obviously there are still many things to do: error handling, for
> one (and checking that everything is unicode, and managing things
> like numeric values, and taking care of html/xml references, and so
> on...)
> 
>>> Thank you (and good luck for the match against Germany ;-)
>> Thanks :-) It didn't turn out as we hoped it would but it was an
>> entertaining game after all.
> 
> I've seen it; great match.  After the first half of Italy-Spain,
> I had to put needles under my nails to keep me awake... ;-)
> 
> 

