Davide Alberani wrote:
> I think that now our first priority is to settle the functionalities
> of the parse_dom() method and the scheme of the "extractors" attribute.
>
Yes, I agree.

> Unfortunately I think I'm still not confident enough with my abilities
> with DOM/XPath, so I'm a bit confused.

XPath is not a big deal. I'm no expert myself, but I think a few simple constructs like recursive/non-recursive search and attribute matching will be sufficient most of the time.

> Keeping an eye on the features we need, extractors can be used in too
> many different ways: "attribute.key" behaves in one way if it's None and
> in another way if it's a string (my fault), path can be a list or a
> dictionary, "section" can interfere with "attribute.key" and so on...
>
> Obviously I'm not saying these are bad things: these features are
> _fundamentals_ and must stay.

Maybe they are not bad things, but piling them up as "quick hacks", as we partially did until now, might damage the longer-term goal of simpler maintenance of the parsers.

> What we need is a cleaner usage schema: it should be clearer (or
> at least documented) that if you want to just extract the text
> from a list of <li> tags inside an <ol> tag, you must write an
> "extractors" in a given way.

Yes. We need documentation and guidelines about how to write the parsers. We also need to sort out the conceptual issues about extractors, attributes, keys, postprocessors, etc.

> On the other side, if you have a complex data structure inside a
> <div> tag, "extractors" must be written in another way.
> Generally speaking, the type of returned items should also be
> clearer: a list, a string, another dictionary...
>
> Maybe the code is already good enough and I still just have to
> grasp it. :-)

The current state of the code is good, but it could get messy. I'm absolutely sure you have no problem grasping it :-) I have seen some of the parsers and I have some idea about how things work, but you can surely see a much better overall picture. Besides, the design style of these modules should match the other parts of imdbpy, so I'm suggesting that you set the guidelines.
I'm not trying to escape work, honest :-) Is there anything you want me to do to start?

Turgut

_______________________________________________
Imdbpy-devel mailing list
Imdbpy-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel
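
For reference, the three XPath constructs mentioned in the message (recursive search, non-recursive search, attribute matching) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the XPath subset of Python's standard `xml.etree.ElementTree` and not the imdbpy parsers or their "extractors" scheme, and the sample markup and variable names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not real IMDb HTML).
SAMPLE = """
<div id="content">
  <ol class="cast">
    <li>First actor</li>
    <li>Second actor</li>
  </ol>
  <div class="extra">
    <ol class="crew">
      <li>Director</li>
    </ol>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Recursive search: ".//" matches <ol> elements at any depth below the root.
all_lists = root.findall(".//ol")

# Non-recursive search: "./" matches only direct children of the root.
top_level_lists = root.findall("./ol")

# Attribute matching: only <ol> elements whose class attribute is "cast",
# then the text of their direct <li> children.
cast_items = [li.text for li in root.findall(".//ol[@class='cast']/li")]

print(len(all_lists), len(top_level_lists), cast_items)
```

The same path expressions should carry over to a fuller XPath engine; only the choice of `ElementTree` here is an assumption made to keep the example self-contained.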