On Thu, Oct 30, 2014 at 09:02:28PM +0100, Leszek Tarkowski wrote:
> My 0.02$
>
> There are some examples of using lxml in my repository:
>
> https://github.com/czterybity/PyBasicTraining
>
> but AFAIR lxml can be tricky to install in some configurations (especially
> on windows). +1 for BeautifulSoup - easy and elegant, but still not in
> included batteries. Using standard library to get data from xml is quite
> easy (and efficient) if you use xml.sax instead of ElementTree or DOM.

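For reference, the xml.sax approach suggested in the quoted text might look roughly like the sketch below. It only uses the standard library (xml.sax.parseString and xml.sax.ContentHandler); the sample document, the handler class name and its attributes are made up for illustration and are not taken from anyone's course material. It keeps a stack of open elements, which is one possible way to retain some context while consuming events.

```python
import xml.sax

# Hypothetical sample document, invented for this sketch.
SAMPLE = b"""<doc>
  <ol><li>first</li><li>second</li></ol>
  <ul><li>bullet</li></ul>
</doc>"""


class OrderedItemHandler(xml.sax.ContentHandler):
    """Collect the text of <li> elements whose parent is <ol>."""

    def __init__(self):
        super().__init__()
        self.open_elements = []     # names of the currently open elements
        self.items = []             # collected text, one entry per matching <li>
        self._buffer = None         # text pieces of the <li> being read, else None
        self._capture_depth = None  # stack depth at which the capture started

    def startElement(self, name, attrs):
        parent = self.open_elements[-1] if self.open_elements else None
        # Start capturing only for <li> directly inside <ol>; anything nested
        # inside an item already being captured is folded into that item's text.
        if name == "li" and parent == "ol" and self._buffer is None:
            self._buffer = []
            self._capture_depth = len(self.open_elements)
        self.open_elements.append(name)

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        self.open_elements.pop()
        # Finish the capture when the element that started it is closed.
        if self._buffer is not None and len(self.open_elements) == self._capture_depth:
            self.items.append("".join(self._buffer))
            self._buffer = None
            self._capture_depth = None


handler = OrderedItemHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.items)  # ['first', 'second']
```

Without the stack, the handler would see every <li> start event in the same way, whether the item sits in an <ol> or a <ul>.
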
I'm a bit surprised at the suggestion to prefer SAX over DOM. My
preference would be for DOM, because it provides easier access to the
context of an element (e.g. finding <li> elements that are children of
<ol> elements but not those that are in <ul> elements), and because it's
much easier to write any processed result back out as XML (using the
writexml method).

SAX, on the other hand, requires writing a handler subclass that consumes
events corresponding to the elements of interest, which I'd expect some
students to find rather new / unexpected / challenging. Furthermore, SAX
can lead people (including myself) to write parsers that process certain
elements with no regard for their context.

Reviewing Aleksandra's original post, I think it was the objective to "do
some rather simple manipulation with the outputs" that prompted me to
suggest DOM in my earlier response, because I assumed that the outputs
were to be XML themselves.

Best regards, Jan

P.S.: Sorry for the duplicate posting of my previous response; this was
due to me subscribing with my "gmail" rather than the apparently
canonical "googlemail" address. I've fixed that now.

> Good luck,
> Leszek
>
> ---
> dr inż. Leszek Tarkowski
> www.infotraining.pl
>
>
> On 30 October 2014 20:37, Andrew Walker [EAR] <[email protected]> wrote:
>
> > Hi Aleksandra,
> >
> > No problem. I've dug the old XPath exercises out from a dusty disk and
> > made them accessible here:
> > http://homepages.see.leeds.ac.uk/~earawa/FoX/iFaX/iFaX.3/ with the
> > introductory slide deck here:
> > http://homepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/Practical3_intro.pdf
> >
> > I forgot that by this point we had talked about XML namespaces... which
> > complicate things.
> >
> > Best wishes,
> >
> > Andrew
> >
> > ________________________________________
> > From: Discuss [[email protected]] On Behalf Of
> > Aleksandra Pawlik [[email protected]]
> > Sent: 30 October 2014 14:17
> > Cc: [email protected]
> > Subject: Re: [Discuss] Teaching very simple XML manipulation
> >
> > Dear All,
> >
> > What can I say...You are AMAZING :-) Thanks a lot for all emails and
> > info. Very very useful already.
> > Andrew, if it doesn't take too much of your time, can you point me to
> > the old XPath exercises you mentioned?
> >
> > Yours truly grateful,
> > Aleksandra
> >
> > On 30 October 2014 14:13, Andrew Walker [EAR] <[email protected]> wrote:
> > > Hi Aleksandra,
> > >
> > > A good few years ago I helped to put together and run a two day
> > > course on "XML for Fortran programmers" (best not to ask why). Most
> > > of the hands on for this was in Fortran; the first two exercises
> > > looked at how to create a well-formed document the last two how to
> > > read one using DOM and SAX based parsers. In between these we stuck
> > > an exercise on reading an XML document using Python, mainly to show
> > > the simplicity of XPath based parsers (compared with what was to
> > > come) but also to drive home the point that you can view a
> > > well-formed XML document as a tree of nodes.
> > >
> > > I think the two important questions are: How much do you want your
> > > students to learn about XML as a technology (as opposed to just
> > > being able to parse a document)? Do you expect them to always deal
> > > with well-formed XML, or are they likely to need to handle a wider
> > > range of XML-like documents later in their course?
> > >
> > > If the lesson should be applicable to a wider range of documents I
> > > think BeautifulSoup is probably the way to go.
> > > If the idea is to learn about the details of XML I would probably
> > > start with an exercise using XPath and try to focus on the subset
> > > that is supported by ElementTree (leaving the choice of ElementTree
> > > and lxml as a detail for now).
> > >
> > > I can probably find the documentation for the old XPath exercises if
> > > they will be useful.
> > >
> > > Cheers,
> > >
> > > Andrew
> > >
> > > --
> > > Dr Andrew Walker
> > > NERC Independent Research Fellow
> > > School of Earth and Environment, University of Leeds
> > > http://www.see.leeds.ac.uk/people/a.walker
> > >
> > > From: "Neil Chue Hong (SSI)" <[email protected]<mailto:[email protected]>>
> > > Date: Thursday, 30 October 2014 12:29
> > > To: Aleksandra Pawlik <[email protected]<mailto:[email protected]>>
> > > Cc: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>
> > > Subject: Re: [Discuss] Teaching very simple XML manipulation
> > >
> > > Hi Aleksandra,
> > >
> > > what sort of manipulation are you going to ask your students to do?
> > > Is it just finding elements and then doing something, or is it
> > > something more complex.
> > >
> > > ElementTree vs lxml is the argument you'll get into for which Python
> > > XML library you're going to want to use. I can't comment on this.
> > >
> > > Strangely, I recently did a little bit of XML parsing and used
> > > BeautifulSoup (normally used for web pages) as I'm more used to it,
> > > and it does work (and can use ElementTree or lxml as its basic
> > > parser).
> > >
> > > Neil
> > >
> > > On 30 October 2014 11:51, Aleksandra Pawlik <[email protected]<mailto:[email protected]>> wrote:
> > > Hello!
> > >
> > > We are running an SWC course at Manchester as a part of a one week
> > > training for MSc students in Clinical Bioinformatics. We have 2.5 days
> > > for SWC and then the students will work on a small programming task in
> > > teams.
> > > The idea is that they will need to grab some XML files, parse
> > > them and then do some rather simple manipulation with the outputs.
> > >
> > > At the end of the SWC (we'll be using Python) we want to show them how
> > > to use a _simple_ library for XML. So before I dive into Google, I
> > > thought I'd be lazy and ask the SWC community:
> > > 1) Has anyone created an SWC module on XML? If yes, can you point me to it?
> > > 2) Which Python library for XML would you recommend?
> > > 3) Do you have any other suggestions?
> > >
> > > Before you jump on me saying "What the heck are you doing? Software
> > > Carpentry doesn't teach XML." I'll just say that the goal is _not_ to
> > > focus on XML. We want to show them how to use Python libraries. XML is
> > > an example and in the case of this particular audience, it is a better
> > > example than NumPy and SciPy (we had lots of discussions with prof.
> > > Andy Brass who runs the whole course). We will deliver the standard
> > > SWC but the remaining 2 days they are supposed to try flying on their
> > > own, working in groups writing a small program using what we taught
> > > them (structured programming, version control etc.)
> > >
> > > So, halp? Anyone?
> > >
> > > Many thanks.
> > > Aleksandra
> > >
> > > _______________________________________________
> > > Discuss mailing list
> > > [email protected]<mailto:[email protected]>
> > > http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
> > >
> > > --
> > > Neil Chue Hong
> > > Director, Software Sustainability Institute
> > > EPCC, University of Edinburgh, JCMB, Edinburgh, EH9 3FD, UK
> > > Tel: +44 (0)131 650 5957
> > > http://www.software.ac.uk/
> > >
> > > LinkedIn: http://uk.linkedin.com/in/neilchuehong
> > > Twitter: http://twitter.com/npch
> > > ORCID: http://orcid.org/0000-0002-8876-7606

--
+- Jan T. Kim -------------------------------------------------------+
| email: [email protected] |
| WWW: http://www.jtkim.dreamhosters.com/ |
*-----=< hierarchical systems are for files, not for humans >=-----*
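
To make the DOM argument in the reply above a little more concrete, here is a minimal sketch using xml.dom.minidom from the standard library. The sample document and the "checked" attribute are invented for illustration; the calls used (parseString, getElementsByTagName, parentNode, setAttribute, writexml) are part of minidom. It selects only those <li> elements whose parent is an <ol>, modifies them, and writes the whole document back out as XML.

```python
import io
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = "<doc><ol><li>first</li><li>second</li></ol><ul><li>bullet</li></ul></doc>"

dom = minidom.parseString(SAMPLE)

# The context of an element is one attribute away: keep only the <li>
# elements whose parent is an <ol>, skipping those inside a <ul>.
ordered_items = [
    li for li in dom.getElementsByTagName("li")
    if li.parentNode.nodeName == "ol"
]

# A "rather simple manipulation": mark each selected item.
# The attribute name is an arbitrary choice for this example.
for li in ordered_items:
    li.setAttribute("checked", "yes")

# Write the modified tree back out as XML.
out = io.StringIO()
dom.writexml(out)
result = out.getvalue()
print(result)
```

Whether this reads as simpler than an event handler is of course a matter of taste, but the selection and the write-back are each a single step here.
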
