On Thu, Oct 30, 2014 at 09:02:28PM +0100, Leszek Tarkowski wrote:
> My 0.02$
> 
> There are some examples of using lxml in my repository:
> 
> https://github.com/czterybity/PyBasicTraining
> 
> but AFAIR lxml can be tricky to install in some configurations (especially
> on Windows). +1 for BeautifulSoup: easy and elegant, but still not among the
> included batteries. Using the standard library to get data from XML is quite
> easy (and efficient) if you use xml.sax instead of ElementTree or DOM.
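For the xml.sax route mentioned here, the following is a minimal sketch of pulling data out with the standard library's SAX API. It assumes a hypothetical document in which each <record> element has an id attribute and some text. The element names, the attribute name and the RecordHandler class are illustrative only.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect (id, text) pairs for every <record> element (hypothetical schema)."""

    def __init__(self):
        super().__init__()
        self.records = []        # collected (id, text) tuples
        self._current_id = None
        self._chunks = None      # None while we are not inside a <record>

    def startElement(self, name, attrs):
        if name == "record":
            self._current_id = attrs.get("id")
            self._chunks = []    # characters() may deliver text in pieces

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "record":
            self.records.append((self._current_id, "".join(self._chunks).strip()))
            self._chunks = None


doc = b"""<records>
  <record id="r1">alpha</record>
  <record id="r2">beta</record>
</records>"""

handler = RecordHandler()
xml.sax.parseString(doc, handler)
print(handler.records)  # [('r1', 'alpha'), ('r2', 'beta')]
```

The parser only pushes events, so all state (here, whether we are inside a <record>) lives in the handler. This is efficient for large files but is also the part students write themselves.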

I'm a bit surprised at the suggestion to prefer SAX over DOM; my
preference would be for DOM, because it provides easier access to the
context of an element (e.g. finding <li> elements that are
children of <ol> elements but not those that are in <ul> elements),
and because it's much easier to write any processed result back out as
XML (using the writexml method).
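The <ol>/<ul> example can be sketched with xml.dom.minidom from the standard library. The sample document and the "marked" attribute added as a processing step are illustrative assumptions, not part of any real exercise.

```python
import io
from xml.dom import minidom

doc = minidom.parseString(
    "<doc>"
    "<ol><li>first</li><li>second</li></ol>"
    "<ul><li>bullet</li></ul>"
    "</doc>"
)

# The DOM keeps parent links, so checking context is just a look at parentNode.
ordered_items = [
    li for li in doc.getElementsByTagName("li")
    if li.parentNode.tagName == "ol"
]
for li in ordered_items:
    li.setAttribute("marked", "yes")  # hypothetical processing step

print([li.firstChild.data for li in ordered_items])  # ['first', 'second']

# Write the processed tree back out as XML.
out = io.StringIO()
doc.documentElement.writexml(out)
print(out.getvalue())
```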

SAX, on the other hand, requires writing a subclass that consumes events
corresponding to elements of interest, which I'd expect some students to
find rather new / unexpected / challenging. Furthermore, SAX can also
lead people (including myself) to write parsers that process certain
elements with no regard for their context.
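The following sketch shows the extra bookkeeping involved. It handles the same <ol>/<ul> task with xml.sax, and the handler has to maintain its own stack of open elements to know whether an <li> sits directly inside an <ol>. The OrderedItemHandler class and the sample document are illustrative.

```python
import xml.sax


class OrderedItemHandler(xml.sax.ContentHandler):
    """Collect the text of <li> elements whose direct parent is <ol>."""

    def __init__(self):
        super().__init__()
        self.stack = []        # names of currently open elements: our own context
        self.items = []
        self._text = None      # text buffer while inside a matching <li>
        self._li_depth = None  # stack depth at which the matching <li> opened

    def startElement(self, name, attrs):
        # The parent of this element is whatever is on top of the stack.
        if (name == "li" and self._text is None
                and self.stack and self.stack[-1] == "ol"):
            self._text = []
            self._li_depth = len(self.stack)
        self.stack.append(name)

    def characters(self, content):
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        self.stack.pop()
        # Only finish when the <li> that started collecting is the one closing.
        if self._text is not None and len(self.stack) == self._li_depth:
            self.items.append("".join(self._text))
            self._text = None


handler = OrderedItemHandler()
xml.sax.parseString(
    b"<doc><ol><li>first</li><li>second</li></ol><ul><li>bullet</li></ul></doc>",
    handler,
)
print(handler.items)  # ['first', 'second']
```

Dropping the stack check makes the handler shorter, but it then silently collects the <ul> item too. This is the context-blind behaviour described above.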

Reviewing Aleksandra's original post, I think the objective to "do some
rather simple manipulation with the outputs" prompted me to suggest
DOM in my earlier response, because I assumed that the outputs were to
be XML themselves.

Best regards, Jan

P.S.: Sorry for the duplicate posting of my previous response; this was
due to my subscribing with my "gmail" rather than the apparently canonical
"googlemail" address. I have fixed that now.




> Good luck,
> Leszek
> 
> ---
> dr inż. Leszek Tarkowski
> www.infotraining.pl
> 
> 
> On 30 October 2014 20:37, Andrew Walker [EAR] <[email protected]> wrote:
> 
> > Hi Aleksandra,
> >
> > No problem. I've dug the old XPath exercises out from a dusty disk and
> > made them accessible here:
> > http://homepages.see.leeds.ac.uk/~earawa/FoX/iFaX/iFaX.3/ with the
> > introductory slide deck here:
> > http://homepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/Practical3_intro.pdf
> >
> > I forgot that by this point we had talked about XML namespaces... which
> > complicate things.
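Namespaces are where XPath-style queries most often trip people up, because a bare tag name no longer matches a namespaced element. The sketch below uses the standard library's xml.etree.ElementTree with a made-up namespace URI. The prefix "d" is our own choice and does not need to match the one in the document.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:d="http://example.org/data">
  <d:item>one</d:item>
  <d:item>two</d:item>
</root>"""

root = ET.fromstring(doc)

# The unqualified name finds nothing: the real tag is "{http://example.org/data}item".
print(root.findall("item"))  # []

# Map a prefix of our choosing to the namespace URI and use it in the path.
ns = {"d": "http://example.org/data"}
print([e.text for e in root.findall("d:item", ns)])  # ['one', 'two']

# Equivalent query, spelling out the namespace in {uri}tag form.
print([e.text for e in root.findall("{http://example.org/data}item")])  # ['one', 'two']
```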
> >
> > Best wishes,
> >
> > Andrew
> >
> > ________________________________________
> > From: Discuss [[email protected]] On Behalf Of
> > Aleksandra Pawlik [[email protected]]
> > Sent: 30 October 2014 14:17
> > Cc: [email protected]
> > Subject: Re: [Discuss] Teaching very simple XML manipulation
> >
> > Dear All,
> >
> > What can I say... You are AMAZING :-) Thanks a lot for all the emails and
> > info. Very, very useful already.
> > Andrew, if it doesn't take too much of your time, can you point me to
> > the old XPath exercises you mentioned?
> >
> > Yours truly grateful,
> > Aleksandra
> >
> > On 30 October 2014 14:13, Andrew Walker [EAR] <[email protected]>
> > wrote:
> > > Hi Aleksandra,
> > >
> > > A good few years ago I helped to put together and run a two-day course
> > > on "XML for Fortran programmers" (best not to ask why). Most of the
> > > hands-on work for this was in Fortran; the first two exercises looked at
> > > how to create a well-formed document, and the last two at how to read one
> > > using DOM- and SAX-based parsers. In between these we stuck an exercise on
> > > reading an XML document using Python, mainly to show the simplicity of
> > > XPath-based parsers (compared with what was to come) but also to drive
> > > home the point that you can view a well-formed XML document as a tree of
> > > nodes.
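To make the "tree of nodes" view concrete, here is a minimal sketch that walks an ElementTree document recursively and lists each element indented by its depth. The sample document and the outline helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book year="2001"><title>A</title></book>
  <book year="2014"><title>B</title></book>
</library>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(doc)
print("\n".join(outline(root)))
```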
> > >
> > > I think the two important questions are: How much do you want your
> > > students to learn about XML as a technology (as opposed to just being
> > > able to parse a document)? Do you expect them to always deal with
> > > well-formed XML, or are they likely to need to handle a wider range of
> > > XML-like documents later in their course?
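On the well-formedness question, a strict XML parser rejects input that is not well-formed rather than guessing at it. A lenient, BeautifulSoup-style tool behaves differently. The sketch below shows this with ElementTree; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>ok</b></a>"))       # True
print(is_well_formed("<a><b>unclosed</a>"))     # False: mismatched tag
print(is_well_formed("<p>line<br>break</p>"))   # False: HTML-style <br> is never closed
```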
> > >
> > > If the lesson should be applicable to a wider range of documents, I think
> > > BeautifulSoup is probably the way to go. If the idea is to learn about the
> > > details of XML, I would probably start with an exercise using XPath and
> > > try to focus on the subset that is supported by ElementTree (leaving the
> > > choice between ElementTree and lxml as a detail for now).
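ElementTree's find and findall accept a limited XPath subset, which includes descendant search with ".//", attribute predicates and 1-based positions. The sketch below is a minimal illustration on an invented document and is not drawn from the original exercises.

```python
import xml.etree.ElementTree as ET

doc = """<library>
  <book year="2001" lang="en"><title>A</title></book>
  <book year="2014" lang="pl"><title>B</title></book>
  <shelf><book year="2014" lang="en"><title>C</title></book></shelf>
</library>"""

root = ET.fromstring(doc)

# All <title> elements anywhere below the root, in document order.
print([t.text for t in root.findall(".//title")])  # ['A', 'B', 'C']

# Only direct <book> children of the root (not the one inside <shelf>).
print([b.find("title").text for b in root.findall("book")])  # ['A', 'B']

# Attribute predicate: books anywhere with year="2014".
print([b.find("title").text for b in root.findall(".//book[@year='2014']")])  # ['B', 'C']

# Position predicate: the first <book> child (XPath positions start at 1).
print(root.find("book[1]/title").text)  # A
```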
> > >
> > > I can probably find the documentation for the old XPath exercises if
> > > they will be useful.
> > >
> > > Cheers,
> > >
> > > Andrew
> > >
> > >
> > >
> > >
> > > --
> > > Dr Andrew Walker
> > > NERC Independent Research Fellow
> > > School of Earth and Environment, University of Leeds
> > > http://www.see.leeds.ac.uk/people/a.walker
> > >
> > > From: "Neil Chue Hong (SSI)" <[email protected]>
> > > Date: Thursday, 30 October 2014 12:29
> > > To: Aleksandra Pawlik <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Subject: Re: [Discuss] Teaching very simple XML manipulation
> > >
> > > Hi Aleksandra,
> > >
> > > what sort of manipulation are you going to ask your students to do? Is
> > > it just finding elements and then doing something, or is it something
> > > more complex?
> > >
> > > When it comes to which Python XML library to use, ElementTree vs lxml is
> > > the argument you'll get into. I can't comment on this.
> > >
> > > Strangely, I recently did a little bit of XML parsing and used
> > > BeautifulSoup (normally used for web pages) as I'm more used to it, and
> > > it does work (and can use lxml as its underlying parser).
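Because lxml.etree largely mirrors the ElementTree API, one way to keep "ElementTree vs lxml" a detail is the fallback import below. This is a common idiom and is not specific to this course. The rest of the sketch deliberately uses only calls that both libraries provide. The <hit> document and the 0.5 threshold are made up.

```python
try:
    # lxml is a third-party package and may not be installed everywhere.
    from lxml import etree
except ImportError:
    # The standard-library module provides the same core calls used below.
    import xml.etree.ElementTree as etree

doc = b"<results><hit score='0.9'>x</hit><hit score='0.2'>y</hit></results>"
root = etree.fromstring(doc)

# Keep hits above a threshold; findall/get/text behave the same in both libraries.
good = [h.text for h in root.findall("hit") if float(h.get("score")) > 0.5]
print(good)  # ['x']
```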
> > >
> > > Neil
> > >
> > > On 30 October 2014 11:51, Aleksandra Pawlik <[email protected]> wrote:
> > > Hello!
> > >
> > > We are running an SWC course at Manchester as a part of a one week
> > > training for MSc students in Clinical Bioinformatics. We have 2.5 days
> > > for SWC and then the students will work on a small programming task in
> > > teams. The idea is that they will need to grab some XML files, parse
> > > them and then do some rather simple manipulation with the outputs.
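The "grab, parse, manipulate" workflow described here could look like the following with the standard library's ElementTree. This is a minimal sketch only. The file contents, element names and the unit-conversion step are hypothetical, and an in-memory file stands in for a downloaded one.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a downloaded file; ET.parse() also accepts a filename.
source = io.StringIO("""<measurements>
  <m unit="mm">12</m>
  <m unit="mm">30</m>
</measurements>""")

tree = ET.parse(source)
root = tree.getroot()

# Simple manipulation: convert every value from millimetres to centimetres.
for m in root.iter("m"):
    m.text = str(int(m.text) / 10)
    m.set("unit", "cm")

out = io.StringIO()
tree.write(out, encoding="unicode")  # "unicode" writes str rather than bytes
print(out.getvalue())
```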
> > >
> > > At the end of the SWC (we'll be using Python) we want to show them how
> > > to use a _simple_ library for XML. So before I dive into Google, I
> > > thought I'd be lazy and ask the SWC community:
> > > 1) Has anyone created an SWC module on XML? If yes, can you point me to it?
> > > 2) Which Python library for XML would you recommend?
> > > 3) Do you have any other suggestions?
> > >
> > > Before you jump on me saying "What the heck are you doing? Software
> > > Carpentry doesn't teach XML.", I'll just say that the goal is _not_ to
> > > focus on XML. We want to show them how to use Python libraries. XML is
> > > an example and, in the case of this particular audience, it is a better
> > > example than NumPy and SciPy (we had lots of discussions with Prof.
> > > Andy Brass, who runs the whole course). We will deliver the standard
> > > SWC, but for the remaining 2 days they are supposed to try flying on
> > > their own, working in groups writing a small program using what we
> > > taught them (structured programming, version control etc.).
> > >
> > > So, halp? Anyone?
> > >
> > > Many thanks.
> > > Aleksandra
> > >
> > > _______________________________________________
> > > Discuss mailing list
> > > [email protected]
> > > http://lists.software-carpentry.org/mailman/listinfo/discuss_lists.software-carpentry.org
> > >
> > >
> > >
> > >
> > > --
> > > Neil Chue Hong
> > > Director, Software Sustainability Institute
> > > EPCC, University of Edinburgh, JCMB, Edinburgh, EH9 3FD, UK
> > > Tel: +44 (0)131 650 5957
> > > http://www.software.ac.uk/
> > >
> > > LinkedIn: http://uk.linkedin.com/in/neilchuehong
> > > Twitter: http://twitter.com/npch
> > > ORCID: http://orcid.org/0000-0002-8876-7606
> >
> >



-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: [email protected]                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

