Re: [BangPypers] parsing xml
On Sun, 2011-07-31 at 19:57 +0530, Anand Balachandran Pillai wrote:
> > If you vouch for xml parsing in the case when all that you need from the string is a simple numeric value (not a string), then good luck; unlike esr i will not use adjectives; but i would not use your code either.
>
> To be fair here, I think what he is saying is that Kenneth's problem (getting at the particular value) can be solved by using an aptly written regular expression, which might be the fastest solution - not in terms of CPU cycles alone, but in terms of time to code it up.

right now I need one value - but that will probably change.

-- 
regards
Kenneth Gonsalves
___
BangPypers mailing list
BangPypers@python.org
http://mail.python.org/mailman/listinfo/bangpypers
Re: [BangPypers] parsing xml
On Mon, Aug 1, 2011 at 12:43 AM, Noufal Ibrahim nou...@gmail.com wrote:
> Dhananjay Nene dhananjay.n...@gmail.com writes:
>
> [...]
>
> > re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)
> >
> > would appear to be the most succinct and quite fast. Adjust for whitespace as and if necessary.
>
> Whitespace (including newlines), mixed cases etc.

Actually, newlines are already handled by the \s* in the regex above (so I am no longer sure why I even mentioned it). XML, assuming it is as per spec, is case-sensitive, so mixed case does not arise.

> [...]
>
> > As far as optimisation goes - I can see at least 3 options
> >
> > a. the minidom performance is acceptable - no further optimisation required
> > b. minidom performance is not acceptable - try the regex one
> > c. python library performance is not acceptable - switch to 'c'
>
> I'd switch b and c. If ElementTree is not fast enough, I'd switch to cElementTree and if that's not fast enough, I'd try some hand parsing.

+1

Dhananjay
Re: [BangPypers] parsing xml
On 31-Jul-2011, at 11:33 PM, Venkatraman S wrote:
> A regex is the simplest IMHO, because you need not know the syntax of the minidom parser. But, again i have seen this quite often that lack of knowledge of regexp has led people to other solutions (the grapes are sour!)

In the eternal words of Jamie Zawinski:

  Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

http://regex.info/blog/2006-09-15/247

Please resist the temptation to use regexps for XML, for down that path lies only pain. It always starts with "oh, only one token? Let me use a regex and get done with it", and soon enough you have a little forest of random-looking characters.

Kiran

-- 
Kiran Jonnalagadda
http://jace.zaiki.in/
http://hasgeek.in/
Re: [BangPypers] parsing xml
On Mon, Aug 1, 2011 at 1:25 PM, Kiran Jonnalagadda j...@pobox.com wrote:
> On 31-Jul-2011, at 11:33 PM, Venkatraman S wrote:
>
> > A regex is the simplest IMHO, because you need not know the syntax of the minidom parser. But, again i have seen this quite often that lack of knowledge of regexp has led people to other solutions (the grapes are sour!)
>
> In the eternal words of Jamie Zawinski:
>
>   Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
>
> http://regex.info/blog/2006-09-15/247

I had fun reading the following quotes :)

“Give a man a regular expression and he’ll match a string… teach him to make his own regular expressions and you’ve got a man with problems.” –me_da_clever_one

“Give a man a regular expression and he’ll match a string… but by teaching him how to create them, you’ve given him enough rope to hang himself” – Andy Hood

> Please resist the temptation to use regexps for XML, for down that path lies only pain. It always starts with "oh, only one token? Let me use a regex and get done with it", and soon enough you have a little forest of random-looking characters.

Using regular expressions to parse XML converts what is inherently hierarchical data to linear, flat data. Therein lie all its problems.

-- 
--Anand
Re: [BangPypers] parsing xml
by using lxml...for example-:

    from lxml import etree

    # 'end' events only: elem.text is complete once the closing tag is seen
    content = etree.iterparse(*name of the xml file*, events=('end',))
    for event, elem in content:
        if elem.tag == 'distance':
            print elem.text

Hope it will work..

On Mon, Aug 1, 2011 at 1:43 PM, Anand Balachandran Pillai abpil...@gmail.com wrote:
> [...]
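For readers without lxml installed, the same streaming idea can be sketched with the standard library's xml.etree.ElementTree, whose iterparse takes the same kind of arguments. This is only an illustration under assumptions: it targets Python 3, and the SAMPLE bytes and the find_distances helper are made up for the example rather than taken from the thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for the real GPX file.
SAMPLE = b"<gpx><extensions><distance>1489</distance><time>344</time></extensions></gpx>"


def find_distances(source):
    """Stream through the document and collect the text of every <distance>."""
    found = []
    # Only 'end' events are requested, so each element is complete when seen.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "distance":
            found.append(elem.text)
        # Release the element's contents to keep memory use flat on large files.
        elem.clear()
    return found


distances = find_distances(io.BytesIO(SAMPLE))
print(distances)
```

A file name can be passed in place of the BytesIO object; the in-memory buffer is used here only to keep the sketch self-contained.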
Re: [BangPypers] parsing xml
On Mon, Aug 1, 2011 at 12:46 AM, Noufal Ibrahim nou...@gmail.com wrote:
> Venkatraman S venka...@gmail.com writes:
>
> > Hang around in #django or #python. The most elegant code that you *should* write would invariably be pretty fast (am not ref to asm).
>
> I agree with you here. Pythonicity is best defined as what the experienced python core devs do and the stuff they use the most is optimised a lot. Pythonic python code is often the fastest python code.

This is one aspect of python that I am not a fan of, because being pythonic conflates the idioms for expression with the idioms for performance. There are situations when the needs of performance overshadow the needs of expression. As an example, creating classes with attributes and setting them is more expensive than creating a dict, and writing a bigger block of sequential code is preferable (again due to performance considerations) to breaking it into multiple functions, especially when calling functions along with map, filter, reduce or other itertool constructs (as opposed to, say, list comprehensions).

Other languages also have situations where one has to make such tradeoffs, but there are more of them in python, and alternative styles of expression especially imo get buried under the label pythonic.

So yes, there is a lot of importance associated with what is pythonic, but I would've felt more comfortable if it were influenced by expression rather than performance.

Dhananjay
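The claim about attribute-bearing classes versus dicts is easy to measure rather than argue about. The following is a rough sketch, assuming Python 3 and the standard timeit module; the Point class and the two helper functions are invented for the illustration, and the numbers will differ between interpreters and machines.

```python
import timeit


class Point(object):
    """A small class whose only job is to hold two attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def make_object():
    # Creating an instance involves a call to __init__ and two attribute stores.
    return Point(1, 2)


def make_dict():
    # Building the equivalent dict is a single literal.
    return {"x": 1, "y": 2}


object_time = timeit.timeit(make_object, number=100000)
dict_time = timeit.timeit(make_dict, number=100000)
print("object: %.4f s, dict: %.4f s for 100000 creations" % (object_time, dict_time))
```

Whether any difference matters depends on how often such objects are created in the real program, which is something only a profile of that program can say.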
Re: [BangPypers] parsing xml
Anand Balachandran Pillai abpil...@gmail.com writes:

> On Mon, Aug 1, 2011 at 6:08 AM, Anand Chitipothu anandol...@gmail.com wrote:
>
> [...]
>
> It is subtler than that.
>
> List comprehensions are faster than map functions when the latter needs to invoke a user-defined function call or a lambda.
>
> Maps score over list comprehensions in most cases where the function is a Python built-in and when no lambda is used.
>
> Example of former:
>
> >>> def f1(): map(sqr, range(1, 100))
> ...
> >>> def f2(): [sqr(x) for x in range(1, 100)]
> ...
> >>> mytimeit.Timeit(f1)
> '37.91 usec/pass'
> >>> mytimeit.Timeit(f2)
> '37.50 usec/pass'
>
> Example of latter:
>
> >>> def f1(): map(hex, range(1, 100))
> ...
> >>> def f2(): [hex(x) for x in range(1, 100)]
> ...
> >>> mytimeit.Timeit(f1)
> '49.41 usec/pass'
> >>> mytimeit.Timeit(f2)
> '55.29 usec/pass'

This is confusing. Why is

    map(sqr, range(1, 100))

faster than

    map(hex, range(1, 100))

Assuming sqr is implemented in python, it should be slower than hex, which is implemented in C.

[...]

-- 
~noufal
http://nibrahim.net.in

She used to diet on any kind of food she could lay her hands on. -- Arthur Baer, American comic and columnist
Re: [BangPypers] parsing xml
On Mon, Aug 1, 2011 at 7:51 PM, Noufal Ibrahim nou...@gmail.com wrote:
> Anand Balachandran Pillai abpil...@gmail.com writes:
>
> [...]
>
> > Example of former:
> >
> > >>> def f1(): map(sqr, range(1, 100))
> > ...
> > >>> def f2(): [sqr(x) for x in range(1, 100)]
> > ...
> > >>> mytimeit.Timeit(f1)
> > '37.91 usec/pass'
> > >>> mytimeit.Timeit(f2)
> > '37.50 usec/pass'
> >
> > Example of latter:
> >
> > >>> def f1(): map(hex, range(1, 100))
> > ...
> > >>> def f2(): [hex(x) for x in range(1, 100)]
> > ...
> > >>> mytimeit.Timeit(f1)
> > '49.41 usec/pass'
> > >>> mytimeit.Timeit(f2)
> > '55.29 usec/pass'
>
> This is confusing. Why is map(sqr, range(1, 100)) faster than map(hex, range(1, 100))? Assuming sqr is implemented in python, it should be slower than hex, which is implemented in C.

Here's what I get (note: sqrt is faster than hex - not sqr)

Program
=======

    from math import sqrt
    from timeit import Timer

    # pure-Python function, to compare against the C built-ins
    def sqr(x):
        return x * x

    # single calls
    print "Simple sqr", Timer("sqr(50)", "from __main__ import sqr").timeit()
    print "Simple sqrt", Timer("sqrt(50)", "from math import sqrt").timeit()
    print "Simple hex", Timer("hex(50)").timeit()
    # the same functions applied through map
    print "Map sqr", Timer("map(sqr,range(1,100))", "from __main__ import sqr").timeit()
    print "Map sqrt", Timer("map(sqrt,range(1,100))", "from math import sqrt").timeit()
    print "Map hex", Timer("map(hex,range(1,100))").timeit()

Output
======

    Simple sqr 0.185955047607
    Simple sqrt 0.108409881592
    Simple hex 0.143438816071
    Map sqr 21.4051530361
    Map sqrt 12.3786129951
    Map hex 13.8608310223
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 4:41 PM, Venkatraman S venka...@gmail.com wrote:
> Noufal,
>
> I have nothing more to say than this (as i see some tangential replies which i am not interested in substantiating - for eg, i never suggested to use a regexp based parser - a regexp based xml parser is different from using 'a' regexp on a string!):
>
> Read my replies properly. Read my assumptions properly w.r.t the xml structure and the requested value in the xml. Read the link that you have pasted again. If possible, read the comments in the link shared (from esr) again. Once done, think twice and tell me which is better.
>
> If you vouch for xml parsing in the case when all that you need from the string is a simple numeric value (not a string), then good luck; unlike esr i will not use adjectives; but i would not use your code either.

To be fair here, I think what he is saying is that Kenneth's problem (getting at the particular value) can be solved by using an aptly written regular expression, which might be the fastest solution - not in terms of CPU cycles alone, but in terms of time to code it up.

It is not impossible to write a regular expression which will work for bad (invalid) XML as well.

Don't forget that a lot of XML/HTML parsers are actually implemented using regular expressions. You can take a look at sgmllib.SGMLParser, htmllib.HTMLParser etc. No complex text processing gets done without some kind of regular expression behind the scenes.

> Thanks.
> -V

-- 
--Anand
Re: [BangPypers] parsing xml
Anand Balachandran Pillai abpil...@gmail.com writes:

> On Fri, Jul 29, 2011 at 4:41 PM, Venkatraman S venka...@gmail.com wrote:
>
> [...]
>
> To be fair here, I think what he is saying is that Kenneth's problem (getting at the particular value) can be solved by using an aptly written regular expression, which might be the fastest solution - not in terms of CPU cycles alone, but in terms of time to code it up.

That would depend on what one is familiar with. If all one has is a hammer... etc. Anand's minidom thing is a one liner.

I understand what Venkat is saying. He's treating the data as a string rather than an XML fragment for his purposes and using a regexp to get the data out of it.

[...]

-- 
~noufal
http://nibrahim.net.in

Monotheism is a gift from the gods.
Re: [BangPypers] parsing xml
On Thu, Jul 28, 2011 at 3:18 PM, Kenneth Gonsalves law...@gmail.com wrote:
> hi,
>
> here is a simplified version of an xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <gpx>
>   <metadata>
>     <author>
>       <name>CloudMade</name>
>       <email id="support" domain="cloudmade.com" />
>       <link href="http://maps.cloudmade.com"></link>
>     </author>
>     <copyright author="CloudMade">
>       <license>http://cloudmade.com/faq#license</license>
>     </copyright>
>     <time>2011-07-28T07:04:01</time>
>   </metadata>
>   <extensions>
>     <distance>1489</distance>
>     <time>344</time>
>     <start>Sägerstraße</start>
>     <end>Im Gisinger Feld</end>
>   </extensions>
> </gpx>
>
> I want to get the value of the distance element - 1489. What is the simplest way of doing this?

    re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)

would appear to be the most succinct and quite fast. Adjust for whitespace as and if necessary.

Yet I would probably use the minidom based approach, if I was sure the input was likely to continue to be xml. Anand C's solution (elsewhere in the thread) reflects the programmer's intent in a simpler, less obfuscated form (both correctly working solutions will communicate the intent with exactly the same precision - the precision required to make the program work).

As far as optimisation goes - I can see at least 3 options

a. the minidom performance is acceptable - no further optimisation required
b. minidom performance is not acceptable - try the regex one
c. python library performance is not acceptable - switch to 'c'

I can imagine people starting with a and then deciding to move along the path a-b-c if and as necessary. I believe starting with b risks obfuscating code (imo regex is obfuscated compared to xml nodes - YMMV)

I don't know of any python programmers who are speed-maniacs. I am worried anytime someone programs in something other than assembly/machine code and uses that word. The rest of us are just trading off development speed vs. runtime speed.

Dhananjay
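To make the two candidates in this message concrete, here is a minimal side-by-side sketch. It assumes Python 3 and a cut-down copy of the sample document; the GPX constant and both function names are invented for the illustration, and the minidom version is one plausible way of writing that approach, not necessarily the code posted elsewhere in the thread.

```python
import re
from xml.dom import minidom

# Cut-down copy of the sample from the question (assumed, for illustration).
GPX = """<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""


def distance_with_minidom(xml_text):
    """Parse the document and read the text of the first <distance> element."""
    dom = minidom.parseString(xml_text)
    node = dom.getElementsByTagName("distance")[0]
    return int(node.firstChild.data)


def distance_with_regex(xml_text):
    """Treat the document as a plain string and pull the digits out."""
    match = re.search(r"<distance>\s*(\d+)\s*</distance>", xml_text)
    return int(match.group(1))


print(distance_with_minidom(GPX), distance_with_regex(GPX))
```

Both return the same value for this input; the disagreement in the thread is about what happens when the input stops looking exactly like this.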
Re: [BangPypers] parsing xml
On Sun, Jul 31, 2011 at 10:58 PM, Dhananjay Nene dhananjay.n...@gmail.com wrote:
> a. the minidom performance is acceptable - no further optimisation required
> b. minidom performance is not acceptable - try the regex one
> c. python library performance is not acceptable - switch to 'c'
>
> I can imagine people starting with a and then deciding to move along the path a-b-c if and as necessary. I believe starting with b risks obfuscating code (imo regex is obfuscated compared to xml nodes - YMMV)

A regex is the simplest IMHO, because you need not know the syntax of the minidom parser. But, again i have seen this quite often that lack of knowledge of regexp has led people to other solutions (the grapes are sour!)

> I don't know of any python programmers who are speed-maniacs. I am worried anytime someone programs in something other than assembly/machine code and uses that word. The rest of us are just trading off development speed vs. runtime speed.

Hang around in #django or #python. The most elegant code that you *should* write would invariably be pretty fast (am not ref to asm).
Re: [BangPypers] parsing xml
Dhananjay Nene dhananjay.n...@gmail.com writes:

[...]

> re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)
>
> would appear to be the most succinct and quite fast. Adjust for whitespace as and if necessary.

Whitespace (including newlines), mixed cases etc.

[...]

> As far as optimisation goes - I can see at least 3 options
>
> a. the minidom performance is acceptable - no further optimisation required
> b. minidom performance is not acceptable - try the regex one
> c. python library performance is not acceptable - switch to 'c'

I'd switch b and c. If ElementTree is not fast enough, I'd switch to cElementTree and if that's not fast enough, I'd try some hand parsing.

> I can imagine people starting with a and then deciding to move along the path a-b-c if and as necessary. I believe starting with b risks obfuscating code (imo regex is obfuscated compared to xml nodes - YMMV)

As someone who messed with perl for a long time, I can attest to the power and unmaintainability of regexps. I stay away from them unless I really need them. But yes, people like Larry Wall seem to think in a fundamentally different way so YMMV.

[...]

-- 
~noufal
http://nibrahim.net.in

I tripped over a hole that was sticking up out of the ground.
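For reference, the ElementTree route mentioned above is about as short as the regex. This is a small sketch assuming Python 3, where xml.etree.ElementTree picks up its C accelerator on its own when one is available, so there is no separate cElementTree import to switch to; the GPX constant is a trimmed stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the document in the original question (assumed).
GPX = """<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""

root = ET.fromstring(GPX)
# The path is relative to the root element, <gpx>.
distance = int(root.findtext("extensions/distance"))
print(distance)
```

Using the full path also sidesteps picking up a <distance> element that might appear somewhere else in a larger document.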
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:

[...]

> A regex is the simplest IMHO, because you need not know the syntax of the minidom parser.

Oh come on. This sounds like doing it the wrong way because you're not going to spend time reading the docs, and then using performance as a cover for the laziness.

[...]

> Hang around in #django or #python. The most elegant code that you *should* write would invariably be pretty fast (am not ref to asm).

I agree with you here. Pythonicity is best defined as what the experienced python core devs do and the stuff they use the most is optimised a lot. Pythonic python code is often the fastest python code.

[...]

-- 
~noufal
http://nibrahim.net.in

This page intentionally left blank.
Re: [BangPypers] parsing xml
On Mon, Aug 1, 2011 at 6:08 AM, Anand Chitipothu anandol...@gmail.com wrote:
> > Hang around in #django or #python. The most elegant code that you *should* write would invariably be pretty fast (am not ref to asm).
>
> That doesn't mean that any code that is faster is elegant.
>
> IIRC, in python, map function runs slightly faster than list comprehensions, but list comprehensions is considered elegant.

It is subtler than that.

List comprehensions are faster than map functions when the latter needs to invoke a user-defined function call or a lambda.

Maps score over list comprehensions in most cases where the function is a Python built-in and when no lambda is used.

Example of former:

>>> def f1(): map(sqr, range(1, 100))
...
>>> def f2(): [sqr(x) for x in range(1, 100)]
...
>>> mytimeit.Timeit(f1)
'37.91 usec/pass'
>>> mytimeit.Timeit(f2)
'37.50 usec/pass'

Example of latter:

>>> def f1(): map(hex, range(1, 100))
...
>>> def f2(): [hex(x) for x in range(1, 100)]
...
>>> mytimeit.Timeit(f1)
'49.41 usec/pass'
>>> mytimeit.Timeit(f2)
'55.29 usec/pass'

> Anand

-- 
--Anand
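The mytimeit helper used above is not shown anywhere in the thread, so here is a rough stand-in using the standard timeit module. It is a sketch under assumptions: it targets Python 3, where map() is lazy and has to be wrapped in list() to do any work, and the function names and the number of passes are arbitrary choices for the illustration.

```python
import timeit


def sqr(x):
    """User-defined function, as opposed to the C built-in hex."""
    return x * x


def with_map(func):
    # list() forces evaluation; in Python 3, map() alone only builds an iterator.
    return list(map(func, range(1, 100)))


def with_comprehension(func):
    return [func(x) for x in range(1, 100)]


def usec_per_pass(callable_, number=2000):
    """Average time of one call, in microseconds."""
    return timeit.timeit(callable_, number=number) / number * 1e6


results = {}
for name, func in (("sqr", sqr), ("hex", hex)):
    results[name] = (
        usec_per_pass(lambda: with_map(func)),
        usec_per_pass(lambda: with_comprehension(func)),
    )
    print("%s: map %.2f usec/pass, comprehension %.2f usec/pass"
          % ((name,) + results[name]))
```

The figures depend heavily on the interpreter version and the machine, so they should be read as a way to repeat the experiment, not as a confirmation of the numbers quoted in this message.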
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:

[...]

> Well, i have clearly mentioned my assumptions - i.e, when you treat the XML as a 'string' and do not want to retrieve anything else in a 'structured manner'.

If the data is structured, it makes sense to exploit that structure and use a proper solution.

> I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be faster and neater solution. I have done some speed experiments in past on this (results of which i do not have handy), and i found this.

Premature optimisation is the root of all evil.

I find it highly unlikely that for a large program suffering from low performance, replacing an XML parser with a regexp based parser will significantly improve performance.

Use the right tool for the job and then, if the performance is slow, profile the program. If you then find that it's the XML parsing that's the main bottleneck, switch to a different parser or a C (or assembly [1]) based implementation. If it's *still* not fast enough, try moving to regexps and then measure how much speed you get out of introducing so much brittleness and fragility into your program.

[...]

Footnotes:
[1] http://tibleiz.net/asm-xml/index.html

-- 
~noufal
http://nibrahim.net.in

Referring to a book: I read part of it all the way through. -- Samuel Goldwyn
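The "profile the program first" step can be sketched with the standard library's cProfile and pstats modules. Everything named here is an assumption made for the illustration: the DOC constant, the parse_distance and handle_requests functions, and the request count are stand-ins for whatever the real application does.

```python
import cProfile
import io
import pstats
import xml.etree.ElementTree as ET

# Hypothetical payload standing in for the real XML message.
DOC = "<gpx><extensions><distance>1489</distance></extensions></gpx>"


def parse_distance(xml_text):
    """The XML step whose cost is in question."""
    return int(ET.fromstring(xml_text).findtext("extensions/distance"))


def handle_requests(count=1000):
    """Stand-in for the rest of the program: parse many messages."""
    total = 0
    for _ in range(count):
        total += parse_distance(DOC)
    return total


profiler = cProfile.Profile()
profiler.enable()
total = handle_requests()
profiler.disable()

# Print the five most expensive entries by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

If the parsing functions do not show up near the top of such a report for the real program, swapping the parser for a regexp will not change much.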
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 11:15 AM, Anand Chitipothu anandol...@gmail.com wrote:
> 2011/7/29 Venkatraman S venka...@gmail.com:
> > On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:
> > > 2011/7/28 Venkatraman S venka...@gmail.com:
> > > > parsing using minidom is one of the slowest. if you just want to extract the distance and assuming that it (the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.
> > >
> > > regexp is a bad solution to parse xml.
> > >
> > > minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilation failures.
> > >
> > > I don't think there will be significant performance difference between regexp and minidom unless you are doing it a million times.
> >
> > Well, i have clearly mentioned my assumptions - i.e, when you treat the XML as a 'string' and do not want to retrieve anything else in a 'structured manner'.
> >
> > I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be faster and neater solution. I have done some speed experiments in past on this (results of which i do not have handy), and i found this.
> >
> > XP asks you implement the best solution with the least effort and i think in this case regexp is a winner. Thoughts can vary though.
>
> regexp can at the best be a dirty-hack, not a best solution for xml parsing.

read again : i am not actually working on 'xml' (see my assumption?).
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:
> > Well, i have clearly mentioned my assumptions - i.e, when you treat the XML as a 'string' and do not want to retrieve anything else in a 'structured manner'.
>
> If the data is structured, it makes sense to exploit that structure and use a proper solution.

yes, and xml parsing is a bad way when you are not interested in the xml structure..

> I find it highly unlikely that for a large program suffering from low performance, replacing an XML parser with a regexp based parser will significantly improve performance.

Sigh! Again, guys, i am referring to regexp when all you need is some number within a tag! If the content of that tag was text, i would have never suggested this solution.

HTH.
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:

[...]

> Sigh! Again, guys, i am referring to regexp when all you need is some number within a tag! If the content of that tag was text, i would have never suggested this solution.

[...]

And I'm telling you that even a slight change to the tag - an extra space, a newline, a new attribute, a change in case or any such thing which doesn't modify its meaning as far as the XML snippet is concerned - will break your regexp and cause your program to crash.

I'm asking you to consider what you're trading for this assumed increase in speed.

-- 
~noufal
http://nibrahim.net.in

The scene is dull. Tell him to put more life into his dying. -- Samuel Goldwyn
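The brittleness being described can be shown on a few harmless variations of the same element. This is a minimal sketch assuming Python 3; the VARIANTS dict and both helper functions are made up for the illustration, and the pattern is the one suggested earlier in the thread.

```python
import re
import xml.etree.ElementTree as ET

# The regular expression proposed earlier in the thread.
PATTERN = re.compile(r"<distance>\s*(\d+)\s*</distance>")

# Equivalent ways of writing the same element (assumed examples).
VARIANTS = {
    "original": "<extensions><distance>1489</distance></extensions>",
    "newline": "<extensions><distance>\n1489\n</distance></extensions>",
    "attribute": '<extensions><distance unit="m">1489</distance></extensions>',
    "space in tag": "<extensions><distance >1489</distance ></extensions>",
}


def by_regex(text):
    """Return the distance, or None when the pattern no longer matches."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None


def by_parser(text):
    """Return the distance using a real XML parser."""
    return int(ET.fromstring(text).findtext("distance"))


for label, text in VARIANTS.items():
    print("%-13s regex=%r parser=%r" % (label, by_regex(text), by_parser(text)))
```

The regex copes with the newline variant because of its \s*, but stops matching once an attribute or a space appears inside the tag, while the parser returns the value in every case. The earlier one-liner calls .group(1) on the result directly, so a failed match there would raise an AttributeError instead of returning None.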
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 11:44 AM, Noufal Ibrahim nou...@gmail.com wrote:
> And I'm telling you that even a slight change to the tag - an extra space, a newline, a new attribute, a change in case or any such thing which doesn't modify its meaning as far as the XML snippet is concerned - will break your regexp and cause your program to crash.
>
> I'm asking you to consider what you're trading for this assumed increase in speed.

Along the same lines...the problems are more when xml is concerned, for even if some other tag is malformed, then the whole document is 'gone'.
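The counterpoint made here can be illustrated in the same way: when an unrelated tag is malformed, a conforming parser rejects the whole document while the string search still finds the number. This is a sketch assuming Python 3; the BROKEN sample and the helper names are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# <time> is never closed, so the document is not well-formed (assumed example).
BROKEN = "<gpx><extensions><distance>1489</distance><time>344</extensions></gpx>"


def by_regex(text):
    """Return the distance, or None when the pattern does not match."""
    match = re.search(r"<distance>\s*(\d+)\s*</distance>", text)
    return int(match.group(1)) if match else None


def by_parser(text):
    """Return the distance, or None when the document cannot be parsed."""
    try:
        return int(ET.fromstring(text).findtext("extensions/distance"))
    except ET.ParseError:
        return None


print("regex:", by_regex(BROKEN))
print("parser:", by_parser(BROKEN))
```

Whether silently accepting a broken document is a feature or a hazard is the actual point of disagreement in the thread; the sketch only shows that the two approaches fail in different situations.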
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:
> > I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be faster and neater solution. I have done some speed experiments in past on this (results of which i do not have handy), and i found this.
>
> Premature optimisation is the root of all evil.

I belong to a different school. I think about performance right from the design dashboards for i think, be it a simple webapp or a financial application, the choice of your design patterns and techstack goes a long way in a good customer experience. Bulk of my thoughts are reflected in here : http://www.codinghorror.com/blog/2011/06/performance-is-a-feature.html

I generally lay emphasis on 2 things when it comes to webapps : 1) Layout and 2) Speed. I am ready to sacrifice certain features; if i think a certain feature will cause some issues with customer experience (after all we are developing apps for the customer, and if he is made to wait for 5s for an event, then it's bad), i would rather not present it, or present it in some 'other' fashion (like a batched job?).
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:

> On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:
>
> > > I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be faster and neater solution. I have done some speed experiments in past on this (results of which i do not have handy), and i found this.
> >
> > Premature optimisation is the root of all evil.
>
> I belong to a different school. I think about performance right from the design dashboards for i think, be it a simple webapp or a financial application, the choice of your design patterns and techstack goes a long way in a good customer experience. Bulk of my thoughts are reflected in here : http://www.codinghorror.com/blog/2011/06/performance-is-a-feature.html

I agree and I try my best to do the same thing. However, I differentiate between micro optimisations like rewriting parts in C and XML and top level optimisations like good design and the right data structures.

The former, I don't do because I get bogged down by the details and end up delivering something that's super fast *really* late. The latter, I do because otherwise, the application is unusable and a bad experience. Also, micro optimising (e.g. replacing DOM parsing with regexps to extract stuff out of an XML message) makes code more brittle, which is also a no win for the customer.

I end up messing with the former only when I've exhausted all other avenues and *really* need that last drop of juice. This is usually common in games and stuff like that with continuous involved user interaction rather than in webapps where it's a little more spaced out.

If performance is *this* important to you, why don't you code your entire application in assembly, hand crafting it for a certain processor, amount of memory and hard disk platter speed? Why use Python at all? The reason is because Python is fast enough for most things.

You can get better performance moving to lower level routines but it's often not necessary and the costs it entails are usually not worth it. Better a fast enough stable app than a super fast one that occasionally segfaults and loses data.

That's the point I'm trying to make.

[...]

-- 
~noufal
http://nibrahim.net.in

If I could drop dead right now, I'd be the happiest man alive! -- Samuel Goldwyn
Re: [BangPypers] parsing xml
+1.

On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:
> [...]
Re: [BangPypers] parsing xml
+1

On Fri, Jul 29, 2011 at 12:50 PM, Sidu Ponnappa lorddae...@gmail.com wrote:
> +1.
>
> On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:
>> [...]
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:
> I agree and I try my best to do the same thing. However, I differentiate between micro optimisations like rewriting parts in C and XML and top level optimisations like good design and the right data structures.

Using regexp is micro optimization?

> The former, I don't do because I get bogged down by the details and end up delivering something that's super fast *really* late. The latter, I do because otherwise, the application is unusable and a bad experience. Also, micro optimising (e.g. replacing DOM parsing with regexps to extract stuff out of an XML message) makes code more brittle which is also a no win for the customer.

IMHO, regexps are much more powerful and fault tolerant than XML parsing. XMLs are brittle.

> If performance is *this* important to you, why don't you code your entire application in assembly hand crafting it for a certain processor, amount of memory and hard disk platter speed? Why use Python at all? The reason is because Python is fast enough for most things. You can get better performance moving to lower level routines but it's often not necessary and the costs it entails are usually not worth it. Better a fast enough stable app than a super fast one that occasionally segfaults and loses data.

Not sure how this point is relevant; the amount of performance you need is dependent on the nature of the application you develop. For example, see this presentation for how simple hacks like 'compressing' DOM 'getters/setters' can affect browser performance: http://paulirish.com/2011/dom-html5-css3-performance/

At the same time, look at this TED talk wherein trading applications spend billions of $ for millisecond performance gains: http://www.youtube.com/watch?v=TDaFwnOiKVE

For a webapp, XML parsing is a very important factor that the developer *must* consider while designing.
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:
> On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:
>> I agree and I try my best to do the same thing. However, I differentiate between micro optimisations like rewriting parts in C and XML and top level optimisations like good design and the right data structures.
>
> Using regexp is micro optimization?

For parsing XML, yes it is. It's not the first thing I'd use and it's something I'd consider only after I've exhausted everything else and have reason to believe that my application is not fast enough just because I'm using an XML parser instead of a regexp.

There are some places where I would use regexps instead of a parser upfront though. Mostly related to streams of bad XML data (particularly while screen scraping) but even then, a fault tolerant parser would do better than regexps.

[...]

> IMHO, regexps are much more powerful and fault tolerant than XML parsing. XMLs are brittle.

If you say so. I don't have much more to say on this.

There was an interesting exchange a while ago between Eric Raymond and John Graham-Cumming on using regexps vs. a regular parser while screen scraping to fetch data out of a forge site. Here's the link. You might find it interesting: http://blog.jgc.org/2009/11/parsing-html-in-python-with.html#links

>> If performance is *this* important to you, why don't you code your entire application in assembly hand crafting it for a certain processor, amount of memory and hard disk platter speed? Why use Python at all? The reason is because Python is fast enough for most things. You can get better performance moving to lower level routines but it's often not necessary and the costs it entails are usually not worth it. Better a fast enough stable app than a super fast one that occasionally segfaults and loses data.
>
> Not sure how this point is relevant; the amount of performance you need is dependent on the nature of the application you develop.

Yup. And I put it to you that switching from a regular XML parser to a regexp based one will not give you a sufficient speed boost to justify the higher maintenance costs in most cases.

> For a webapp, XML parsing is a very important factor that the developer *must* consider while designing.

And your advice is to use regexps to do this?

[...]

--
~noufal
http://nibrahim.net.in

A verbal contract isn't worth the paper it's written on. Include me out. -Samuel Goldwyn
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 1:09 PM, Venkatraman S venka...@gmail.com wrote:
> IMHO, regexps are much more powerful and fault tolerant than XML parsing. XMLs are brittle.

Did you mean parsing XML using regular expressions is more powerful and fault tolerant than using an XML parser?

Regards,
BG

--
Baishampayan Ghose
b.ghose at gmail.com
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 10:55 AM, Baishampayan Ghose b.gh...@gmail.com wrote:
>> minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilations failures.
>
> FWIW, ElementTree is a part of the standard library as well and is known to be much better than minidom in various ways.

Being part of the standard library is a big plus. However, compared to lxml, one thing that seems to be missing in the more standard Python XML libraries is XSLT. This is incredibly useful in some contexts. I would love to learn about XSLT support in other libraries.

Here is the blog post about the performance of Python XML libraries that I was referring to earlier: http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Regards,
Gora
Re: [BangPypers] parsing xml
Noufal,

I have nothing more to say than this (as i see some tangential replies which i am not interested in substantiating; for example, i never suggested using a regexp based parser. A regexp based xml parser is different from using 'a' regexp on a string!):

Read my replies properly. Read my assumptions properly w.r.t. the xml structure and the requested value in the xml. Read the link that you have pasted again. If possible, read the comments in the link shared (from esr) again. Once done, think twice and tell me which is better.

If you vouch for xml parsing in the case when all that you need from the string is a simple numeric value (not a string), then good luck; unlike esr i will not use adjectives; but i would not use your code either.

Thanks.
-V
Re: [BangPypers] parsing xml
Venkatraman S venka...@gmail.com writes:

[...]

> Read my replies properly. Read my assumptions properly w.r.t the xml structure and the requested value in the xml. Read the link that you have pasted again. If possible, read the comments in the link shared(from esr) again. Once done, think twice and tell me which is better.
>
> If you vouch for xml parsing in the case when all that you need from the string is a simple numeric value(not a string), then good luck; unlike esr i will not use adjectives; but i would not use your code either.

[...]

Fair enough.

--
~noufal
http://nibrahim.net.in

Professional certification for car people may sound like an oxymoron. -The Wall Street Journal, page B1, Tuesday, July 17, 1990.
Re: [BangPypers] parsing xml
grep or regexp?

-V
Re: [BangPypers] parsing xml
Using xpath such as: /gpx/extensions/distance(:text) ?

On Thu, Jul 28, 2011 at 3:20 PM, Venkatraman S venka...@gmail.com wrote:
> grep or regexp?
>
> -V

--
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.

http://gnufied.org
http://twitter.com/gnufied
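A note on the path suggested above: `(:text)` is not XPath syntax; in full XPath the text node would be selected with `/gpx/extensions/distance/text()`. The standard library's ElementTree only supports a limited XPath subset (no `text()` step, and paths are relative to the element you search from), so a minimal sketch, assuming Python 3 and a trimmed-down copy of Kenneth's sample document, could look like this:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample GPX document from the thread (an assumption:
# the metadata block is shortened here to keep the sketch small).
GPX = """<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <metadata><time>2011-07-28T07:04:01</time></metadata>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
    <start>Sägerstraße</start>
    <end>Im Gisinger Feld</end>
  </extensions>
</gpx>""".encode("utf-8")

root = ET.fromstring(GPX)  # root is the <gpx> element itself

# Paths are relative to root, so the leading "/gpx" is dropped;
# findtext() returns the element's text instead of a text() node.
distance = root.findtext("extensions/distance")
print(distance)
```

Because the path names the parent element, this picks the `distance` under `extensions` and would not be confused by a same-named element elsewhere in the document.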
Re: [BangPypers] parsing xml
On Thu, Jul 28, 2011 at 3:20 PM, Venkatraman S venka...@gmail.com wrote:
> grep or regexp?
>
> -V

can write an XML parsing query

--
Ramdas S
+91 9342 583 065
Re: [BangPypers] parsing xml
2011/7/28 Kenneth Gonsalves law...@gmail.com:
> hi,
>
> here is a simplified version of an xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <gpx>
>   <metadata>
>     <author>
>       <name>CloudMade</name>
>       <email id="support" domain="cloudmade.com" />
>       <link href="http://maps.cloudmade.com"></link>
>     </author>
>     <copyright author="CloudMade">
>       <license>http://cloudmade.com/faq#license</license>
>     </copyright>
>     <time>2011-07-28T07:04:01</time>
>   </metadata>
>   <extensions>
>     <distance>1489</distance>
>     <time>344</time>
>     <start>Sägerstraße</start>
>     <end>Im Gisinger Feld</end>
>   </extensions>
> </gpx>
>
> I want to get the value of the distance element - 1489. What is the simplest way of doing this?

    >>> from xml.dom import minidom
    >>> dom = minidom.parseString(x)
    >>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
    u'1489'

Anand
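For readers on Python 3, the minidom approach above can be wrapped in a small helper. This is only a sketch: the helper name `first_text` and the shortened sample document are assumptions made for illustration, not something from the thread.

```python
from xml.dom import minidom

# Shortened copy of the sample document (assumption: author/copyright
# metadata omitted for brevity).
GPX = """<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <metadata><time>2011-07-28T07:04:01</time></metadata>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>""".encode("utf-8")


def first_text(xml_bytes, tag):
    """Return the text of the first <tag> element, or None if absent or empty."""
    dom = minidom.parseString(xml_bytes)
    nodes = dom.getElementsByTagName(tag)  # searches the whole document
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue


print(first_text(GPX, "distance"))
```

One design point worth knowing: `getElementsByTagName` matches by name anywhere in the document, so asking for `time` here returns the `metadata` timestamp rather than the `extensions` value, simply because it comes first.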
Re: [BangPypers] parsing xml
> here is a simplified version of an xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <gpx>
>   <metadata>
>     <author>
>       <name>CloudMade</name>
>       <email id="support" domain="cloudmade.com" />
>       <link href="http://maps.cloudmade.com"></link>
>     </author>
>     <copyright author="CloudMade">
>       <license>http://cloudmade.com/faq#license</license>
>     </copyright>
>     <time>2011-07-28T07:04:01</time>
>   </metadata>
>   <extensions>
>     <distance>1489</distance>
>     <time>344</time>
>     <start>Sägerstraße</start>
>     <end>Im Gisinger Feld</end>
>   </extensions>
> </gpx>
>
> I want to get the value of the distance element - 1489. What is the simplest way of doing this?

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    from xml.etree.ElementTree import fromstring

    # The XML declaration must be the very first thing in the string.
    data = """<?xml version="1.0" encoding="UTF-8"?>
    <gpx>
      <metadata>
        <author>
          <name>CloudMade</name>
          <email id="support" domain="cloudmade.com" />
          <link href="http://maps.cloudmade.com"></link>
        </author>
        <copyright author="CloudMade">
          <license>http://cloudmade.com/faq#license</license>
        </copyright>
        <time>2011-07-28T07:04:01</time>
      </metadata>
      <extensions>
        <distance>1489</distance>
        <time>344</time>
        <start>Sägerstraße</start>
        <end>Im Gisinger Feld</end>
      </extensions>
    </gpx>"""

    def parse_xml(s):
        # The path is relative to the root <gpx> element.
        element = fromstring(s)
        return element.find("extensions/distance").text

    if __name__ == "__main__":
        print(parse_xml(data))

Hope that helps.

Regards,
BG

--
Baishampayan Ghose
b.ghose at gmail.com
Re: [BangPypers] parsing xml
On Thu, 2011-07-28 at 15:33 +0530, Anand Chitipothu wrote:
>> I want to get the value of the distance element - 1489. What is the simplest way of doing this?
>
>     >>> from xml.dom import minidom
>     >>> dom = minidom.parseString(x)
>     >>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
>     u'1489'

thanks - perfect.

--
regards
Kenneth Gonsalves
Re: [BangPypers] parsing xml
You can try BeautifulSoup, recommended for Python/XML parsing.
Re: [BangPypers] parsing xml
If you're doing this repeatedly, you may want to just delegate to a native XPath implementation. I haven't done much Python, so I can't comment on your choices, but in Ruby I'd simply hand off to libXML using Nokogiri. This approach should be a whole lot faster, but I'd advise benchmarking first because, as I said, I know little about Python.

Best,
Sidu.
http://sidu.in

On Thu, Jul 28, 2011 at 4:01 PM, Kenneth Gonsalves law...@gmail.com wrote:
> On Thu, 2011-07-28 at 15:33 +0530, Anand Chitipothu wrote:
>>> I want to get the value of the distance element - 1489. What is the simplest way of doing this?
>>
>>     >>> from xml.dom import minidom
>>     >>> dom = minidom.parseString(x)
>>     >>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
>>     u'1489'
>
> thanks - perfect.
>
> --
> regards
> Kenneth Gonsalves
Re: [BangPypers] parsing xml
parsing using minidom is one of the slowest. if you just want to extract the distance, and assuming that it (the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.
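The regexp approach proposed here, under the stated assumption that the tag always appears in exactly the same form, might be sketched as follows. The pattern and function name are illustrative assumptions, not code posted in this message.

```python
import re

# Assumes the element is always written as <distance>digits</distance>,
# with no attributes, namespace prefix or surrounding comments.
DISTANCE_RE = re.compile(r"<distance>\s*(\d+)\s*</distance>")


def distance_by_regex(text):
    """Return the distance as an int, or None when the pattern does not match."""
    match = DISTANCE_RE.search(text)
    return int(match.group(1)) if match else None


print(distance_by_regex("<extensions><distance>1489</distance></extensions>"))
```

The pattern treats the document as a plain string, which is exactly the assumption being argued over in the rest of the thread.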
Re: [BangPypers] parsing xml
On Thu, Jul 28, 2011 at 10:37 PM, Venkatraman S venka...@gmail.com wrote:
> parsing using minidom is one of the slowest. if you just want to extract the distance and assuming that it(the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.

[...]

Strongly disagree. IMHO, regexps are the wrong solution for parsing XML (or any kind of well-structured text), as they end up becoming intolerably complex, and do not degrade gracefully for broken XML. I have not compared speeds myself, but there are blogs that go into that.

In my experience, the cleanest, most efficient, and richest-in-features Python XML library is lxml. For people used to BeautifulSoup, lxml has a BeautifulSoup parser, and is significantly more efficient.

Regards,
Gora
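To make the complexity concern concrete, here is a small sketch comparing a simple pattern with the standard library's ElementTree on a few equivalent ways of writing the same element. The variants and helper names are assumptions chosen for illustration; lxml is not used so that the example runs without extra dependencies.

```python
import re
import xml.etree.ElementTree as ET

PATTERN = re.compile(r"<distance>\s*(\d+)\s*</distance>")

# Each variant is well-formed XML carrying the same distance value.
variants = {
    "plain": "<extensions><distance>1489</distance></extensions>",
    "attribute": '<extensions><distance unit="m">1489</distance></extensions>',
    "space in tag": "<extensions><distance >1489</distance ></extensions>",
    "commented out": (
        "<extensions><!-- <distance>1</distance> -->"
        "<distance>1489</distance></extensions>"
    ),
}


def by_regex(text):
    match = PATTERN.search(text)
    return match.group(1) if match else None


def by_parser(text):
    # The root is <extensions>, so "distance" is a direct child.
    return ET.fromstring(text).findtext("distance")


results = {name: (by_regex(doc), by_parser(doc)) for name, doc in variants.items()}
for name, (regex_value, parser_value) in results.items():
    print(f"{name:>14}: regex={regex_value!r} parser={parser_value!r}")
```

The pattern misses the attribute and whitespace variants and picks up the value inside the comment, while the parser returns the same value for all four. Each case can be patched in the regexp, which is how such patterns tend to grow.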
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 1:23 AM, Gora Mohanty g...@mimirtech.com wrote:
> On Thu, Jul 28, 2011 at 10:37 PM, Venkatraman S venka...@gmail.com wrote:
>> parsing using minidom is one of the slowest. if you just want to extract the distance and assuming that it(the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.
>
> [...]
>
> Strongly disagree. IMHO, regexps are the wrong solution for parsing XML (or, any kind of well-structured text), as they end up becoming intolerably complex, and do not degrade gracefully for broken XML. Have not compared speeds myself, but there are blogs that go into that.
>
> In my experience, the cleanest, most efficient, and richest-in-features Python XML library is lxml. For people used to BeautifulSoup, lxml has a BeautifulSoup parser, and is significantly more efficient.

If it's a question of the fastest gun around, it must be cElementTree; please refer to the table towards the bottom of the page. Caveat: the page belongs to effbot, who wrote the package.

http://effbot.org/zone/celementtree.htm

> Regards,
> Gora

--
Ramdas S
+91 9342 583 065
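A note for readers on current Python: since Python 3.3 the plain `xml.etree.ElementTree` module uses the C accelerator automatically when it is available, and the separate `xml.etree.cElementTree` module was removed in Python 3.9. If speed on large files is the concern, one option is to stream the document and stop at the first match. The sketch below assumes Python 3; the function name and the shortened sample are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample document (assumption: metadata omitted for brevity).
GPX = b"""<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""


def first_distance(stream):
    """Parse incrementally and return at the first closed <distance> element."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "distance":
            return elem.text
    return None


value = first_distance(io.BytesIO(GPX))
print(value)
```

Returning early means the rest of the file is never parsed, which only matters when documents are much larger than the sample in this thread.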
Re: [BangPypers] parsing xml
Hi,

check out the pyparsing module... you could do it quite simply :)

regards,
joseph

On 7/29/11, Ramdas S ram...@gmail.com wrote:
> On Fri, Jul 29, 2011 at 1:23 AM, Gora Mohanty g...@mimirtech.com wrote:
>> [...]
>
> If it's a questions of the fastest gun around it must be cElementTree, and please refer the table somewhere towards bottom of the page. Caveat, the page belongs to effbot who is written the package.
>
> http://effbot.org/zone/celementtree.htm
>
> --
> Ramdas S
> +91 9342 583 065

--
Sent from my mobile device
Re: [BangPypers] parsing xml
2011/7/28 Venkatraman S venka...@gmail.com:
> parsing using minidom is one of the slowest. if you just want to extract the distance and assuming that it(the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.

regexp is a bad solution to parse xml.

minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in the standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilation failures.

I don't think there will be a significant performance difference between regexp and minidom unless you are doing it a million times.

Anand
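Nobody in the thread posted timings, so rather than guess, here is a small benchmark sketch that readers can run on their own machine. It assumes Python 3, a shortened copy of the sample document, and an arbitrary repeat count; the numbers it prints will vary between machines and Python versions.

```python
import re
import timeit
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Shortened sample document (assumption: metadata omitted for brevity).
GPX = b"""<?xml version="1.0" encoding="UTF-8"?>
<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""

DISTANCE_RE = re.compile(rb"<distance>\s*(\d+)\s*</distance>")


def with_regex():
    return DISTANCE_RE.search(GPX).group(1).decode("ascii")


def with_minidom():
    dom = minidom.parseString(GPX)
    return dom.getElementsByTagName("distance")[0].firstChild.nodeValue


def with_etree():
    return ET.fromstring(GPX).findtext("extensions/distance")


RUNS = 1000  # arbitrary; raise it for steadier numbers
timings = {
    fn.__name__: timeit.timeit(fn, number=RUNS)
    for fn in (with_regex, with_minidom, with_etree)
}
for name, seconds in timings.items():
    print(f"{name:>12}: {seconds:.4f} s for {RUNS} runs")
```

All three functions return the same string, so the comparison is between equivalent results; whether any difference matters depends on how often the extraction actually runs.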
Re: [BangPypers] parsing xml
> minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilations failures.

FWIW, ElementTree is a part of the standard library as well and is known to be much better than minidom in various ways.

Regards,
BG

--
Baishampayan Ghose
b.ghose at gmail.com
Re: [BangPypers] parsing xml
On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:
> 2011/7/28 Venkatraman S venka...@gmail.com:
>> parsing using minidom is one of the slowest. if you just want to extract the distance and assuming that it(the tag) will always be consistent, then i would always suggest regexp. xml parsing is a pain.
>
> regexp is a bad solution to parse xml.
>
> minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilations failures.
>
> I don't think there will be significant performance difference between regexp and minidom unless you are doing it a million times.

Well, i have clearly mentioned my assumptions - i.e., when you treat the XML as a 'string' and do not want to retrieve anything else in a 'structured manner'.

I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be a faster and neater solution. I have done some speed experiments in the past on this (results of which i do not have handy), and i found this.

XP asks you to implement the best solution with the least effort and i think in this case regexp is a winner. Thoughts can vary though.
Re: [BangPypers] parsing xml
2011/7/29 Venkatraman S venka...@gmail.com:
> On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:
>> [...]
>
> Well, i have clearly mentioned my assumptions - i.e, when you treat the XML as a 'string' and do not want to retrieve anything else in a 'structured manner'.
>
> I am a speed-maniac and crave for speed; so if the assumption is valid, i can vouch for the fact that regexp would be faster and neater solution. I have done some speed experiments in past on this (results of which i do not have handy), and i found this.
>
> XP asks you implement the best solution with the least effort and i think in this case regexp is a winner. Thoughts can vary though.

regexp can at best be a dirty hack, not the best solution for xml parsing.

Anand
Re: [BangPypers] parsing xml
2011/7/29 Baishampayan Ghose b.gh...@gmail.com:
>> minidom is the fastest solution if you consider the programmer time instead of developer time. Minidom is available in standard library, you don't have to add another dependency and worry about PyPI downtimes and lxml compilations failures.
>
> FWIW, ElementTree is a part of the standard library as well and is known to be much better than minidom in various ways.

New in version 2.5. I don't do much xml parsing and I've always been using minidom when I needed it. I think I should update my knowledge.

Anand