Re: [BangPypers] parsing xml

2011-08-01 Thread Kenneth Gonsalves
On Sun, 2011-07-31 at 19:57 +0530, Anand Balachandran Pillai wrote:
  xml parsing in the case when all that you need from the string is a simple
  numeric value(not a string), then good luck; unlike esr i will not use
  adjectives; but i would not use your code either.
 
 To be fair here, I think what he is saying is that Kenneth's problem (getting
 at the particular value) can be solved by using an aptly written regular
 expression which might be the fastest - not in terms of CPU cycles alone,
 but in terms of time to code it up - solution.

right now I need one value - but that will probably change.
-- 
regards
Kenneth Gonsalves

___
BangPypers mailing list
BangPypers@python.org
http://mail.python.org/mailman/listinfo/bangpypers


Re: [BangPypers] parsing xml

2011-08-01 Thread Dhananjay Nene
On Mon, Aug 1, 2011 at 12:43 AM, Noufal Ibrahim nou...@gmail.com wrote:

 Dhananjay Nene dhananjay.n...@gmail.com writes:


 [...]

  re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)
 
  would appear to be the most succinct and quite fast. Adjust for whitespace
  as and if necessary.

 Whitespace (including newlines), mixed cases etc.

Actually, newlines are handled in the regex above (so I am no longer sure why I
even mentioned it). XML, assuming it is as per spec, is case-sensitive, so mixed
case should not arise.


 [...]

  As far as optimisation goes - I can see at least 3 options
 
  a. the minidom performance is acceptable - no further optimisation required
  b. minidom performance is not acceptable - try the regex one
  c. python library performance is not acceptable - switch to 'c'

 I'd switch b and c. If ElementTree is not fast enough, I'd switch to
 cElementTree and if that's not fast enough, I'd try some hand parsing.

+1

Dhananjay


Re: [BangPypers] parsing xml

2011-08-01 Thread Kiran Jonnalagadda
On 31-Jul-2011, at 11:33 PM, Venkatraman S wrote:

 A regex is the simplest IMHO, because you need not know the syntax of the
 minidom parser.
 But, again i have seen this quite often that lack of knowledge of regexp has
 led people to other solutions (the grapes are sour!)

In the eternal words of Jamie Zawinski:


Some people, when confronted with a problem, think 
“I know, I'll use regular expressions.” Now they have two problems.


http://regex.info/blog/2006-09-15/247

Please resist the temptation to use regexps for XML, for down that path lies
only pain. It always starts with "oh, only one token? Let me use a regex and
get done with it", and soon enough you have a little forest of random-looking
characters.

Kiran

-- 
Kiran Jonnalagadda
http://jace.zaiki.in/
http://hasgeek.in/




Re: [BangPypers] parsing xml

2011-08-01 Thread Anand Balachandran Pillai
On Mon, Aug 1, 2011 at 1:25 PM, Kiran Jonnalagadda j...@pobox.com wrote:

 On 31-Jul-2011, at 11:33 PM, Venkatraman S wrote:

  A regex is the simplest IMHO, because you need not know the syntax of the
  minidom parser.
  But, again i have seen this quite often that lack of knowledge of regexp has
  led people to other solutions (the grapes are sour!)

 In the eternal words of Jamie Zawinski:

 
 Some people, when confronted with a problem, think
 “I know, I'll use regular expressions.” Now they have two problems.
 

 http://regex.info/blog/2006-09-15/247


I had fun reading the following quotes :)

“Give a man a regular expression and he’ll match a string…
teach him to make his own regular expressions and you’ve got a man with
problems.”
–me_da_clever_one

“Give a man a regular expression and he’ll match a string… but by teaching
him how to create them, you’ve given him enough rope to hang himself” – Andy
Hood



 Please resist the temptation to use regexps for XML, for down that path
 lies only pain. It always starts with "oh, only one token? Let me use a
 regex and get done with it", and soon enough you have a little forest of
 random-looking characters.


Using regular expressions to parse XML treats what is inherently
hierarchical data as linear, flat data. Therein lie all its problems.
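
A small sketch of that point, using a trimmed copy of the sample file posted
earlier in this thread (the variable names are illustrative only). The sample
has two time elements; a flat pattern cannot tell them apart, while a tree
lookup can ask for the one under extensions.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from this thread.
DATA = """<gpx>
<metadata>
<time>2011-07-28T07:04:01</time>
</metadata>
<extensions>
<distance>1489</distance>
<time>344</time>
</extensions>
</gpx>"""

# Flat view: the first <time> in the text wins (the metadata timestamp).
flat_time = re.search(r"<time>(.*?)</time>", DATA).group(1)

# Hierarchical view: ask for the <time> that is a child of <extensions>.
tree_time = ET.fromstring(DATA).findtext("extensions/time")

print(flat_time)
print(tree_time)
```

The regex could of course be extended to anchor on the extensions tag, but
that is exactly the "little forest" growing.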







-- 
--Anand


Re: [BangPypers] parsing xml

2011-08-01 Thread Smrutilekha Swain
By using lxml, for example:

from lxml import etree

# xml_file is the name of the xml file
content = etree.iterparse(xml_file, events=('end',))
for event, elem in content:
    if elem.tag == 'distance':
        # on 'end' the element has been fully read, so its text is available
        print(elem.text)


Hope it will work..




Re: [BangPypers] parsing xml

2011-08-01 Thread Dhananjay Nene
On Mon, Aug 1, 2011 at 12:46 AM, Noufal Ibrahim nou...@gmail.com wrote:
 Venkatraman S venka...@gmail.com writes:


 Hang around in #django or #python. The most elegant code that you
 *should* write would invariably be pretty fast (am not ref to asm).

 I agree with you here. Pythonicity is best defined as what the
 experienced python core devs do and the stuff they use the most is
 optimised a lot. Pythonic python code is often the fastest python code.

This is one aspect of Python that I am not a fan of, because being
pythonic conflates the idioms for expression with the idioms for
performance. There are situations when the needs of performance
overshadow the needs of expression. As an example, creating classes
with attributes and setting them is more expensive than creating a
dict. Similarly, writing a bigger block of sequential code is preferable
(again due to performance considerations) to breaking it into
multiple functions, especially when calling functions along with
map, filter, reduce or other itertools constructs (as opposed to, say,
list comprehensions).

Other languages also have situations where one has to make such
tradeoffs, but there are more of them in Python, and alternative
styles of expression especially, imo, get buried under the label pythonic.
So yes, there is a lot of importance associated with what is pythonic, but I
would've felt more comfortable if it were influenced by expression
rather than performance.
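
For anyone who wants to check the class-versus-dict claim on their own
interpreter, a minimal sketch follows. Point and the two factory functions
are made-up names for illustration, and no particular result is implied;
the numbers depend on the Python version and machine.

```python
from timeit import timeit

class Point(object):
    """A tiny class with two attributes set in __init__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def make_object():
    return Point(1, 2)

def make_dict():
    return {"x": 1, "y": 2}

# timeit accepts a callable directly; number is kept small so this runs quickly.
object_seconds = timeit(make_object, number=200000)
dict_seconds = timeit(make_dict, number=200000)

print("object: %.4f s" % object_seconds)
print("dict:   %.4f s" % dict_seconds)
```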

Dhananjay


Re: [BangPypers] parsing xml

2011-08-01 Thread Noufal Ibrahim
Anand Balachandran Pillai abpil...@gmail.com writes:

 On Mon, Aug 1, 2011 at 6:08 AM, Anand Chitipothu anandol...@gmail.com wrote:

[...]

 It is subtler than that.

 List comprehensions are faster than map when
 the latter needs to invoke a user-defined function or a lambda.

 map scores over list comprehensions in most cases where the function
 is a Python built-in and no lambda is used.

 Example of the former:

 >>> def f1(): map(sqr, range(1, 100))
 ...
 >>> def f2(): [sqr(x) for x in range(1, 100)]
 ...
 >>> mytimeit.Timeit(f1)
 '37.91 usec/pass'
 >>> mytimeit.Timeit(f2)
 '37.50 usec/pass'

 Example of the latter:

 >>> def f1(): map(hex, range(1, 100))
 ...
 >>> def f2(): [hex(x) for x in range(1, 100)]
 ...
 >>> mytimeit.Timeit(f1)
 '49.41 usec/pass'
 >>> mytimeit.Timeit(f2)
 '55.29 usec/pass'

This is confusing. Why is 

map(sqr, range(1, 100))

faster than

map(hex, range(1, 100))

Assuming sqr is implemented in python, it should be slower than hex
which is implemented in C. 

[...]




-- 
~noufal
http://nibrahim.net.in

She used to diet on any kind of food she could lay her hands on. -- Arthur 
Baer, American comic and columnist


Re: [BangPypers] parsing xml

2011-08-01 Thread Dhananjay Nene
On Mon, Aug 1, 2011 at 7:51 PM, Noufal Ibrahim nou...@gmail.com wrote:
 [...]

 This is confusing. Why is

 map(sqr, range(1, 100))

 faster than

 map(hex, range(1, 100))

 Assuming sqr is implemented in python, it should be slower than hex
 which is implemented in C.

Here's what I get (note: sqrt is faster than hex - not sqr)

Program
===
from math import sqrt
from timeit import Timer
def sqr(x) : x * x
print "Simple sqr",  Timer("sqr(50)", "from __main__ import sqr").timeit()
print "Simple sqrt", Timer("sqrt(50)", "from math import sqrt").timeit()
print "Simple hex",  Timer("hex(50)").timeit()
print "Map sqr",     Timer("map(sqr,range(1,100))", "from __main__ import sqr").timeit()
print "Map sqrt",    Timer("map(sqrt,range(1,100))", "from math import sqrt").timeit()
print "Map hex",     Timer("map(hex,range(1,100))").timeit()

Output
==
Simple sqr 0.185955047607
Simple sqrt 0.108409881592
Simple hex 0.143438816071
Map sqr 21.4051530361
Map sqrt 12.3786129951
Map hex 13.8608310223


Re: [BangPypers] parsing xml

2011-07-31 Thread Anand Balachandran Pillai
On Fri, Jul 29, 2011 at 4:41 PM, Venkatraman S venka...@gmail.com wrote:

 Noufal,

 I have nothing more to say than this(as i see some tangential replies which
 i am not interested in substantiating - for eg, i never suggested to use a
 regexp based parser - a regexp based xml parser is different from using 'a'
 regexp on a string!) :

 Read my replies properly. Read my assumptions properly w.r.t the xml
 structure and the requested value in the xml.  Read the link that you have
 pasted again. If possible, read the comments in the link shared(from esr)
 again.  Once done, think twice and tell me which is better. If you vouch for
 xml parsing in the case when all that you need from the string is a simple
 numeric value(not a string), then good luck; unlike esr i will not use
 adjectives; but i would not use your code either.


To be fair here, I think what he is saying is that Kenneth's problem (getting
at the particular value) can be solved by using an aptly written regular
expression which might be the fastest - not in terms of CPU cycles alone,
but in terms of time to code it up - solution.

It is not impossible to write a regular expression which will work for
bad (invalid) XML as well.

Don't forget that a lot of XML/HTML parsers are actually implemented
using regular expressions. You can take a look at sgmllib.SGMLParser,
htmllib.HTMLParser etc.

No complex text processing gets done without some kind of regular expression
behind the scenes.




 Thanks.

 -V




-- 
--Anand


Re: [BangPypers] parsing xml

2011-07-31 Thread Noufal Ibrahim
Anand Balachandran Pillai abpil...@gmail.com writes:

 On Fri, Jul 29, 2011 at 4:41 PM, Venkatraman S venka...@gmail.com wrote:

[...]

 To be fair here, I think what he is saying is that Kenneth's problem
 (getting at the particular value) can be solved by using an aptly
 written regular expression which might be the fastest - not in terms
 of CPU cycles alone, but in terms of time to code it up - solution.

That would depend on what one is familiar with. If all one has is a
hammer... etc.

Anand's minidom thing is a one liner. 

I understand what Venkat is saying. He's treating the data as a string
rather than an XML fragment for his purposes and using a regexp to get
the data out of it. 


[...]


-- 
~noufal
http://nibrahim.net.in

Monotheism is a gift from the gods.


Re: [BangPypers] parsing xml

2011-07-31 Thread Dhananjay Nene
On Thu, Jul 28, 2011 at 3:18 PM, Kenneth Gonsalves law...@gmail.com wrote:

 hi,

 here is a simplified version of an xml file:

 <?xml version="1.0" encoding="UTF-8"?>
 <gpx>
 <metadata>
 <author>
 <name>CloudMade</name>
 <email id="support" domain="cloudmade.com" />
 <link href="http://maps.cloudmade.com"></link>
 </author>
 <copyright author="CloudMade">
 <license>http://cloudmade.com/faq#license</license>
 </copyright>
 <time>2011-07-28T07:04:01</time>
 </metadata>
 <extensions>
 <distance>1489</distance>
 <time>344</time>
 <start>Sägerstraße</start>
 <end>Im Gisinger Feld</end>
 </extensions>
 </gpx>

 I want to get the value of the distance element - 1489. What is the
 simplest way of doing this?


re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)

would appear to be the most succinct and quite fast. Adjust for whitespace
as and if necessary.
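
One caveat with chaining .group(1) directly: re.search returns None when
nothing matches, so the one-liner raises AttributeError on unexpected input.
A slightly more defensive sketch (distance_from_text is a made-up helper name,
and DATA is a trimmed stand-in for the sample file):

```python
import re

# Trimmed stand-in for the relevant part of the sample file.
DATA = "<extensions>\n<distance>1489</distance>\n<time>344</time>\n</extensions>"

def distance_from_text(data):
    """Return the distance as an int, or None if the pattern does not match."""
    match = re.search(r"<distance>\s*(\d+)\s*</distance>", data)
    return int(match.group(1)) if match else None

print(distance_from_text(DATA))
print(distance_from_text("<gpx/>"))
```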

Yet I would probably use the minidom based approach, if I was sure the input
was likely to continue to be xml. Anand C's solution (elsewhere in the
thread) reflects the programmer's intent in a simpler, less obfuscated form
(both correctly working solutions will communicate the intent with exactly
the same precision - the precision required to make the program work).
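
For comparison, a minidom lookup could look roughly like the sketch below.
This is not Anand C's posted code, only an illustration of the approach using
the standard library; DATA is a trimmed stand-in for the sample file.

```python
from xml.dom import minidom

# Trimmed stand-in for the sample file (XML declaration omitted for brevity).
DATA = "<gpx><extensions><distance>1489</distance><time>344</time></extensions></gpx>"

dom = minidom.parseString(DATA)

# getElementsByTagName searches the whole tree; take the first match
# and read the text node inside it.
node = dom.getElementsByTagName("distance")[0]
distance = int(node.firstChild.data)

print(distance)
```

The lookup names the element it wants and nothing else, which is the
"intent" argument made above.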

As far as optimisation goes - I can see at least 3 options

a. the minidom performance is acceptable - no further optimisation required
b. minidom performance is not acceptable - try the regex one
c. python library performance is not acceptable - switch to 'c'

I can imagine people starting with a and then deciding to move along the
path a -> b -> c if and as necessary.
I believe starting with b risks obfuscating code (imo regex is obfuscated
compared to xml nodes - YMMV).
I don't know of any python programmers who are speed-maniacs. I am worried
anytime someone programs in something other than assembly/machine code and
uses that term. The rest of us are just trading off development speed
vs. runtime speed.

Dhananjay


Re: [BangPypers] parsing xml

2011-07-31 Thread Venkatraman S
On Sun, Jul 31, 2011 at 10:58 PM, Dhananjay Nene dhananjay.n...@gmail.com wrote:

 a. the minidom performance is acceptable - no further optimisation required
 b. minidom performance is not acceptable - try the regex one
 c. python library performance is not acceptable - switch to 'c'

 I can imagine people starting with a and then deciding to move along the
 path a-b-c if and as necessary.
 I believe starting with b risks obfuscating code (imo regex is obfuscated
 compared to xml nodes - YMMV)


A regex is the simplest IMHO, because you need not know the syntax of the
minidom parser.
But, again i have seen this quite often that lack of knowledge of regexp has
led people to other solutions (the grapes are sour!)


 I don't know of any python programmers who are speed-maniacs. I am worried
 anytime someone programs in something else than assembly/machine code and
 uses the latter word. The rest of us are just trading off development speed
 vs. runtime speed.


Hang around in #django or #python. The most elegant code that you *should*
write would invariably be pretty fast (am not ref to asm).


Re: [BangPypers] parsing xml

2011-07-31 Thread Noufal Ibrahim
Dhananjay Nene dhananjay.n...@gmail.com writes:


[...]

 re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)

 would appear to be the most succinct and quite fast. Adjust for whitespace
 as and if necessary.

Whitespace (including newlines), mixed cases etc. 


[...]

 As far as optimisation goes - I can see at least 3 options

 a. the minidom performance is acceptable - no further optimisation required
 b. minidom performance is not acceptable - try the regex one
 c. python library performance is not acceptable - switch to 'c'

I'd switch b and c. If ElementTree is not fast enough, I'd switch to
cElementTree and if that's not fast enough, I'd try some hand parsing.
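
A rough sketch of that first step, assuming the sample structure from the
original post (DATA is a trimmed stand-in). The try/except keeps it working
on newer interpreters: from Python 3.3 the plain ElementTree module uses the
C accelerator when available, and cElementTree was removed in 3.9.

```python
try:
    # C implementation, a separate module on Python 2 and early Python 3.
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample file.
DATA = "<gpx><extensions><distance>1489</distance></extensions></gpx>"

root = ET.fromstring(DATA)

# The path is relative to the root <gpx> element.
distance = int(root.findtext("extensions/distance"))

print(distance)
```

Since both modules expose the same API, switching between them is a
one-line change to the import.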

 I can imagine people starting with a and then deciding to move along
 the path a-b-c if and as necessary.  I believe starting with b risks
 obfuscating code (imo regex is obfuscated compared to xml nodes -
 YMMV)

As someone who messed with perl for a long time, I can attest to the
power and unmaintainability of regexps. I stay away from them unless I
really need them. But yes, people like Larry Wall seem to think in a
fundamentally different way so YMMV.


[...]


-- 
~noufal
http://nibrahim.net.in

I tripped over a hole that was sticking up out of the ground.


Re: [BangPypers] parsing xml

2011-07-31 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:

[...]


 A regex is the simplest IMHO, because you need not know the syntax of the
 minidom parser.

Oh come on. This sounds like doing it the wrong way because you're not
going to spend time reading the docs and then using performance as a
cover for the laziness. 

[...]

 Hang around in #django or #python. The most elegant code that you
 *should* write would invariably be pretty fast (am not ref to asm).

I agree with you here. Pythonicity is best defined as what the
experienced python core devs do and the stuff they use the most is
optimised a lot. Pythonic python code is often the fastest python code.



[...]



-- 
~noufal
http://nibrahim.net.in

This page intentionally left blank.


Re: [BangPypers] parsing xml

2011-07-31 Thread Anand Balachandran Pillai
On Mon, Aug 1, 2011 at 6:08 AM, Anand Chitipothu anandol...@gmail.com wrote:

  Hang around in #django or #python. The most elegant code that you *should*
  write would invariably be pretty fast (am not ref to asm).

 That doesn't mean that any code that is faster is elegant.

 IIRC, in python, map function runs slightly faster than list
 comprehensions, but list comprehensions is considered elegant.


It is subtler than that.

List comprehensions are faster than map when
the latter needs to invoke a user-defined function or a lambda.

map scores over list comprehensions in most cases where the function
is a Python built-in and no lambda is used.

Example of the former:

>>> def f1(): map(sqr, range(1, 100))
...
>>> def f2(): [sqr(x) for x in range(1, 100)]
...
>>> mytimeit.Timeit(f1)
'37.91 usec/pass'
>>> mytimeit.Timeit(f2)
'37.50 usec/pass'

Example of the latter:

>>> def f1(): map(hex, range(1, 100))
...
>>> def f2(): [hex(x) for x in range(1, 100)]
...
>>> mytimeit.Timeit(f1)
'49.41 usec/pass'
>>> mytimeit.Timeit(f2)
'55.29 usec/pass'
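
The mytimeit module used above is a local helper and is not shown in the
thread. To repeat the comparison with only the standard library, something
like the following sketch should do; sqr and time_it are assumed names, and
list(...) is there because map is lazy on Python 3. No particular outcome is
implied, since results vary by interpreter version.

```python
from timeit import timeit

def sqr(x):
    return x * x

def time_it(stmt, number=20000):
    # Run the statement `number` times with this module's names visible.
    return timeit(stmt, globals=globals(), number=number)

results = {
    "map + user function": time_it("list(map(sqr, range(1, 100)))"),
    "listcomp + user function": time_it("[sqr(x) for x in range(1, 100)]"),
    "map + built-in": time_it("list(map(hex, range(1, 100)))"),
    "listcomp + built-in": time_it("[hex(x) for x in range(1, 100)]"),
}

for label, seconds in results.items():
    print("%-26s %.4f s" % (label, seconds))
```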










-- 
--Anand


Re: [BangPypers] parsing xml

2011-07-29 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:


[...]

 Well, i have clearly mentioned my assumptions - i.e, when you treat
 the XML as a 'string' and do not want to retrieve anything else in a
 'structured manner'.

If the data is structured, it makes sense to exploit that structure and
use a proper solution. 

 I am a speed-maniac and crave for speed; so if the assumption is
 valid, i can vouch for the fact that regexp would be faster and neater
 solution. I have done some speed experiments in past on this (results
 of which i do not have handy), and i found this.

Premature optimisation is the root of all evil.

I find it highly unlikely that for a large program suffering from low
performance, replacing an XML parser with a regexp based parser will
significantly improve performance.

Use the right tool for the job and then if the performance is slow,
profile the program. If you then find that it's the XML parsing that's
the main bottleneck, switch to a different one or a C (or assembly [1])
based implementation. If it's *still* not fast enough, try moving to
regexps and then measure how much speed you get out of introducing so
much brittleness and fragility into your program.
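
As a sketch of the "profile first" step using the standard library profiler:
parse_many below is a made-up workload standing in for the real program, and
DATA is a trimmed stand-in for the sample file. The report shows where the
time actually goes before anything is rewritten.

```python
import cProfile
import io
import pstats
from xml.dom import minidom

# Trimmed stand-in for the sample file.
DATA = "<gpx><extensions><distance>1489</distance></extensions></gpx>"

def parse_many(n=2000):
    """Hypothetical workload: parse the same document n times."""
    total = 0
    for _ in range(n):
        dom = minidom.parseString(DATA)
        total += int(dom.getElementsByTagName("distance")[0].firstChild.data)
    return total

profiler = cProfile.Profile()
profiler.enable()
total = parse_many()
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```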


[...]




Footnotes: 
[1]  http://tibleiz.net/asm-xml/index.html

-- 
~noufal
http://nibrahim.net.in

Referring to a book: I read part of it all the way through. -- Samuel Goldwyn


Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
On Fri, Jul 29, 2011 at 11:15 AM, Anand Chitipothu anandol...@gmail.com wrote:

 2011/7/29 Venkatraman S venka...@gmail.com:
  On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:
 
   2011/7/28 Venkatraman S venka...@gmail.com:
    parsing using minidom is one of the slowest. if you just want to extract the
    distance and assuming that it(the tag) will always be consistent, then i
    would always suggest regexp. xml parsing is a pain.
 
   regexp is a bad solution to parse xml.
 
   minidom is the fastest solution if you consider the programmer time
   instead of execution time.  Minidom is available in standard library,
   you don't have to add another dependency and worry about PyPI
   downtimes and lxml compilation failures.
 
   I don't think there will be significant performance difference between
   regexp and minidom unless you are doing it a million times.
 
  Well, i have clearly mentioned my assumptions - i.e, when you treat the XML
  as a 'string' and do not want to retrieve anything else in a 'structured
  manner'. I am a speed-maniac and crave for speed; so if the assumption is
  valid, i can vouch for the fact that regexp would be faster and neater
  solution. I have done some speed experiments in past on this (results of
  which i do not have handy), and i found this.
 
  XP asks you implement the best solution with the least effort and i think in
  this case regexp is a winner. Thoughts can vary though.

 regexp can at the best be a dirty-hack, not a best solution for xml parsing.


read again : i am not actually working on 'xml' (see my assumption?).


Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:


  Well, i have clearly mentioned my assumptions - i.e, when you treat
  the XML as a 'string' and do not want to retrieve anything else in a
  'structured manner'.


 If the data is structured, it makes sense to exploit that structure and
 use a proper solution.


yes, and xml parsing is a bad way when you are not interested in the xml
structure..


 I find it highly unlikely that for a large program suffering from low
 performance, replacing an XML parser with a regexp based parser will
 significantly improve performance.


Sigh! Again, guys, i am referring to regexp when all you need is some number
within a tag!
If the content of that tag was text, i would have never suggested this
solution.

HTH.


Re: [BangPypers] parsing xml

2011-07-29 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:

[...]

 Sigh! Again, guys, i am referring to regexp when all you need is some
 number within a tag!  If the content of that tag was text, i would
 have never suggested this solution.
[...]

And I'm telling you that even a slight change to the tag - an extra
space, a newline, a new attribute, a change in case or any such thing
which doesn't modify its meaning as far as the XML snippet is concerned -
will break your regexp and cause your program to crash.

I'm asking you to consider what you're trading for this assumed increase
in speed. 
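
To make the brittleness concrete, here is a small sketch. The unit attribute
is invented for illustration and is not in the original file; by_regex and
by_parser are made-up helper names.

```python
import re
import xml.etree.ElementTree as ET

PATTERN = re.compile(r"<distance>\s*(\d+)\s*</distance>")

original = "<extensions><distance>1489</distance></extensions>"
# Hypothetical variant: an added attribute and a newline inside the tag.
variant = '<extensions><distance unit="m"\n>1489</distance></extensions>'

def by_regex(data):
    """Return the matched digits, or None if the pattern does not match."""
    match = PATTERN.search(data)
    return match.group(1) if match else None

def by_parser(data):
    """Return the text of the <distance> child of the root element."""
    return ET.fromstring(data).findtext("distance")

for label, doc in (("original", original), ("variant", variant)):
    print(label, by_regex(doc), by_parser(doc))
```

The parser returns the same value for both documents; the pattern only
matches the first one.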


-- 
~noufal
http://nibrahim.net.in

The scene is dull. Tell him to put more life into his dying. -- Samuel Goldwyn


Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
On Fri, Jul 29, 2011 at 11:44 AM, Noufal Ibrahim nou...@gmail.com wrote:


 And I'm telling you that even a slight change to the tag - an extra
 space, a newline, a new attribute, a change in case or any such thing
 which doesn't modify its meaning as far as the XML snippet is concerned -
 will break your regexp and cause your program to crash.

 I'm asking you to consider what you're trading for this assumed increase
 in speed.


Along the same lines... the problems are greater where xml is concerned, for
even if some other tag is malformed, then the whole document is 'gone'.
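
A sketch of that scenario, with a made-up broken document in which an
unrelated element is never closed. The pattern still finds the number, while
a conforming parser rejects the whole input.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: <start> is never closed, so this is not well-formed XML.
BROKEN = "<extensions><distance>1489</distance><start>Somewhere</extensions>"

match = re.search(r"<distance>\s*(\d+)\s*</distance>", BROKEN)
regex_result = match.group(1) if match else None

error = None
try:
    parser_result = ET.fromstring(BROKEN).findtext("distance")
except ET.ParseError as exc:
    # The parser refuses the document as a whole.
    parser_result = None
    error = str(exc)

print("regex: ", regex_result)
print("parser:", parser_result, "-", error)
```

Whether silently accepting a broken document is a feature or a hazard is, of
course, the point under dispute in this thread.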


Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:

  I am a speed-maniac and crave for speed; so if the assumption is
  valid, i can vouch for the fact that regexp would be faster and neater
  solution. I have done some speed experiments in past on this (results
  of which i do not have handy), and i found this.

 Premature optimisation is the root of all evil.


I belong to a different school. I think about performance right from the
design stage, for I think that, be it a simple webapp or a financial
application, the choice of your design patterns and tech stack goes a long
way towards a good customer experience. The bulk of my thoughts are reflected
here: http://www.codinghorror.com/blog/2011/06/performance-is-a-feature.html

I generally lay emphasis on 2 things when it comes to webapps: 1) Layout
and 2) Speed. I am ready to sacrifice certain features; if i think a certain
feature will cause some issues with customer experience (after all we are
developing apps for the customer and if he is made to wait for 5s for an
event, then it's bad), i would rather not present it or present it in some
'other' fashion (like a batched job?).


Re: [BangPypers] parsing xml

2011-07-29 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:

 On Fri, Jul 29, 2011 at 11:31 AM, Noufal Ibrahim nou...@gmail.com wrote:

  I am a speed-maniac and crave for speed; so if the assumption is
  valid, i can vouch for the fact that regexp would be faster and neater
  solution. I have done some speed experiments in past on this (results
  of which i do not have handy), and i found this.

 Premature optimisation is the root of all evil.


 I belong to a different school. I think about performance right from the
 design dashboards for i think, be it a simple webapp or a financial
 application, the choice of your design patterns and techstack goes a long
 way in a good customer experience. Bulk of my thoughts are reflected in here
 :
 http://www.codinghorror.com/blog/2011/06/performance-is-a-feature.html

I agree and I try my best to do the same thing. However, I differentiate
between micro optimisations like rewriting parts in C and XML and top
level optimisations like good design and the right data structures.

The former, I don't do because I get bogged down by the details and end
up delivering something that's super fast *really* late. The latter, I
do because otherwise, the application is unusable and a bad
experience. Also, micro optimising (e.g. replacing DOM parsing with
regexps to extract stuff out of an XML message) makes code more brittle
which is also a no win for the customer.

I end up messing with the former only when I've exhausted all other
avenues and *really* need that last drop of juice. This is usually
common in games and stuff like that with continuous involved user
interaction rather than in webapps where it's a little more spaced out.

If performance is *this* important to you, why don't you code your
entire application in assembly hand crafting it for a certain processor,
amount of memory and hard disk platter speed? Why use Python at all? The
reason is because Python is fast enough for most things. You can get
better performance moving to lower level routines but it's often not
necessary and the costs it entails are usually not worth it. Better a
fast enough stable app than a super fast one that occasionally segfaults
and loses data.

That's the point I'm trying to make.


[...]


-- 
~noufal
http://nibrahim.net.in

If I could drop dead right now, I'd be the happiest man alive! -- Samuel Goldwyn


Re: [BangPypers] parsing xml

2011-07-29 Thread Sidu Ponnappa
+1.



Re: [BangPypers] parsing xml

2011-07-29 Thread Umar Shah
+1

On Fri, Jul 29, 2011 at 12:50 PM, Sidu Ponnappa lorddae...@gmail.com wrote:

 +1.



Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:

 I agree and I try my best to do the same thing. However, I differentiate
 between micro optimisations like rewriting parts in C and XML and top
 level optimisations like good design and the right data structures.


Using regexp is micro optimization?


 The former, I don't do because I get bogged down by the details and end
 up delivering something that's super fast *really* late. The latter, I
 do because otherwise, the application is unusable and a bad
 experience. Also, micro optimising (e.g. replacing DOM parsing with
 regexps to extract stuff out of an XML message) makes code more brittle
 which is also a no win for the customer.


IMHO, regexps are much more powerful and fault tolerant than XML parsing.
XMLs are brittle.


 If performance is *this* important to you, why don't you code your
 entire application in assembly hand crafting it for a certain processor,
 amount of memory and hard disk platter speed? Why use Python at all? The
 reason is because Python is fast enough for most things. You can get
 better performance moving to lower level routines but it's often not
 necessary and the costs it entails are usually not worth it. Better a
 fast enough stable app than a super fast one that occasionally segfaults
 and loses data.


Not sure how this point is relevant; the amount of performance you need is
dependent on the nature of the application you develop.

For example, see this presentation on how simple hacks like 'compressing' DOM
'getters/setters' can affect browser performance:
http://paulirish.com/2011/dom-html5-css3-performance/
At the same time, look at this TED talk about trading applications that spend
billions of dollars for millisecond performance gains:
http://www.youtube.com/watch?v=TDaFwnOiKVE

For a webapp, XML parsing is a very important factor that the developer *must*
consider while designing.


Re: [BangPypers] parsing xml

2011-07-29 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:

 On Fri, Jul 29, 2011 at 12:20 PM, Noufal Ibrahim nou...@gmail.com wrote:

 I agree and I try my best to do the same thing. However, I differentiate
 between micro optimisations like rewriting parts in C and XML and top
 level optimisations like good design and the right data structures.


 Using regexp is micro optimization?

For parsing XML, yes it is. 

It's not the first thing I'd use and it's something I'd consider only
after I've exhausted everything else and have reason to believe that my
application is not fast enough just because I'm using an XML parser
instead of a regexp.

There are some places where I would use regexps instead of a parser
upfront though. Mostly related to streams of bad XML data (particularly
while screen scraping) but even then, a fault tolerant parser would do
better than regexps. 
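
For what it's worth, here is a minimal sketch of what a fault tolerant
parser could look like. It assumes Python 3's html.parser.HTMLParser as the
tolerant parser (the thread does not name one); the DistanceGrabber class
and the broken sample string are made up for illustration.

```python
from html.parser import HTMLParser


class DistanceGrabber(HTMLParser):
    """Collect the text found inside <distance> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "distance":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "distance":
            self._inside = False

    def handle_data(self, data):
        # Only keep non-blank text seen while inside <distance>.
        if self._inside and data.strip():
            self.values.append(data.strip())


# Deliberately broken markup: <time> is never closed and <gpx> is left open.
broken = "<gpx><extensions><distance>1489</distance><time>344</extensions>"

grabber = DistanceGrabber()
grabber.feed(broken)
grabber.close()
print(grabber.values)
```

A strict XML parser would reject this input, while the tolerant one still
hands back the text it could find.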


[...]

 IMHO, regexps are much more powerful and fault tolerant than XML parsing.
 XMLs are brittle.

If you say so. I don't have much more to say on this. 

There was an interesting exchange a while ago between Eric Raymond and
John Graham-Cumming on using regexps vs. a regular parser while
screen scraping to fetch data out of a forge site. Here's the link; you
might find it interesting:

http://blog.jgc.org/2009/11/parsing-html-in-python-with.html#links

 If performance is *this* important to you, why don't you code your
 entire application in assembly hand crafting it for a certain processor,
 amount of memory and hard disk platter speed? Why use Python at all? The
 reason is because Python is fast enough for most things. You can get
 better performance moving to lower level routines but it's often not
 necessary and the costs it entails are usually not worth it. Better a
 fast enough stable app than a super fast one that occasionally segfaults
 and loses data.


 Not sure how this point is relevant; the amount of performance you need is
 dependent on the nature of the application you develop.

Yup. And I put it to you that switching from a regular XML parser to a
regexp based one will not give you a sufficient speed boost to justify
the higher maintenance costs in most cases. 


 For a webapp, XML parsing is very important factor that the developer
 *must* consider while designing.

And your advice is to use regexps to do this? 


[...]


-- 
~noufal
http://nibrahim.net.in

A verbal contract isn't worth the paper it's written on. Include me out. 
-Samuel Goldwyn


Re: [BangPypers] parsing xml

2011-07-29 Thread Baishampayan Ghose
On Fri, Jul 29, 2011 at 1:09 PM, Venkatraman S venka...@gmail.com wrote:
 IMHO, regexps are much more powerful and fault tolerant than XML parsing.
 XMLs are brittle.

Did you mean parsing XML using Regular Expressions is more powerful
and fault tolerant than using an XML parser?

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com


Re: [BangPypers] parsing xml

2011-07-29 Thread Gora Mohanty
On Fri, Jul 29, 2011 at 10:55 AM, Baishampayan Ghose b.gh...@gmail.com wrote:
 minidom is the fastest solution if you consider programmer time
 instead of execution time. Minidom is available in the standard library,
 so you don't have to add another dependency and worry about PyPI
 downtimes and lxml compilation failures.

 FWIW, ElementTree is a part of the standard library as well and is
 known to be much better than minidom in various ways.

Being part of the standard library is a big plus. However,
as compared to lxml, one thing that seemed to be missing
in more standard Python XML libraries is XSLT. This is
incredibly useful in some contexts. I would love to learn
about XSLT support in other libraries.

Here is the blog post about the performance of Python XML
libraries that I was referring to earlier:
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

Regards,
Gora


Re: [BangPypers] parsing xml

2011-07-29 Thread Venkatraman S
Noufal,

I have nothing more to say than this(as i see some tangential replies which
i am not interested in substantiating - for eg, i never suggested to use a
regexp based parser - a regexp based xml parser is different from using 'a'
regexp on a string!) :

Read my replies properly. Read my assumptions properly w.r.t the xml
structure and the requested value in the xml.  Read the link that you have
pasted again. If possible, read the comments in the link shared(from esr)
again.  Once done, think twice and tell me which is better. If you vouch for
xml parsing in the case when all that you need from the string is a simple
numeric value(not a string), then good luck; unlike esr i will not use
adjectives; but i would not use your code either.

Thanks.

-V


Re: [BangPypers] parsing xml

2011-07-29 Thread Noufal Ibrahim
Venkatraman S venka...@gmail.com writes:

[...]

 Read my replies properly. Read my assumptions properly w.r.t the xml
 structure and the requested value in the xml.  Read the link that you
 have pasted again. If possible, read the comments in the link
 shared(from esr) again.  Once done, think twice and tell me which is
 better. If you vouch for xml parsing in the case when all that you
 need from the string is a simple numeric value(not a string), then
 good luck; unlike esr i will not use adjectives; but i would not use
 your code either.

[...]

Fair enough.


-- 
~noufal
http://nibrahim.net.in

Professional certification for car people may sound like an oxymoron. -The 
Wall Street Journal, page B1, Tuesday, July 17, 1990.


Re: [BangPypers] parsing xml

2011-07-28 Thread Venkatraman S
grep or regexp?

-V


Re: [BangPypers] parsing xml

2011-07-28 Thread hemant
Using xpath such as:

/gpx/extensions/distance/text()

?
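
A minimal sketch of that path lookup, assuming the standard library's
xml.etree.ElementTree, which implements only a limited subset of XPath. The
path is written relative to the root gpx element, and the text is read with
findtext. The cut-down sample string is made up for illustration; a real GPX
file that declares a namespace would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the file in the original question (no namespace).
sample = """<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""

root = ET.fromstring(sample)  # root is the <gpx> element itself
# Paths given to an Element are relative to it, so "gpx" is not repeated.
distance = root.findtext("./extensions/distance")
print(distance)
```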



On Thu, Jul 28, 2011 at 3:20 PM, Venkatraman S venka...@gmail.com wrote:
 grep or regexp?

 -V




-- 
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
http://twitter.com/gnufied


Re: [BangPypers] parsing xml

2011-07-28 Thread Ramdas S
On Thu, Jul 28, 2011 at 3:20 PM, Venkatraman S venka...@gmail.com wrote:

 grep or regexp?

 -V



You can write an XML parsing query.
-- 
Ramdas S
+91 9342 583 065


Re: [BangPypers] parsing xml

2011-07-28 Thread Anand Chitipothu
2011/7/28 Kenneth Gonsalves law...@gmail.com:
 hi,

 here is a simplified version of an xml file:

 <?xml version="1.0" encoding="UTF-8"?>
 <gpx>
     <metadata>
         <author>
             <name>CloudMade</name>
             <email id="support" domain="cloudmade.com" />
             <link href="http://maps.cloudmade.com"></link>
         </author>
         <copyright author="CloudMade">
             <license>http://cloudmade.com/faq#license</license>
         </copyright>
         <time>2011-07-28T07:04:01</time>
     </metadata>
     <extensions>
         <distance>1489</distance>
         <time>344</time>
         <start>Sägerstraße</start>
         <end>Im Gisinger Feld</end>
     </extensions>
 </gpx>

 I want to get the value of the distance element - 1489. What is the
 simplest way of doing this?

>>> from xml.dom import minidom
>>> dom = minidom.parseString(x)  # x holds the XML document above as a string
>>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
u'1489'

Anand


Re: [BangPypers] parsing xml

2011-07-28 Thread Baishampayan Ghose
 here is a simplified version of an xml file:

 <?xml version="1.0" encoding="UTF-8"?>
 <gpx>
     <metadata>
         <author>
             <name>CloudMade</name>
             <email id="support" domain="cloudmade.com" />
             <link href="http://maps.cloudmade.com"></link>
         </author>
         <copyright author="CloudMade">
             <license>http://cloudmade.com/faq#license</license>
         </copyright>
         <time>2011-07-28T07:04:01</time>
     </metadata>
     <extensions>
         <distance>1489</distance>
         <time>344</time>
         <start>Sägerstraße</start>
         <end>Im Gisinger Feld</end>
     </extensions>
 </gpx>

 I want to get the value of the distance element - 1489. What is the
 simplest way of doing this?

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from xml.etree.ElementTree import fromstring

data = """<?xml version="1.0" encoding="UTF-8"?>
<gpx>
   <metadata>
      <author>
         <name>CloudMade</name>
         <email id="support" domain="cloudmade.com" />
         <link href="http://maps.cloudmade.com"></link>
      </author>
      <copyright author="CloudMade">
         <license>http://cloudmade.com/faq#license</license>
      </copyright>
      <time>2011-07-28T07:04:01</time>
   </metadata>
   <extensions>
      <distance>1489</distance>
      <time>344</time>
      <start>Sägerstraße</start>
      <end>Im Gisinger Feld</end>
   </extensions>
</gpx>
"""


def parse_xml(s):
    # fromstring returns the root <gpx> element; the path is relative to it.
    element = fromstring(s)
    return element.find("extensions/distance").text

if __name__ == "__main__":
    # Parenthesised so it runs on both Python 2 and Python 3.
    print(parse_xml(data))

Hope that helps.

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com


Re: [BangPypers] parsing xml

2011-07-28 Thread Kenneth Gonsalves
On Thu, 2011-07-28 at 15:33 +0530, Anand Chitipothu wrote:
  I want to get the value of the distance element - 1489. What is the
  simplest way of doing this?
 
  >>> from xml.dom import minidom
  >>> dom = minidom.parseString(x)
  >>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
  u'1489'

thanks - perfect.
-- 
regards
Kenneth Gonsalves



Re: [BangPypers] parsing xml

2011-07-28 Thread kracekumar ramaraju
You can try BeautifulSoup, which is recommended for XML parsing in Python.


Re: [BangPypers] parsing xml

2011-07-28 Thread Sidu Ponnappa
If you're doing this repeatedly, you may want to just delegate to a
native XPath implementation. I haven't done much Python, so I can't
comment on your choices, but in Ruby I'd simply hand off to libXML
using Nokogiri. This approach should be a whole lot faster, but I'd
advise benchmarking first because, as I said, I know little about
Python.

Best,
Sidu.
http://sidu.in

On Thu, Jul 28, 2011 at 4:01 PM, Kenneth Gonsalves law...@gmail.com wrote:
 On Thu, 2011-07-28 at 15:33 +0530, Anand Chitipothu wrote:
  I want to get the value of the distance element - 1489. What is the
  simplest way of doing this?

  >>> from xml.dom import minidom
  >>> dom = minidom.parseString(x)
  >>> dom.getElementsByTagName("distance")[0].childNodes[0].nodeValue
  u'1489'

 thanks - perfect.
 --
 regards
 Kenneth Gonsalves




Re: [BangPypers] parsing xml

2011-07-28 Thread Venkatraman S
parsing using minidom is one of the slowest. if you just want to extract the
distance and assuming that it(the tag) will always be consistent, then i
would always suggest regexp. xml parsing is a pain.
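
A minimal sketch of the regexp route, assuming the tag always appears as a
bare distance element holding digits only, with optional whitespace around
the number. The name distance_by_regex and the one-line sample are made up
for illustration.

```python
import re

# One-line stand-in for the document; only the <distance> element matters here.
data = "<gpx><extensions><distance>1489</distance><time>344</time></extensions></gpx>"

# Compiled once so repeated lookups do not re-parse the pattern.
DISTANCE_RE = re.compile(r"<distance>\s*(\d+)\s*</distance>")


def distance_by_regex(text):
    """Return the distance as an int, or None when the tag is not found."""
    match = DISTANCE_RE.search(text)
    return int(match.group(1)) if match else None


print(distance_by_regex(data))
```

This treats the XML purely as a string, so it only holds up while the
assumption about the tag's shape holds.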


Re: [BangPypers] parsing xml

2011-07-28 Thread Gora Mohanty
On Thu, Jul 28, 2011 at 10:37 PM, Venkatraman S venka...@gmail.com wrote:
 parsing using minidom is one of the slowest. if you just want to extract the
 distance and assuming that it(the tag) will always be consistent, then i
 would always suggest regexp. xml parsing is a pain.
[...]

Strongly disagree. IMHO, regexps are the wrong solution
for parsing XML (or, any kind of well-structured text), as
they end up becoming intolerably complex, and do not
degrade gracefully for broken XML.
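
To illustrate, here is a small sketch, assuming a deliberately naive pattern
and two made-up inputs: one valid document where the tag carries an
attribute, and one truncated document. The variable names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Naive pattern: expects exactly <distance>digits</distance>.
DISTANCE_RE = re.compile(r"<distance>(\d+)</distance>")

# Valid XML, but the tag now has an attribute.
with_attr = '<gpx><extensions><distance unit="m">1489</distance></extensions></gpx>'
# Broken XML: <extensions> and <gpx> are never closed.
truncated = "<gpx><extensions><distance>1489</distance>"

# The regexp silently misses the valid variant; the parser does not care.
regex_hit = DISTANCE_RE.search(with_attr)
parsed = ET.fromstring(with_attr).findtext("extensions/distance")

# On the truncated input the regexp happily returns a value,
# while the parser reports that the document is broken.
regex_on_truncated = DISTANCE_RE.search(truncated).group(1)
try:
    ET.fromstring(truncated)
    parser_error = None
except ET.ParseError as exc:
    parser_error = exc

print(regex_hit, parsed, regex_on_truncated, parser_error)
```

The pattern can of course be extended to cover attributes, but each such
case adds to its complexity.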

Have not compared speeds myself, but there are blogs
that go into that. In my experience, the cleanest, most
efficient, and richest-in-features Python XML library is
lxml. For people used to BeautifulSoup, lxml has a
BeautifulSoup parser, and is significantly more efficient.

Regards,
Gora


Re: [BangPypers] parsing xml

2011-07-28 Thread Ramdas S
On Fri, Jul 29, 2011 at 1:23 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Thu, Jul 28, 2011 at 10:37 PM, Venkatraman S venka...@gmail.com
 wrote:
  parsing using minidom is one of the slowest. if you just want to extract
 the
  distance and assuming that it(the tag) will always be consistent, then i
  would always suggest regexp. xml parsing is a pain.
 [...]

 Strongly disagree. IMHO, regexps are the wrong solution
 for parsing XML (or, any kind of well-structured text), as
 they end up becoming intolerably complex, and do not
 degrade gracefully for broken XML.

 Have not compared speeds myself, but there are blogs
 that go into that. In my experience, the cleanest, most
 efficient, and richest-in-features Python XML library is
 lxml. For people used to BeautifulSoup, lxml has a
 BeautifulSoup parser, and is significantly more efficient.


If it's a question of the fastest gun around, it must be cElementTree;
please refer to the table somewhere towards the bottom of the page. Caveat: the
page belongs to effbot, who wrote the package.

http://effbot.org/zone/celementtree.htm
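
A minimal sketch of picking up cElementTree when it is available. The
fallback is there because later Python versions (3.3 onwards) use the C
accelerator automatically from xml.etree.ElementTree, and the separate
cElementTree module was removed in 3.9. The sample string is made up for
illustration.

```python
try:
    # Available from Python 2.5 up to 3.8.
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+: the plain module already uses the C implementation.
    import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<gpx><extensions><distance>1489</distance></extensions></gpx>"
)
value = root.findtext("extensions/distance")
print(value)
```

The API is the same either way, so the rest of the code does not need to
know which module was imported.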

 Regards,
 Gora




-- 
Ramdas S
+91 9342 583 065


Re: [BangPypers] parsing xml

2011-07-28 Thread Joseph Gladson
Hi,

Check out the pyparsing module... you could do it quite simply :)

regards,
joseph

On 7/29/11, Ramdas S ram...@gmail.com wrote:
 [...]


-- 
Sent from my mobile device


Re: [BangPypers] parsing xml

2011-07-28 Thread Anand Chitipothu
2011/7/28 Venkatraman S venka...@gmail.com:
 parsing using minidom is one of the slowest. if you just want to extract the
 distance and assuming that it(the tag) will always be consistent, then i
 would always suggest regexp. xml parsing is a pain.

regexp is a bad solution to parse xml.

minidom is the fastest solution if you consider programmer time
instead of execution time. Minidom is available in the standard library,
so you don't have to add another dependency and worry about PyPI
downtimes and lxml compilation failures.

I don't think there will be a significant performance difference between
regexp and minidom unless you are doing it a million times.
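
Anyone who wants numbers for their own machine could time both with a sketch
like the one below. It assumes a tiny one-line document and 1000 runs, both
picked arbitrarily; the function names are made up for illustration, and no
particular result is implied.

```python
import re
import timeit
from xml.dom import minidom

DATA = "<gpx><extensions><distance>1489</distance><time>344</time></extensions></gpx>"
DISTANCE_RE = re.compile(r"<distance>\s*(\d+)\s*</distance>")


def via_regex():
    # Treat the document as a plain string.
    return DISTANCE_RE.search(DATA).group(1)


def via_minidom():
    # Build the full DOM, then pick the first <distance> element's text node.
    dom = minidom.parseString(DATA)
    return dom.getElementsByTagName("distance")[0].firstChild.nodeValue


RUNS = 1000  # arbitrary; raise it to see how the gap grows
regex_seconds = timeit.timeit(via_regex, number=RUNS)
minidom_seconds = timeit.timeit(via_minidom, number=RUNS)

print("regex:   %.4f s for %d runs" % (regex_seconds, RUNS))
print("minidom: %.4f s for %d runs" % (minidom_seconds, RUNS))
```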

Anand


Re: [BangPypers] parsing xml

2011-07-28 Thread Baishampayan Ghose
 minidom is the fastest solution if you consider programmer time
 instead of execution time. Minidom is available in the standard library,
 so you don't have to add another dependency and worry about PyPI
 downtimes and lxml compilation failures.

FWIW, ElementTree is a part of the standard library as well and is
known to be much better than minidom in various ways.
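
As a rough side-by-side, here is a sketch of the same lookup done both ways,
assuming a cut-down sample made up for illustration. The extra dict at the
end is only there to show how little code ElementTree needs if more than one
value is wanted later.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

DATA = """<gpx>
  <extensions>
    <distance>1489</distance>
    <time>344</time>
  </extensions>
</gpx>"""

# minidom: find the element by tag name, then read its first text node.
dom = minidom.parseString(DATA)
dom_value = dom.getElementsByTagName("distance")[0].firstChild.nodeValue

# ElementTree: a single path lookup relative to the root element.
root = ET.fromstring(DATA)
et_value = root.findtext("extensions/distance")

# Iterating over an element yields its children, so a dict is one line.
extensions = {child.tag: child.text for child in root.find("extensions")}

print(dom_value, et_value, extensions)
```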

Regards,
BG

-- 
Baishampayan Ghose
b.ghose at gmail.com


Re: [BangPypers] parsing xml

2011-07-28 Thread Venkatraman S
On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:

 2011/7/28 Venkatraman S venka...@gmail.com:
  parsing using minidom is one of the slowest. if you just want to extract
 the
  distance and assuming that it(the tag) will always be consistent, then i
  would always suggest regexp. xml parsing is a pain.

 regexp is a bad solution to parse xml.

 minidom is the fastest solution if you consider programmer time
 instead of execution time. Minidom is available in the standard library,
 so you don't have to add another dependency and worry about PyPI
 downtimes and lxml compilation failures.

 I don't think there will be significant performance difference between
 regexp and minidom unless you are doing it a million times.


Well, i have clearly mentioned my assumptions - i.e., when you treat the XML
as a 'string' and do not want to retrieve anything else in a 'structured
manner'. I am a speed-maniac and crave for speed; so if the assumption is
valid, i can vouch for the fact that regexp would be a faster and neater
solution. I have done some speed experiments in the past on this (results of
which i do not have handy), and i found this.

XP asks you to implement the best solution with the least effort and i think
in this case regexp is a winner. Thoughts can vary though.


Re: [BangPypers] parsing xml

2011-07-28 Thread Anand Chitipothu
2011/7/29 Venkatraman S venka...@gmail.com:
 On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu anandol...@gmail.com wrote:

 [...]

 Well, i have clearly mentioned my assumptions - i.e., when you treat the XML
 as a 'string' and do not want to retrieve anything else in a 'structured
 manner'. I am a speed-maniac and crave for speed; so if the assumption is
 valid, i can vouch for the fact that regexp would be a faster and neater
 solution. I have done some speed experiments in the past on this (results of
 which i do not have handy), and i found this.

 XP asks you to implement the best solution with the least effort and i think
 in this case regexp is a winner. Thoughts can vary though.

regexp can at best be a dirty hack, not the best solution for XML parsing.

Anand


Re: [BangPypers] parsing xml

2011-07-28 Thread Anand Chitipothu
2011/7/29 Baishampayan Ghose b.gh...@gmail.com:
 minidom is the fastest solution if you consider programmer time
 instead of execution time. Minidom is available in the standard library,
 so you don't have to add another dependency and worry about PyPI
 downtimes and lxml compilation failures.

 FWIW, ElementTree is a part of the standard library as well and is
 known to be much better than minidom in various ways.

New in version 2.5.

I don't do much xml parsing and I've always been using minidom when I
needed it. I think I should update my knowledge.

Anand