[lxml] Re: [newbie] Different ways to find elements

Adrian Bool Sat, 04 Sep 2021 01:48:25 -0700

Hi,

My understanding is...


xpath()

The xpath() method allows complex queries based on the XPath standard (lxml 
implements XPath 1.0).  For example, given the following XML:

<data>
  <people>
    <person>
      <name>Fred</name>
      <age>34</age>
    </person>
    <person>
      <name>Chris</name>
      <description>22</description>
    </person>
    <person>
      <name>Sarah</name>
      <description>28</description>
    </person>
  </people>
</data>

The following XPath query would pick out just the person element representing 
Chris:

root.xpath("./people/person/name[text()='Chris']/..")

i.e. the result would be a single-element list containing:

<person>
    <name>Chris</name>
    <description>22</description>
</person>
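A self-contained version of the query above, as a minimal sketch assuming lxml is installed. It parses the same sample XML, runs the query, and also shows an equivalent form that puts the condition on the child <name> element instead of stepping back up with '..':

```python
from lxml import etree

# The sample document from above.
XML = b"""<data>
  <people>
    <person>
      <name>Fred</name>
      <age>34</age>
    </person>
    <person>
      <name>Chris</name>
      <description>22</description>
    </person>
    <person>
      <name>Sarah</name>
      <description>28</description>
    </person>
  </people>
</data>"""

root = etree.fromstring(XML)

# Find the <name> whose text is 'Chris', then step up to its parent <person>.
matches = root.xpath("./people/person/name[text()='Chris']/..")

print(len(matches))                          # prints 1
print(matches[0].findtext("description"))    # prints 22

# Equivalent query: select the <person> that has a <name> child equal to 'Chris'.
same = root.xpath("./people/person[name='Chris']")
print(same[0].findtext("name"))              # prints Chris
```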


find()/findall()/findtext()

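The find family of methods (find() returns the first match or None, findall() 
returns a list of all matches, findtext() returns the text of the first match) 
uses ElementPath, a small subset of XPath, so it does not support advanced 
queries such as the one above.  Their simpler searches are, however, generally 
faster than xpath().  So, if you don't need the complex search facilities of 
XPath, it's best to stick with find for performance reasons.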
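A short sketch of the find family on a compacted copy of the sample XML, assuming lxml. It also shows where ElementPath's limits lie: a simple [tag='text'] predicate is accepted, while the text() function from the XPath query above is rejected with a SyntaxError:

```python
from lxml import etree

root = etree.fromstring(b"""<data>
  <people>
    <person><name>Fred</name><age>34</age></person>
    <person><name>Chris</name><description>22</description></person>
    <person><name>Sarah</name><description>28</description></person>
  </people>
</data>""")

# find(): first matching element, or None if nothing matches.
first = root.find("./people/person")
# findall(): a list of every matching element.
everyone = root.findall("./people/person")
# findtext(): text of the first match (None if there is no match).
fred_age = root.findtext("./people/person/age")

# ElementPath does support simple predicates such as [tag='text'] ...
chris = root.findall("./people/person[name='Chris']")

# ... but not XPath functions such as text(); this raises SyntaxError.
try:
    root.findall("./people/person/name[text()='Chris']/..")
    text_fn_supported = True
except SyntaxError:
    text_fn_supported = False

print(first.findtext("name"), len(everyone), fred_age)
print([p.findtext("description") for p in chris], text_fn_supported)
```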

iterfind()

The xpath() and findall() functions gather all the results into a list and 
return them in one go (xpath() can also return a string, number or boolean, 
depending on the expression).  If many results are returned, this could use a 
large amount of memory in your program.

iterfind() instead lets the calling code loop (iterate) over the matching 
elements one at a time, removing the need for all results to be stored in 
memory at once.
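A minimal sketch of iterfind(), assuming lxml. It takes the same path syntax as findall() but hands back an iterator, so the loop below can stop early and later matches are never produced. Note that the parsed tree itself is still fully in memory; what iterfind() saves is the separate list of results:

```python
from lxml import etree

root = etree.fromstring(b"""<data>
  <people>
    <person><name>Fred</name><age>34</age></person>
    <person><name>Chris</name><description>22</description></person>
    <person><name>Sarah</name><description>28</description></person>
  </people>
</data>""")

# Same path syntax as findall(), but matches are produced one at a time.
matches = root.iterfind("./people/person/name")

names = []
for name in matches:
    names.append(name.text)
    if name.text == "Chris":
        break  # stop early; 'Sarah' is never visited

print(names)  # prints ['Fred', 'Chris']
```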

iter()

iter() walks the element itself and all its descendants in document order and 
returns every element whose tag matches one of the tags you pass in (or every 
element if you pass none).  With iterfind() you can constrain where in your 
tree results are returned from; with iter() you can't, and all you can say is 
what type (tag) of element you want back.
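A compact sketch of that difference, assuming lxml and a cut-down version of the guys/girls document used in the test script. The path given to iterfind() limits matches to one branch, iter() picks up every <person> wherever it sits, and lxml's iter() also accepts several tags at once:

```python
from lxml import etree

root = etree.fromstring(b"""<data>
  <guys>
    <person><name>Fred</name></person>
    <person><name>Chris</name></person>
  </guys>
  <girls>
    <person><name>Jane</name></person>
  </girls>
</data>""")

# iterfind(): the path decides where in the tree matches may come from.
guys_only = [p.findtext("name") for p in root.iterfind("./guys/person")]

# iter(): every <person> in the subtree, regardless of its position.
all_people = [p.findtext("name") for p in root.iter("person")]

# iter() with several tags: yields elements matching any of them, in document order.
tags = [e.tag for e in root.iter("guys", "girls")]

print(guys_only)   # prints ['Fred', 'Chris']
print(all_people)  # prints ['Fred', 'Chris', 'Jane']
print(tags)        # prints ['guys', 'girls']
```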

I've included a test script I used to sanity check the above; in particular, 
look at the difference between the iterfind() output, which can list just the 
'guys' person elements, and the iter() output, which lists the person elements 
under both 'guys' and 'girls'.

Hope this makes sense!

Cheers,

aid






#!/usr/bin/env python3

from textwrap import indent
from lxml import etree

XML = \
"""<?xml version="1.0"?>
<data>
  <guys>
    <person>
      <name>Fred</name>
      <age>34</age>
    </person>
    <person>
      <name>Chris</name>
      <description>22</description>
    </person>
    <person>
      <name>John</name>
      <description>28</description>
    </person>
  </guys>
  <girls>
    <person>
      <name>Jane</name>
      <description>28</description>
    </person>
    <person>
      <name>Sarah</name>
      <description>18</description>
    </person>
    <person>
      <name>Joanne</name>
      <description>32</description>
    </person>
  </girls>
</data>
"""

root = etree.fromstring(XML)

def display(data):
    if data is None:
        print(indent("NO RESULTS\n", prefix='  '))
    elif isinstance(data, list):
        if len(data) == 0:
            print(indent("NO RESULTS\n", prefix='  '))
        else:
            for n, d in enumerate(data, 1):
                print(indent(f"{n}:", prefix='  '))
                print(indent(etree.tostring(d).decode('utf8'), prefix='  '))
    elif isinstance(data, etree._Element):
        print(indent(etree.tostring(data).decode('utf8'), prefix='  '))
    else:
        raise Exception("Unexpected data type")


#
#
################# Simple Queries
 
 
# Simple query with xpath()
try:
    print("\nSimple query with xpath():")
    result = root.xpath("./guys/person")
    display(result)
except SyntaxError:
    print("\tSyntax error for simple query with xpath()\n")

  
# Simple query with findall()
try:
    print("\nSimple query with findall():")
    result = root.findall("./guys/person")
    display(result)
except SyntaxError:
    print("\tSyntax error for simple findall()\n")

# Simple query with iterfind
try:
    print("\nSimple query with iterfind:")
    for result in root.iterfind("./guys/person"):
        display(result)
except SyntaxError:
    print("\tSyntax error for simple query with iterfind()\n")

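# Simple query with iter()
# iter() treats its argument as a tag name, not a path, so this is expected
# to match nothing (and print nothing) rather than raise SyntaxError.
try:
    print("\nSimple query with iter():")
    for result in root.iter("./guys/person"):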
        display(result)
except SyntaxError:
    print("\tSyntax error for simple query with iter()\n")


#
#
################## Tag Selection Only

# Tag selection with iter()
try:
    print("\nTag selection with iter():")
    for result in root.iter("person"):
        display(result)
except SyntaxError:
    print("\tSyntax error for tag selection with iter()\n")


#
#
################# Complex (XPATH) Queries

# Complex query with xpath()
try:
    print("\nComplex query with xpath():")
    result = root.xpath("./guys/person/name[text()='Chris']/..")
    display(result)
except SyntaxError:
    print("\tSyntax error for complex xpath()\n")


# Complex query with findall()
try:
    print("\nComplex query with findall():")
    result = root.findall("./guys/person/name[text()='Chris']/..")
    display(result)
except SyntaxError:
    print("\tSyntax error for complex findall()\n")


# Complex query with iterfind()
try:
    print("\nComplex query with iterfind():")
    for result in root.iterfind("./guys/person/name[text()='Chris']/.."):
        display(result)
except SyntaxError:
    print("\tSyntax error for complex iterfind()\n")








> On 4 Sep 2021, at 01:28, codecompl...@free.fr wrote:
> 
> Hello,
> 
> While still learning about lxml and xpath, I'm not clear as to why there are 
> different ways to find elements in a tree:
> 
> =============
> name = root.xpath('//name')
> print("xpath/name is ",name[0].text)
> 
> name=root.findall('.//name')
> print("findall/name is ",name[0].text)
> 
> for name in root.iter('name'):
>       print("iter/name is ",name.text)
> =============
> 
> Why all those different ways?
> 
> Thank you.
> _______________________________________________
> lxml - The Python XML Toolkit mailing list -- lxml@python.org
> To unsubscribe send an email to lxml-le...@python.org
> https://mail.python.org/mailman3/lists/lxml.python.org/
> Member address: a...@logic.org.uk
