Hello,

Some columns in a DB contain badly formed HTML, to the point that 
BeautifulSoup (or perhaps lxml underneath it?) fails:

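=============
from bs4 import BeautifulSoup

# Some records start with a newline (0x0A) followed by a stray closing tag
soup = BeautifulSoup("\n</strong>", 'lxml')
# soup.body is None here, so the next line raises
# AttributeError: 'NoneType' object has no attribute 'text'
print(soup.body.text)
=============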

What would be a nice way to solve the problem?

Is there a command to remove wrong tags altogether (e.g. in strings that start 
with </strong>), or should I just catch the error?
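
For illustration, here is a minimal sketch of the two options, not a 
definitive fix. The helper names strip_leading_closers and extract_text are 
hypothetical, and the regex only handles closing tags at the very start of 
the string. Since the AttributeError shows that soup.body is None for such 
input, the sketch checks for None instead of catching the exception:

```python
import re

# Whitespace and stray closing tags at the very start of a string
# (hypothetical pattern; it does not touch closing tags found later on).
_LEADING_CLOSERS = re.compile(r"^(?:\s*</[A-Za-z][^>]*>)+")


def strip_leading_closers(html):
    """Drop closing tags that appear before any opening tag or text."""
    return _LEADING_CLOSERS.sub("", html)


def extract_text(html, parser="lxml"):
    """Return the text of an HTML fragment without assuming <body> exists."""
    # Imported here so strip_leading_closers() works without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(strip_leading_closers(html), parser)
    # soup.body can be None when nothing usable was parsed,
    # so fall back to the soup object itself.
    root = soup.body if soup.body is not None else soup
    return root.get_text()
```

Calling extract_text("\n</strong>") should then return a string instead of 
raising, because get_text() is called on the soup itself when no body exists.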

Thank you.
_______________________________________________
lxml - The Python XML Toolkit mailing list -- lxml@python.org
