Public bug reported:

It is not possible to specify an https URI when calling "parse" from the lxml.html module. It will always throw an IOError. Specifying http URIs works.
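A possible workaround, offered as a sketch rather than a confirmed fix: the failure most likely comes from the loader built into the underlying libxml2 library, which handles http but not https. Since parse also accepts a file-like object, the page can be fetched with Python's own urllib and the open response handed over instead of the URL. The helper names needs_prefetch and open_for_lxml are hypothetical, and the code uses Python 3's urllib.request (on Python 2.6 the equivalent call would be urllib2.urlopen).

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def needs_prefetch(url):
    """Return True when the URL uses the https scheme."""
    # urlsplit() normalises the scheme to lower case.
    return urlsplit(url).scheme == "https"


def open_for_lxml(url, opener=urlopen):
    """Return something that lxml.html.parse can consume.

    https URLs are opened with `opener` (urllib by default) and the
    file-like response is returned; any other input is returned
    unchanged so that lxml can load it by itself.
    """
    if needs_prefetch(url):
        return opener(url)
    return url
```

With lxml installed, parse(open_for_lxml("https://www.google.de")) would then read from the urllib response instead of asking libxml2 to fetch the URL. The opener parameter is only there so that a different fetch function can be swapped in.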
>>> parse("http://www.google.de")
<lxml.etree._ElementTree object at 0x7fa204f2eea8>
>>> parse("https://www.google.de")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/lxml/html/__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64493)
IOError: Error reading file 'https://www.google.de': failed to load external entity "https://www.google.de"

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: python-lxml 2.2.6-1
ProcVersionSignature: Ubuntu 2.6.35-6.8-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Mon Jun 28 22:10:27 2010
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Alpha amd64 (20100602.2)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: lxml

** Affects: lxml (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug maverick

--
lxml.html.parse does not recognize "https"
https://bugs.launchpad.net/bugs/599533
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs