Public bug reported:

It is not possible to pass an https URI to "parse" from the lxml.html
module: the call always raises an IOError. The same call with an http
URI works.

>>> parse("http://www.google.de")
<lxml.etree._ElementTree object at 0x7fa204f2eea8>
>>> parse("https://www.google.de")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/lxml/html/__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64493)
IOError: Error reading file 'https://www.google.de': failed to load external entity "https://www.google.de"
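
A possible workaround, sketched under an assumption this report does
not confirm: that the failure comes from the URL being handed to the
underlying libxml2 loader, which does not speak https. In that case the
document can be fetched in Python and passed to parse() as a file-like
object instead. The helper name source_for_parse and its opener
parameter are hypothetical, and the sketch uses Python 3 module names
(on Python 2.6, urllib2.urlopen and urlparse.urlsplit would play the
same roles).

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def source_for_parse(url, opener=urlopen):
    """Return something lxml.html.parse() can read for the given URL.

    For https URLs, open the connection in Python and return the
    file-like response. Any other input is returned unchanged, so
    parse() handles it exactly as before.
    """
    if urlsplit(url).scheme.lower() == "https":
        return opener(url)
    return url


# Usage (needs lxml installed and network access):
#   from lxml.html import parse
#   tree = parse(source_for_parse("https://www.google.de"))
```

Because parse() then only sees an open stream, relative links in the
result have no base URL unless one is supplied separately through the
base_url argument that appears in the traceback above.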

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: python-lxml 2.2.6-1
ProcVersionSignature: Ubuntu 2.6.35-6.8-generic 2.6.35-rc3
Uname: Linux 2.6.35-6-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Mon Jun 28 22:10:27 2010
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Alpha amd64 (20100602.2)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: lxml

** Affects: lxml (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug maverick

-- 
lxml.html.parse does not recognize "https"
https://bugs.launchpad.net/bugs/599533
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
