Answer to self: use the parser in incremental mode, traverse the read buffer, and chop it into slices delimited by, but not including, the CRs. Each slice is then fed to the parser.
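A minimal sketch of this approach, not the attached MWE. The helper names `slices_without_cr` and `parse_without_cr` and the chunk size are made up for illustration. The stdlib `XMLParser` stands in so the snippet runs without extra packages; lxml's parsers (e.g. `lxml.etree.HTMLParser()`) offer the same `feed()`/`close()` interface, so one of those should be usable in its place. The sketch assumes an ASCII-compatible encoding such as UTF-8, where a 0x0D byte is always a CR.

```python
import io
from xml.etree.ElementTree import XMLParser  # stdlib stand-in for an lxml parser


def slices_without_cr(buffer: bytes):
    """Yield the pieces of `buffer` that lie between CR bytes, CRs excluded."""
    start = 0
    while True:
        pos = buffer.find(b"\r", start)
        if pos == -1:
            # No further CR: emit whatever is left, if anything.
            if start < len(buffer):
                yield buffer[start:]
            return
        if pos > start:
            yield buffer[start:pos]
        start = pos + 1  # skip the CR itself


def parse_without_cr(stream, parser, chunk_size=65536):
    """Feed `stream` to `parser` incrementally, dropping every CR byte.

    `parser` is any object with feed() and close() (hypothetical helper;
    the chunk size is an arbitrary choice).
    """
    while True:
        buffer = stream.read(chunk_size)
        if not buffer:
            break
        for piece in slices_without_cr(buffer):
            parser.feed(piece)
    return parser.close()


# Tiny in-memory example with CRLF line endings and a deliberately small chunk.
sample = io.BytesIO(b"<doc>\r\n<p>line one\r\nline two</p>\r\n</doc>")
root = parse_without_cr(sample, XMLParser(), chunk_size=8)
```

With a file on disk, the stream would be opened in binary mode, e.g. `open(path, "rb")`, so the CR bytes reach the slicing step untouched.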
Best, /PA

On Sat, 14 May 2022 at 08:06, Pedro Andres Aranda Gutierrez <paag...@gmail.com> wrote:

> OK, just for reference, attached is my MWE. Get the ZIP file from
> gutenberg.org with
>
> wget https://www.gutenberg.org/files/68047/68047-h.zip
>
> lxml version 4.8, python 3.9 on Ubuntu 20.04 or macOS BigSur
>
> Those are really annoying....
>
> Best, /PA
>
> On Fri, 13 May 2022 at 12:47, Gilles <codecompl...@free.fr> wrote:
>
>> On 12/05/2022 22:32, Adrian Bool wrote:
>>
>> On 12 May 2022, at 10:26, Gilles <codecompl...@free.fr> wrote:
>>
>> File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError
>> OSError: Error reading file* '<html>*
>>
>>
>> Look at the last line above - you're giving parse() a string containing
>> XML data which the parse() function is treating as a filename; trying to
>> open a file with a name equivalent to your XML content!
>>
>> If you want to parse an XML string - use et.fromstring() instead.
>>
>> The StringIO call may be reasonable if your XML didn't exist on disk; but
>> if your source data is on disk best to either give parse() the filename
>> (but then you get your #13 issue) or pass it a file handle provided by
>> open().
>>
>> Sorry I overlooked the last line. I dumbly supposed that parse() could
>> take either a file handle or a string.
--
Questions are not there to be answered,
questions are there to be asked
Georg Kreisler

Headaches with a Juju log:
unit-basic-16: 09:17:36 WARNING juju.worker.uniter.operation we should run a leader-deposed hook here, but we can't yet
_______________________________________________ lxml - The Python XML Toolkit mailing list -- lxml@python.org To unsubscribe send an email to lxml-le...@python.org https://mail.python.org/mailman3/lists/lxml.python.org/ Member address: arch...@mail-archive.com