Answer to self: use the parser in incremental mode, walk the read buffer,
and chop it into slices delimited by, but not including, the CRs.
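
For the archive, a minimal sketch of that idea, not the attached MWE itself.
The helper names slices_without_cr and feed_without_cr and the bufsize
default are made up for illustration. The only assumption about the parser
is that it offers feed() and close(), which the lxml parsers do in
incremental mode.

```python
import io


def slices_without_cr(buf: bytes):
    """Yield the parts of buf that lie between CR bytes, dropping the CRs."""
    start = 0
    while True:
        end = buf.find(b"\r", start)
        if end == -1:
            # No CR left: emit the remainder (if any) and stop.
            if start < len(buf):
                yield buf[start:]
            return
        if end > start:
            yield buf[start:end]
        start = end + 1  # skip the CR itself


def feed_without_cr(parser, stream, bufsize=65536):
    """Read a binary stream in chunks and feed CR-free slices to the parser.

    parser is any object with feed() and close(); the return value is
    whatever parser.close() returns (the root element for lxml parsers).
    """
    while True:
        buf = stream.read(bufsize)
        if not buf:
            break
        for piece in slices_without_cr(buf):
            parser.feed(piece)
    return parser.close()
```

With lxml this would presumably be called as
feed_without_cr(etree.HTMLParser(), fh), where fh is the HTML file from the
ZIP opened in binary mode. Since the CRs never reach the parser, the
serialised output should no longer contain &#13;.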

Best, /PA

On Sat, 14 May 2022 at 08:06, Pedro Andres Aranda Gutierrez <
paag...@gmail.com> wrote:

> OK, just for reference, attached is my MWE. Get the ZIP file from
> gutenberg.org with
>
> wget https://www.gutenberg.org/files/68047/68047-h.zip
>
> lxml version 4.8, Python 3.9 on Ubuntu 20.04 or macOS Big Sur
>
> Those &#13; are really annoying....
>
> Best, /PA
>
> On Fri, 13 May 2022 at 12:47, Gilles <codecompl...@free.fr> wrote:
>
>> On 12/05/2022 22:32, Adrian Bool wrote:
>>
>> On 12 May 2022, at 10:26, Gilles <codecompl...@free.fr> wrote:
>>
>>   File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError
>> OSError: Error reading file* '<html>*
>>
>>
>> Look at the last line above - you're giving parse() a string containing
>> XML data which the parse() function is treating as a filename; trying to
>> open a file with a name equivalent to your XML content!
>>
>> If you want to parse an XML string - use et.fromstring() instead.
>>
>> The StringIO call may be reasonable if your XML didn't exist on disk; but
>> if your source data is on disk, it is best to either give parse() the
>> filename (but then you get your #13 issue) or pass it a file handle
>> provided by open().
>>
>> Sorry I overlooked the last line. I dumbly supposed that parse() could
>> take either a file handle or a string.
>> _______________________________________________
>> lxml - The Python XML Toolkit mailing list -- lxml@python.org
>> To unsubscribe send an email to lxml-le...@python.org
>> https://mail.python.org/mailman3/lists/lxml.python.org/
>> Member address: paag...@gmail.com
>>
>
>
> --
> Fragen sind nicht da um beantwortet zu werden,
> Fragen sind da um gestellt zu werden
> Georg Kreisler
>
> Headaches with a Juju log:
> unit-basic-16: 09:17:36 WARNING juju.worker.uniter.operation we should run
> a leader-deposed hook here, but we can't yet
>
>
