On 2/7/2021 10:22 AM, Arrigo Marchiori wrote:
> Hello all,
> 
> re-replying to Jim's message.
> 
> On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:
> 
>> Funny that you bring this up... I've been tracking down some bugs and they
>> all seem to be XML related... fastsax->libwriterfilter with occasional cores
>> due to __cxa_call_unexpected.
>>
>> I feel that making AOO more fragile by trying to work around cases where
>> invalid and/or non-compliant XML is encountered is just wrong. We should
>> either ignore the error (catch it) or raise an exception. Invalid data
>> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
>> vector for vulnerabilities.
> 
> For the record: the detection of duplicated attributes is made
> internally by the expat library. Our code just receives the error
> message and cannot do anything to recover from it.
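
For illustration only, the expat behaviour described above can be
reproduced with Python's standard expat binding. This is a minimal
sketch, not AOO code: the helper name check_xml and the sample
documents are hypothetical, and it only shows that the parser rejects
a duplicated attribute and hands the caller an error code and position.

```python
import xml.parsers.expat


def check_xml(data: str):
    """Return None if expat accepts the document, else (message, line, column).

    Hypothetical helper for illustration; not part of AOO.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (only) chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        # expat stops at the first well-formedness error; the caller only
        # gets the error code and the position where parsing stopped.
        message = xml.parsers.expat.ErrorString(exc.code)
        return (message, exc.lineno, exc.offset)
    return None


# Sample documents (made up for this sketch).
result_bad = check_xml('<root><item id="1" id="2"/></root>')
result_good = check_xml('<root><item id="1"/></root>')
```

Parsing stops at the first error, so the caller never sees the rest of
the document.
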
> 
> I don't believe it's worth patching expat to allow duplicated
> attributes. I don't know the library well and I am wary of the
> consequences of tinkering with it.
> 
> But then my question becomes: do we want to offer any data recovery
> tools for corrupted documents? Like ``dumb'' XML parsers that just
> shave away XML errors?
> 
>  1- it could be an external tool, written in a language that is easier
>     to code in? (like Python, Perl, Java... whatever)
> 
>  2- or an internal pre-parsing phase? It should not be based on the
>     expat library though; do we have any other possibilities among the
>     current modules?
> 
>  3- or we leave it to hand-crafting by knowledgeable people on the
>     forum, as it is happening now?
> 
> I am looking forward to opinions ... and possibly reviews of PR 122
> please ;-)
> 
> Best regards,
> 
Purely from a user's point of view, I agree with Jim. It should not be
allowed to happen. Asking the user to run an external program, or to
send the document to the forum to be hand-edited, is a recipe for
disaster for our user base and from a marketing standpoint.

I could see an external program as a short-term, stopgap workaround.
However, it should only be that.

Regards
Keith

