Hello all,

re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:

> Funny that you bring this up... I've been tracking down some bugs and they
> all seem to be XML related... fastsax->libwriterfilter with occasional cores
> due to __cxa_call_unexpected.
> 
> I feel that making AOO more fragile by trying to work around cases where
> invalid and/or non-compliant XML is encountered is just wrong. We should
> either ignore the error (catch it) or raise an exception. Invalid data
> shouldn't be tolerated. Additionally, trying to be "lenient" is an easy
> vector for vulnerabilities.

For the record: duplicated attributes are detected internally by the
expat library. Our code just receives the error message and cannot
do anything to recover from it.
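
To illustrate the point above, here is a minimal sketch using Python's standard `xml.parsers.expat` module, a thin binding to the expat C library. The helper names `check_well_formed` and `is_duplicate_attribute` are hypothetical. As far as I know expat has no lenient mode for this case: it stops at the first duplicated attribute and hands back only an error code and a position.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def check_well_formed(data: bytes):
    """Parse data with expat. Return None if it is well-formed,
    otherwise (message, line, column) for the first error found."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True = this is the final chunk
    except xml.parsers.expat.ExpatError as err:
        # expat gives up at the first error; all the caller gets back
        # is an error code and the position where parsing stopped.
        return xml.parsers.expat.ErrorString(err.code), err.lineno, err.offset
    return None


def is_duplicate_attribute(data: bytes) -> bool:
    """True if expat rejected data because of a repeated attribute."""
    result = check_well_formed(data)
    # errors.XML_ERROR_DUPLICATE_ATTRIBUTE holds expat's message string.
    return result is not None and result[0] == errors.XML_ERROR_DUPLICATE_ATTRIBUTE
```

For instance, `check_well_formed(b'<a x="1" x="2"/>')` reports a "duplicate attribute" error on line 1, and there is nothing further the caller can ask the parser to do with the rest of the input.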

I don't believe it's worth patching expat to allow duplicated
attributes. I don't know the library well and I fear the
consequences of tinkering with it.

But then my question becomes: do we want to offer any data recovery
tools for corrupted documents? Like ``dumb'' XML parsers that just
shave away XML errors?

 1- it could be an external tool, written in a language that is easier
    to code in (like Python, Perl, Java... whatever)?

 2- or an internal pre-parsing phase? It should not be based on the
    expat library though; do we have any other possibilities among the
    current modules?

 3- or we leave it to hand-crafting by knowledgeable people on the
    forum, as is happening now?
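
As a rough idea of what option 1 could look like, here is a minimal Python sketch of a "dumb" repair tool. Everything in it is hypothetical: the function names, the keep-the-first-occurrence policy, the UTF-8 assumption, and the assumption that the damage is limited to duplicated attributes inside the package's `.xml` members (ODF and OOXML documents are both zip packages). It deliberately avoids expat and works on raw text with regular expressions, so it will also rewrite tag-like text inside comments or CDATA sections; its output is a best-effort repair to be checked by hand, not a validated document.

```python
import re
import zipfile

# A start tag or empty-element tag: '<', a name, zero or more
# name="value" / name='value' attributes, an optional '/', then '>'.
# End tags and the '<!--', '<![CDATA[', '<?' openers do not match,
# but tag-like text *inside* comments or CDATA would.
TAG_RE = re.compile(
    r"""<([A-Za-z_][\w.:-]*)"""                           # element name
    r"""((?:\s+[^\s=<>/]+\s*=\s*(?:"[^"]*"|'[^']*'))*)"""  # attributes
    r"""(\s*/?)>"""                                       # optional '/', then '>'
)
# One attribute, including the whitespace that precedes it.
ATTR_RE = re.compile(r"""\s+([^\s=<>/]+)\s*=\s*("[^"]*"|'[^']*')""")


def drop_duplicate_attributes(xml_text: str) -> str:
    """Return xml_text with repeated attributes removed from every start
    tag, keeping the first occurrence of each attribute name."""
    def fix_tag(match):
        name, attrs, tail = match.groups()
        seen = set()
        kept = []
        for attr in ATTR_RE.finditer(attrs):
            if attr.group(1) not in seen:
                seen.add(attr.group(1))
                kept.append(attr.group(0))
        return "<" + name + "".join(kept) + tail + ">"

    return TAG_RE.sub(fix_tag, xml_text)


def repair_package(src, dst) -> None:
    """Copy the zip package src to dst, cleaning every *.xml member.
    Entry order and per-entry compression are preserved (ODF expects
    the uncompressed 'mimetype' entry to come first)."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info)
            if info.filename.endswith(".xml"):
                text = data.decode("utf-8")  # assumption: UTF-8 members
                data = drop_duplicate_attributes(text).encode("utf-8")
            zout.writestr(info, data)
```

For example, `repair_package("broken.odt", "repaired.odt")` would write a copy in which every duplicated attribute is reduced to its first occurrence. Whether "first wins" is the right choice for a given document is exactly the kind of judgement that option 3 leaves to a human.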

I am looking forward to opinions ... and possibly reviews of PR 122,
please ;-)

Best regards,
-- 
Arrigo

http://rigo.altervista.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org
