Hello,

On 07.02.21 16:22, Arrigo Marchiori wrote:
Hello all,

re-replying to Jim's message.

On Wed, Feb 03, 2021 at 02:25:16PM -0500, Jim Jagielski wrote:

Funny that you bring this up... I've been tracking down some bugs and they
all seem to be XML related: fastsax -> libwriterfilter, with occasional core
dumps due to __cxa_call_unexpected.

I feel that making AOO more fragile by trying to work around cases where
invalid and/or non-compliant XML is encountered is just wrong. We should
either ignore the error (catch it) or raise an exception. Invalid data shouldn't
be tolerated. Additionally, trying to be "lenient" is an easy vector for
vulnerabilities.

For the record: duplicated attributes are detected internally by the
expat library. Our code just receives the error message and cannot do
anything to recover from it.
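To illustrate that last point, here is a minimal sketch using Python's binding to the same expat library. It is not AOO's C++ fastsax code, which I am only assuming behaves the same way. A repeated attribute makes expat stop with an error code and a position, and the caller gets nothing back that it could repair. The function name parse_or_report is hypothetical.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def parse_or_report(data):
    """Parse data with expat. Return None if it is well-formed,
    otherwise (message, line, column) describing where expat stopped."""
    parser = xml.parsers.expat.ParserCreate()
    # Attributes would reach this handler as a dict, so a duplicated
    # attribute could not be represented here anyway.
    parser.StartElementHandler = lambda name, attrs: None
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # Only an error code and a position come back; the parse is over.
        return errors.messages[err.code], err.lineno, err.offset
    return None


print(parse_or_report('<p a="1"/>'))        # None
print(parse_or_report('<p a="1" a="2"/>'))  # message is errors.XML_ERROR_DUPLICATE_ATTRIBUTE
```

The check happens inside expat before any handler runs, which is why there is no hook on our side to "accept" the duplicate.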

I think it is not an issue with expat itself, but with how expat is set up.

From a pure XML point of view, multiple elements with the same name are allowed.

Consider an unordered HTML list as a reference.

I would look into whether we could allow this element to be read as a duplicate.

The user could then delete the entry they do not want, and thereby fix the document.

Maybe we could also provide a helper that shows the user what has happened.
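A helper like that could start as small as this Python sketch (the name explain_duplicates and its output format are hypothetical). It lets expat find the first duplicate-attribute error, then scans the reported line for start tags and lists the attribute names that appear more than once. It assumes the offending start tag fits on one line and that attribute values do not contain < or >.

```python
import re
import xml.parsers.expat
from xml.parsers.expat import errors

# A start tag on a single line (end tags, comments and PIs do not match).
TAG = re.compile(r'<[A-Za-z_][^<>]*>')
# One quoted attribute inside a start tag; group 1 is the attribute name.
ATTR_NAME = re.compile(r'\s([^\s=/<>]+)\s*=\s*(?:"[^"]*"|\'[^\']*\')')


def explain_duplicates(xml_text):
    """Return [(line, start tag, duplicated names)] for the first
    duplicate-attribute error expat reports, or [] if there is none."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return []
    except xml.parsers.expat.ExpatError as err:
        if errors.messages[err.code] != errors.XML_ERROR_DUPLICATE_ATTRIBUTE:
            raise  # some other well-formedness error: not handled here
        line = xml_text.splitlines()[err.lineno - 1]
        hints = []
        for tag in TAG.finditer(line):
            names = ATTR_NAME.findall(tag.group(0))
            dups = sorted({n for n in names if names.count(n) > 1})
            if dups:
                hints.append((err.lineno, tag.group(0), dups))
        return hints


doc = '<?xml version="1.0"?>\n<office>\n<p a="1" b="2" a="3">x</p>\n</office>'
print(explain_duplicates(doc))  # [(3, '<p a="1" b="2" a="3">', ['a'])]
```

Pointing the user at the exact tag and attribute name might already be enough for them to fix the file by hand.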


I don't believe it's worth patching expat to allow duplicated
attributes. I don't know the library well and I fear the
consequences of tinkering with it.

But then my question becomes: do we want to offer any data recovery
tools for corrupted documents? For example, "dumb" XML parsers that
simply strip away XML errors?

  1- it could be an external tool, written in a language that is easier
     to code in (like Python, Perl, Java... whatever)?

  2- or an internal pre-parsing phase? It should not be based on the
     expat library, though; do we have any other options among the
     current modules?

  3- or do we leave it to knowledgeable people on the forum to fix
     documents by hand, as is happening now?
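For option 1, a "dumb" pre-pass could be as small as the following Python sketch. It does not use expat: it rewrites start tags with regular expressions and keeps only the first occurrence of each attribute (that keep-first policy is my assumption, not something we have agreed on). It ignores comments, CDATA sections and unquoted values, so it is a recovery aid, not a parser. The function name drop_duplicate_attributes is hypothetical.

```python
import re

# Start tag: element name, zero or more quoted attributes, then > or />.
START_TAG = re.compile(
    r'<([A-Za-z_][\w.:-]*)'
    r'((?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)'
    r'(\s*/?>)'
)
# One attribute including its leading whitespace; group 1 is the name.
ATTRIBUTE = re.compile(r'\s+([^\s=/>]+)\s*=\s*("[^"]*"|\'[^\']*\')')


def drop_duplicate_attributes(xml_text):
    """Return xml_text with repeated attributes removed from every start tag,
    keeping the first occurrence (hypothetical policy)."""
    def fix_tag(match):
        seen = set()
        kept = []
        for attr in ATTRIBUTE.finditer(match.group(2)):
            name = attr.group(1)
            if name in seen:
                continue  # later duplicate: drop it
            seen.add(name)
            kept.append(attr.group(0))
        return '<' + match.group(1) + ''.join(kept) + match.group(3)

    return START_TAG.sub(fix_tag, xml_text)


broken = '<text:p text:style-name="P1" text:style-name="P2">Hi</text:p>'
print(drop_duplicate_attributes(broken))  # <text:p text:style-name="P1">Hi</text:p>
```

The output could then be loaded normally. Whether dropping the second value is the right choice depends on the document, so the user would still need to check the result.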

I am looking forward to opinions... and possibly reviews of PR 122,
please ;-)

I plan to have a look!

Best regards,
--
This is the Way! http://www.apache.org/theapacheway/index.html
