Hi,

Yavor Doganov wrote:

GSHTML assumes Latin 1 if not specified.
That's a poor default these days.

True, but I fear it comes from libxml2 itself and/or the old HTML specification, and HTML is not the primary purpose of libxml2 anyway!
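To illustrate why Latin-1 is a poor default, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1 because no charset was declared. The helper `decode_with_default` is hypothetical and is not GSHTML or libxml2 API.

```python
from typing import Optional

def decode_with_default(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw bytes, falling back to Latin-1 (ISO-8859-1) when no
    charset is declared. Hypothetical helper, for illustration only."""
    return raw.decode(declared or "latin-1")

raw = "café".encode("utf-8")               # 'é' is the two bytes C3 A9 in UTF-8
print(decode_with_default(raw))            # prints "cafÃ©" (Latin-1 fallback)
print(decode_with_default(raw, "utf-8"))   # prints "café" (charset declared)
```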

The issue is that, syntax-wise, help files are HTML-like (e.g. the <br> tag) but contain many extra structural tags that HTML does not have, and they are not intended to be HTML. So technically XML is more appropriate, extended with some HTML conveniences.
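As a rough illustration of the HTML-vs-XML mismatch, here is a Python sketch using the standard library's strict XML parser (`xml.etree.ElementTree`, chosen only for illustration; it is not what HelpViewer uses). An HTML-style unclosed <br> is rejected as not well-formed, while the XML form <br/> is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>line one<br>line two</p>"))   # False: <br> never closed
print(is_well_formed("<p>line one<br/>line two</p>"))  # True: empty-element tag
```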

Beyond hunting bugs, testing different parsers is also useful for evaluating how well each standard fits, and perhaps for cleaning up the format by changing it.


Tests:
I first removed all the <meta charset="utf-8" /> tags from all the .xlp
files.
Worth noting: if these are retained, HelpViewer from SVN trunk
displays nothing.

It works for me. I also added a specific test file that contains subsections with different encodings.

Do all files give you issues? With which parser(s)?

So no parser is set?
The default is Internal if it's not set.

Actually, there was an inconsistency. Richard changed the default to GSHTML to keep the existing behaviour (the internal parser was commented out). Then I accidentally committed a change in one class but not in the other, so I don't know how the preference ends up being set. I have now changed it to always default to Internal, which is my goal. The other parsers are there for evaluation until a decision is made.


Accented characters are interpreted correctly.
Some unexpected double quotes appear after some titles.
Not only quotes but opening tags as well, plus missing text, like:

Tags Disponibles
# The entire paragraph after the title is not displayed, just this:
'<b> :

<b>votre texte en

Hmm... does this happen inside legends/boxes? We have a bug under investigation there that causes string truncation, though only of one or two characters.

Pure XML does not handle HTML entities, so either the accented characters are written directly in UTF-8, or pure XML cannot be used.
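A small Python sketch of the entity problem, using `xml.etree.ElementTree` purely for illustration: a named HTML entity such as &eacute; is an "undefined entity" error in plain XML, while the literal UTF-8 character parses fine. Numeric character references such as &#233; are part of XML itself and also work. The helper `parse_text` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def parse_text(fragment: str) -> Optional[str]:
    """Return the element's text, or None if a strict XML parser rejects it."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None

print(parse_text("<p>caf&eacute;</p>"))  # None: &eacute; is not defined in XML
print(parse_text("<p>café</p>"))         # café: literal UTF-8 character
print(parse_text("<p>caf&#233;</p>"))    # café: numeric character reference
```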

Riccardo
