On 22 Sep 2017, at 9:31, Hussein Shafie wrote:

On 09/21/2017 09:44 PM, Leif Halvard Silli wrote:

I attach the files 'nb.ebook' and 'nbPage.html'. The book has an Index page with some words, including the word 'Ångström'. As you can see, when you convert to single-page HTML, 'Ångström' gets placed at the end of the Index, under the heading 'A' - which should have been the heading 'Å'.

Thanks. It will take some time to fix these bugs (1. low-priority bugs; 2. we must find out what to really do by learning more about back-of-the-book indexes).

Perhaps Java implements the “[Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/)”: «Collation is the general term for the process and function of determining the sorting order of strings of characters.»?

The document deals with the issue: “a few languages, such as Norwegian and Danish, sort ø as a unique element after z”.

One reason I assume that Java deals with this is that for the sorting of table rows, which is a feature of the various XMLmind XML editors, the sort order referred to as “Dictionary” depends on the language of the parent element: if the element is in some variant of Norwegian (nb/nn/no), the sorting follows Norwegian collation rules - with words beginning with the letters Æ, Ø, Å sorted separately at the end of the list (including that the letter 'Ö' is treated as a variant of 'Ø', which is yet another thing that Norwegians expect). Whereas if the language is e.g. German (de), the sorting follows German sort order. Unless you have hardcoded this, it is a feature of the underlying programming language, I suppose ...
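
For reference, Java does expose locale-sensitive collation through `java.text.Collator`. Below is a minimal sketch of sorting the same word list per language. The class name `CollationDemo`, the method `sortFor` and the word list are made up for illustration, and whether Norwegian rules actually apply depends on the locale data shipped with the JDK in use, so no output is shown:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // Returns a sorted copy of 'words', using whatever collation rules
    // the running JDK provides for the given BCP 47 language tag.
    static List<String> sortFor(String languageTag, List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative word list, not taken from the attached ebook.
        List<String> words = List.of("Zebra", "Ångström", "Apple", "Øl", "Öland", "Ære");
        System.out.println("nb: " + sortFor("nb", words));
        System.out.println("de: " + sortFor("de", words));
    }
}
```

If the two printed lists differ, the difference comes from the JDK's locale data and not from anything hardcoded in the calling program.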

What I do not understand is the sort order which XMLmind refers to as Lexicographic: First, it appears to be unaffected by language. Second, it appears to place words beginning with non-ASCII letters /after/ words beginning with Z - so far so good. However, the internal sort order of the words that occur after the Z is not Scandinavian, but appears to be based on the A-Z sort order, where words on A-like letters come first - namely e.g. Ångström before e.g. Öland.
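
The ordering described here is what a plain character-code comparison would produce. This is only a guess at what “Lexicographic” means in XMLmind, not something confirmed in this thread. A small sketch, with the made-up names `CodePointOrderDemo` and `sortByCharCode`, and assuming the words use precomposed characters (Å = U+00C5, Æ = U+00C6, Ö = U+00D6, Ø = U+00D8):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CodePointOrderDemo {
    // Sorts by String.compareTo, i.e. by UTF-16 code unit values,
    // with no knowledge of language or locale.
    static List<String> sortByCharCode(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Illustrative word list, not taken from the attached ebook.
        List<String> words = List.of("Zebra", "Ångström", "Apple", "Øl", "Öland", "Ære");
        // prints [Apple, Zebra, Ångström, Ære, Öland, Øl]
        System.out.println(sortByCharCode(words));
    }
}
```

All four non-ASCII letters land after Z because their code values are above U+007F, and Å comes first among them only because U+00C5 happens to be the lowest of the four. Norwegian order would instead be Æ, Ø (with Ö), Å.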

When I included words beginning with Æ, Ø, Ö and Å in my index (that is: all the special Norwegian letters plus the Swedish Ö, which we expect to be handled as a Ø), I understood that the Index that XMLmind - or rather ebookc - generates follows the Norwegian collation order (as long as the ebook is declared to be in a variant of Norwegian), except that the Swedish Ö is given its own heading, after the Ø.

Please note that ditac -- XMLmind DITA Converter, http://www.xmlmind.com/ditac/ -- has exactly the same problems (similar code), so please do not bother reporting these bugs for ditac too.

Ok.

Btw: I expected to see the 'ToC line' for the index, but I don’t see it ...

This is normal. The Index 'ToC line' is not generated when ebookc finds an index heading which does not start with letter A to Z.

In your case, you have an index heading actually starting with 'Å', even if this letter is displayed as an 'A'.
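
To make the rule above concrete, here is a rough sketch of such a check. It is not ebookc's actual code: the class `IndexHeadingDemo` and all three methods are hypothetical, and the use of Unicode decomposition to explain why 'Å' may be shown as 'A' is an assumption.

```java
import java.text.Normalizer;
import java.util.List;

public class IndexHeadingDemo {
    // Hypothetical check: does the heading start with a plain ASCII letter?
    static boolean startsWithAtoZ(String heading) {
        if (heading.isEmpty()) {
            return false;
        }
        char c = heading.charAt(0);
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // First character after canonical decomposition (NFD).
    // 'Å' (U+00C5) decomposes to 'A' + U+030A, so this yields 'A';
    // 'Ø' and 'Æ' have no canonical decomposition and stay as they are.
    // Assumes a non-empty heading.
    static char baseLetter(String heading) {
        return Normalizer.normalize(heading, Normalizer.Form.NFD).charAt(0);
    }

    // Hypothetical rule: the ToC line is emitted only when every heading is A to Z.
    static boolean wouldEmitTocLine(List<String> headings) {
        return headings.stream().allMatch(IndexHeadingDemo::startsWithAtoZ);
    }

    public static void main(String[] args) {
        System.out.println(startsWithAtoZ("Å"));                    // false
        System.out.println(baseLetter("Å"));                        // A
        System.out.println(wouldEmitTocLine(List.of("A", "Å")));    // false
    }
}
```

Under these assumptions a heading can look like 'A' on the page while still failing the A to Z test, which would match the behaviour described.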

OK. So the line is not supposed to be generated when the Index contains words that begin with a non-ASCII letter (except when the sorting order/collation rule requires the non-ASCII letters to be sorted under ASCII letters ...). It would be nice if this could be improved as well, at least in the form of being able to specify one’s own “ToC line”.
--
leif halvard silli
--
XMLmind DITA Converter Support List
http://www.xmlmind.com/mailman/listinfo/ditac-support