Re: [ditac] [ebookc] Ebookc: Index creation issue for Scandinavian languages

Leif Halvard Silli Fri, 22 Sep 2017 09:24:05 -0700

On 22 Sep 2017, at 9:31, Hussein Shafie wrote:

On 09/21/2017 09:44 PM, Leif Halvard Silli wrote:
I attach the files 'nb.ebook' and 'nbPage.html'. The book has an Index page with some words, including the word 'Ångström'. As you can see, when you convert to one page HTML, the Ångström gets placed at the end of the Index, under the heading 'A' - which should have been the heading 'Å'.
Thanks. It will take some time to fix these bugs (1. low-priority bugs; 2. we must find out what to really do by learning more about back-of-the-book indexes).

Perhaps Java implements the “[Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/)”: «Collation is the general term for the process and function of determining the sorting order of strings of characters.»?

The document deals with the issue: “a few languages, such as Norwegian and Danish, sort ø as a unique element after z”.

One reason I assume that Java deals with this is that, for the sorting of table rows, which is a feature of the various XMLmind XML editors, the sort order referred to as “Dictionary” depends on the language of the parent element: if the element is in some variant of Norwegian (nb/nn/no), the sorting follows Norwegian collation rules, with words beginning with the letters Æ, Ø, Å sorted separately at the end of the list (including that the letter 'Ö' is treated as a variant of 'Ø', which is yet another thing that Norwegians expect). Whereas if the language is e.g. German (de), the sorting follows German sort order. Unless you have hardcoded this, it is a feature of the underlying programming language, I suppose ...
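To illustrate what language-dependent sorting looks like in plain Java, here is a minimal sketch using the standard java.text.Collator class. The class name LocaleCollationDemo, the method sortFor and the word list are made up for the illustration, and I am not claiming that this is how the XMLmind editors implement their “Dictionary” order. Whether Norwegian rules are actually applied for the tag "no" depends on the locale data shipped with the JDK in use.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleCollationDemo {

    /**
     * Returns a sorted copy of the words, using whatever collation rules
     * the running JDK provides for the given language tag (hypothetical helper).
     */
    static List<String> sortFor(String languageTag, List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding:
        // \u00C5 = A with ring, \u00F6 = o with diaeresis,
        // \u00D8 = O with stroke, \u00C6 = AE ligature.
        List<String> words = Arrays.asList(
                "Zebra", "\u00C5ngstr\u00F6m", "Apple", "\u00D8l", "\u00C6re");

        // Assumption: the JDK has Norwegian rules for "no"; if it does,
        // the words on AE, O-stroke and A-ring should come after "Zebra".
        System.out.println("no: " + sortFor("no", words));

        // Under German rules, A-ring is treated as an accented A,
        // so the A-ring word sorts among the A words, before "Zebra".
        System.out.println("de: " + sortFor("de", words));
    }
}
```

The only thing that changes between the two calls is the locale handed to Collator.getInstance, which is the kind of behaviour I see in the “Dictionary” sort order.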

What I do not understand is the sort order which XMLmind refers to as Lexicographic: First, it appears to be unaffected by language. Second, it appears to place words beginning with non-ASCII letters /after/ words beginning with Z - so far so good. However, the internal sort order of the words that occur after the Z is not Scandinavian, but appears to be based on the A-Z sort order, where words on A-like letters come first - namely e.g. Ångström before e.g. Öland.
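One ordering that would produce exactly this pattern is a plain comparison of character codes, without any collation rules. This is only a guess on my side about what “Lexicographic” means in XMLmind; the sketch below (class and method names are made up) simply shows what Java's natural String order does with such words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CodePointOrderDemo {

    /**
     * Returns a copy sorted with String.compareTo, which compares UTF-16
     * code units (the same as code point order for these letters).
     */
    static List<String> sortByCodeUnit(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // \u00C5 = A-ring, \u00C6 = AE, \u00D6 = O-diaeresis, \u00D8 = O-stroke
        List<String> words = Arrays.asList(
                "\u00D8l", "Zebra", "\u00D6land", "Apple",
                "\u00C6re", "\u00C5ngstr\u00F6m");
        System.out.println(sortByCodeUnit(words));
    }
}
```

This gives Apple, Zebra, Ångström, Ære, Öland, Øl: the non-ASCII words land after Z regardless of language, with Ångström before Öland, because Å (U+00C5) and Æ (U+00C6) have lower code points than Ö (U+00D6) and Ø (U+00D8). A Norwegian reader would instead expect Æ, Ø/Ö, Å in that order.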

When I included words beginning with Æ, Ø, Ö and Å in my index (that is: all the special Norwegian letters plus the Swedish Ö, which we expect to be handled as a Ø), I understood that the Index that XMLmind - or rather ebookc - generates follows the Norwegian collation order (as long as the ebook is declared to be in a variant of Norwegian), except that the Swedish Ö is given its own heading, after the Ø.

Please note that ditac -- XMLmind DITA Converter, http://www.xmlmind.com/ditac/ -- has exactly the same problems (similar code), so please do not bother reporting these bugs for ditac too.

Ok.

Btw: I expected to see the 'ToC line' for the index, but I don’t see it
...
This is normal. The Index 'ToC line' is not generated when ebookc finds an index heading which does not start with letter A to Z.
In your case, you have an index heading actually starting with 'Å', even if this letter is displayed as an 'A'.

OK. So the line is not supposed to be generated when the Index contains words that begin with a non-ASCII letter (except when the sorting order/collation rule requires the non-ASCII letters to be sorted under ASCII letters ...). It would be nice if this could be improved as well, at least in the form of being able to specify one’s own “ToC line”.

--
leif halvard silli

--
XMLmind DITA Converter Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/ditac-support
