On 22 Sep 2017, at 9:31, Hussein Shafie wrote:
On 09/21/2017 09:44 PM, Leif Halvard Silli wrote:
I attach the files 'nb.ebook' and 'nbPage.html'. The book has an Index
page with some words, including the word 'Ångström'. As you can see,
when you convert to one page HTML, the Ångström gets placed at the end
of the Index, under the heading 'A' - which should have been the heading
'Å'.
Thanks. It will take some time to fix these bugs (1. low-priority
bugs; 2. we must find out what to really do by learning more about
back-of-the-book indexes).
Perhaps Java implements the [Unicode Collation
Algorithm](http://www.unicode.org/reports/tr10/)? To quote it:
«Collation is the general term for the process and function of
determining the sorting order of strings of characters.»
The document deals with this issue: “a few languages, such as Norwegian
and Danish, sort ø as a unique element after z”.
One reason I assume that Java deals with this is that for the sorting of
table rows, which is a feature of the various XMLmind XML editors, the
sort order referred to as “Dictionary” depends on the language of the
parent element: If the element is some variant of Norwegian (nb/nn/no),
the sorting follows Norwegian collation rules - with words beginning with
the letters Æ, Ø, Å sorted separately at the end of the list
(including that the letter 'Ö' is treated as a variant of 'Ø', which
is yet another thing that Norwegians expect). Whereas if the language is
e.g. German (de), the sorting follows German sort order. Unless you have
hardcoded this, it is a feature of the underlying programming
language, I suppose ...
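To illustrate what I mean by "a feature of the underlying programming language", here is a minimal sketch using `java.text.Collator`, the standard JDK class for locale-sensitive string comparison. That XMLmind's “Dictionary” sort is built on it is only my assumption, and the class and method names (`LocaleSortSketch`, `sortFor`) are made up for the example. The order actually produced for "nb" depends on the locale data shipped with the Java runtime, so I do not state an expected output.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not XMLmind code.
final class LocaleSortSketch {
    /** Returns a sorted copy of words, using whatever collation rules the runtime has for languageTag. */
    static List<String> sortFor(String languageTag, List<String> words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "\u00D8l",           // O-slash + "l"
            "Zebra",
            "\u00C5ngstr\u00F6m", // A-ring + "ngstr" + o-umlaut + "m"
            "\u00C6rlig",        // AE ligature + "rlig"
            "\u00D6land",        // O-umlaut + "land"
            "Apple");
        // Same words, two languages: the order may differ between the two lines.
        System.out.println("nb: " + sortFor("nb", words));
        System.out.println("de: " + sortFor("de", words));
    }
}
```

The only thing that changes between the two calls is the language tag, which matches what I see in the editor: same table, different `xml:lang`, different order.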
What I do not understand is the sort order which XMLmind refers to as
Lexicographic: First, it appears to be unaffected by language. Second, it
appears to place words beginning with non-ASCII letters /after/ words
beginning with Z - so far so good. However, the internal sort order of the
words that occur after the Z is not Scandinavian, but appears to be
based on the A-Z sort order, where words on A-like letters come first -
namely e.g. Ångström before e.g. Öland.
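One possible explanation, which is purely my guess and not confirmed by anyone at XMLmind, is that “Lexicographic” simply compares character codes. Plain `String.compareTo` in Java does exactly that, ignores language, and happens to give the order I observe, because Å is U+00C5, Æ is U+00C6, Ö is U+00D6 and Ø is U+00D8. A small sketch (the class name `CodePointOrderSketch` is invented for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not XMLmind code.
final class CodePointOrderSketch {
    /** Sorts by String.compareTo, i.e. by UTF-16 code unit values, with no locale involved. */
    static List<String> sortByCharCode(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(String::compareTo);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "\u00D6land",         // O-umlaut, U+00D6
            "Zebra",
            "\u00C5ngstr\u00F6m", // A-ring, U+00C5
            "Apple",
            "\u00D8rret",         // O-slash, U+00D8
            "\u00C6rlig");        // AE ligature, U+00C6
        // Order: Apple, Zebra, then A-ring, AE, O-umlaut, O-slash words.
        System.out.println(sortByCharCode(words));
    }
}
```

So the non-ASCII words land after Z, and Ångström lands before Öland, without any "A-like letters first" rule being involved. The Norwegian alphabet order (Æ, Ø, Å) is not reproduced, which also fits what I see.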
When I included words beginning with Æ, Ø, Ö and Å in my index (that
is: all the special Norwegian letters plus the Swedish Ö, which we
expect to be handled as a Ø), I understood that the Index that XMLmind -
or rather ebookc - generates follows the Norwegian collation order (as
long as the ebook is declared to be in a variant of Norwegian), except
that the Swedish Ö is given its own heading, after the Ø.
Please note that ditac -- XMLmind DITA Converter,
http://www.xmlmind.com/ditac/ -- has exactly the same problems
(similar code), so please do not bother reporting these bugs for ditac
too.
Ok.
Btw: I expected to see the 'ToC line' for the index, but I don’t see it
...
This is normal. The Index 'ToC line' is not generated when ebookc
finds an index heading which does not start with letter A to Z.
In your case, you have an index heading actually starting with 'Å',
even if this letter is displayed as an 'A'.
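As I read the rule described above, it could be sketched roughly like this. This is only an illustration of the stated condition, not ebookc's actual code; the names `TocLineSketch`, `isAsciiLetterHeading` and `shouldGenerateTocLine` are invented.

```java
import java.util.List;

// Hypothetical illustration of the rule, not ebookc code.
final class TocLineSketch {
    /** True if the heading starts with one of the letters A to Z. */
    static boolean isAsciiLetterHeading(String heading) {
        if (heading.isEmpty()) {
            return false;
        }
        char first = heading.charAt(0);
        return first >= 'A' && first <= 'Z';
    }

    /** The 'ToC line' is produced only if every index heading starts with A to Z. */
    static boolean shouldGenerateTocLine(List<String> headings) {
        return headings.stream().allMatch(TocLineSketch::isAsciiLetterHeading);
    }
}
```

With headings A, B and Z the check passes. With a heading that really is 'Å' (U+00C5), even if it is rendered as 'A', the check fails and no 'ToC line' is produced.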
OK. So the line is not supposed to be generated when the Index contains
words that begin with a non-ASCII letter (except when the sorting
order/collation rule requires the non-ASCII letters to be sorted under
ASCII letters ...). It would be nice if this could be improved as well,
at least in the form of being able to specify one’s own “ToC line”.
--
leif halvard silli
--
XMLmind DITA Converter Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/ditac-support