Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Eli Zaretskii
Date: Fri, 20 Feb 2015 11:50:17 +0900 From: Martin J. Dürst due...@it.aoyama.ac.jp CC: jcb+unic...@inf.ed.ac.uk, unicode@unicode.org Well, for cased scripts, search is usually case-insensitive, but case conversions aren't given by compatibility decompositions. That's true, but comparing
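
To illustrate the point about case: a minimal sketch, assuming Python's standard `unicodedata` module and `str.casefold()`. The helper name `search_key` is hypothetical and not something proposed in the thread. NFKD leaves case untouched, so a case-insensitive search key needs a separate folding step; as a side effect, Unicode case folding maps Greek final sigma to regular sigma, while Hebrew final letters (an uncased script) are left alone.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical search key: compatibility decomposition, then case folding."""
    return unicodedata.normalize("NFKD", text).casefold()

# NFKD alone does not touch case.
print(unicodedata.normalize("NFKD", "A") == "a")

# Case folding unifies U+03C2 (final sigma) with U+03C3 (sigma) ...
print(search_key("\u03C2") == search_key("\u03C3"))

# ... but does nothing for U+05DD (final mem) versus U+05DE (mem).
print(search_key("\u05DD") == search_key("\u05DE"))
```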

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Eli Zaretskii
Date: Thu, 19 Feb 2015 22:02:57 + From: Richard Wordingham richard.wording...@ntlworld.com First, collation data is overkill for search, since the order information is not required, so the weights are simply wasting storage. The big waste is not in text-dependent storage, but in

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Eli Zaretskii
From: Philippe Verdy verd...@wanadoo.fr Date: Fri, 20 Feb 2015 04:47:52 +0100 Cc: jcb+unic...@inf.ed.ac.uk, unicode Unicode Discussion unicode@unicode.org Sorry, I disagree. First, collation data is overkill for search, since the order information is not required, so the weights are

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Richard Wordingham
On Fri, 20 Feb 2015 10:04:32 +0200 Eli Zaretskii e...@gnu.org wrote: Date: Thu, 19 Feb 2015 22:02:57 + From: Richard Wordingham richard.wording...@ntlworld.com First, collation data is overkill for search, since the order information is not required, so the weights are simply

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Eli Zaretskii
Date: Fri, 20 Feb 2015 15:01:34 + From: Richard Wordingham richard.wording...@ntlworld.com Sorry, I don't think I follow: what is processing for search orders to which you allude here? The examples in the CLDR root locale and in DUCET are the massive sets of 'contractions' of

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-20 Thread Markus Scherer
On Thu, Feb 19, 2015 at 11:51 PM, Eli Zaretskii e...@gnu.org wrote: I think decomposition to NFKD solves these issues, doesn't it? Not completely. Judging from your question, you expected more mappings than NFKD has. You might want to try the mappings that are used as input for deriving the

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Eli Zaretskii
From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 19 Feb 2015 20:31:07 +0100 Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk, unicode Unicode Discussion unicode@unicode.org The decompositions are not needed for plain text searches, that can use the collation data (with the collation

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Philippe Verdy
The decompositions are not needed for plain text searches, that can use the collation data (with the collation data, you can unify at the primary level differences such as capitalisation and ignore diacritics, or transform some base groups of letters into a single entry, or make some significant

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Markus Scherer
On Thu, Feb 19, 2015 at 12:17 PM, Eli Zaretskii e...@gnu.org wrote: Sorry, I disagree. First, collation data is overkill for search, since the order information is not required, so the weights are simply wasting storage. Second, people do want to find, e.g., ² when they search for 2 etc.
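
The "² versus 2" case can be sketched as follows, assuming Python's standard `unicodedata` module. The helper names `nfkd_key` and `contains` are made up for this illustration and ignore collation entirely. Both strings are normalized to NFKD before a plain substring test.

```python
import unicodedata

def nfkd_key(text: str) -> str:
    """Normalize text to NFKD so compatibility variants compare equal."""
    return unicodedata.normalize("NFKD", text)

def contains(haystack: str, needle: str) -> bool:
    """Substring search after NFKD normalization of both arguments."""
    return nfkd_key(needle) in nfkd_key(haystack)

# SUPERSCRIPT TWO has a compatibility decomposition to DIGIT TWO.
print(contains("x\u00B2 + 1", "2"))

# A word ending in HEBREW LETTER FINAL MEM is not found when the query
# uses the non-final MEM, because NFKD has no mapping between the two.
print(contains("\u05E9\u05DC\u05D5\u05DD", "\u05E9\u05DC\u05D5\u05DE"))
```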

Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Eli Zaretskii
Does anyone know why the UCD defines compatibility decompositions for Arabic initial, medial, and final forms, but doesn't do the same for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM? Or, for that matter, for U+03C2 GREEK SMALL LETTER FINAL SIGMA? The relevant application where
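
The asymmetry described in the question can be checked directly against the UCD data shipped with Python. This is a minimal sketch assuming the standard `unicodedata` module; the helper name `compat_decomposition` is hypothetical. It prints the raw decomposition field for one Arabic presentation form and for the two final letters named above.

```python
import unicodedata

def compat_decomposition(ch: str) -> str:
    """Return the raw decomposition field from the UCD ('' if there is none)."""
    return unicodedata.decomposition(ch)

SAMPLES = (
    "\uFEE2",  # ARABIC LETTER MEEM FINAL FORM
    "\u05DD",  # HEBREW LETTER FINAL MEM
    "\u03C2",  # GREEK SMALL LETTER FINAL SIGMA
)

for ch in SAMPLES:
    # An empty string means the character has no decomposition mapping.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {compat_decomposition(ch)!r}")
```

The Arabic presentation form carries a `<final>` compatibility mapping to U+0645, while the Hebrew and Greek final letters have an empty decomposition field.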

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Michael Everson
On 19 Feb 2015, at 10:55, Eli Zaretskii e...@gnu.org wrote: Does anyone know why the UCD defines compatibility decompositions for Arabic initial, medial, and final forms, but doesn't do the same for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM? Or, for that matter, for U+03C2

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Eli Zaretskii
From: Michael Everson ever...@evertype.com Date: Thu, 19 Feb 2015 11:21:19 + On 19 Feb 2015, at 10:55, Eli Zaretskii e...@gnu.org wrote: Does anyone know why the UCD defines compatibility decompositions for Arabic initial, medial, and final forms, but doesn't do the same for

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Eli Zaretskii
Date: Thu, 19 Feb 2015 11:47:24 GMT From: Julian Bradfield jcb+unic...@inf.ed.ac.uk In Arabic, the variant of a letter is determined entirely by its position, so there is no compelling need to represent the forms separately (as characters rather than glyphs) save for the existence of legacy

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Richard Wordingham
On Thu, 19 Feb 2015 22:17:30 +0200 Eli Zaretskii e...@gnu.org wrote: First, collation data is overkill for search, since the order information is not required, so the weights are simply wasting storage. The big waste is not in text-dependent storage, but in the processing for search orders

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Philippe Verdy
2015-02-19 21:17 GMT+01:00 Eli Zaretskii e...@gnu.org: From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 19 Feb 2015 20:31:07 +0100 Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk, unicode Unicode Discussion unicode@unicode.org The decompositions are not needed for plain text

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Richard Wordingham
On Fri, 20 Feb 2015 11:50:17 +0900 Martin J. Dürst due...@it.aoyama.ac.jp wrote: If the question isn't Why are there equivalences useful for search that are not covered by compatibility decompositions?, but Why doesn't Unicode provide some data for final/non-final Hebrew letter

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Eli Zaretskii
Date: Thu, 19 Feb 2015 13:08:57 -0800 From: Markus Scherer markus@gmail.com Cc: Philippe Verdy verd...@wanadoo.fr, Julian Bradfield jcb+unic...@inf.ed.ac.uk, Unicode Mailing List unicode@unicode.org Sorry, I disagree. First, collation data is overkill for search, since

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Martin J. Dürst
On 2015/02/20 05:17, Eli Zaretskii wrote: From: Philippe Verdy verd...@wanadoo.fr Date: Thu, 19 Feb 2015 20:31:07 +0100 Cc: Julian Bradfield jcb+unic...@inf.ed.ac.uk, unicode Unicode Discussion unicode@unicode.org The decompositions are not needed for plain text searches, that can use

Re: Compatibility decomposition for Hebrew and Greek final letters

2015-02-19 Thread Martin J. Dürst
On 2015/02/19 20:47, Julian Bradfield wrote: On 2015-02-19, Eli Zaretskii e...@gnu.org wrote: Does anyone know why the UCD defines compatibility decompositions for Arabic initial, medial, and final forms, but doesn't do the same for Hebrew final letters, like U+05DD HEBREW LETTER FINAL MEM?