On Wed, 4 Jun 2003 18:11:48 -0500, "Mount, Rob (Robert F)" wrote:
> I am investigating differing behavior in various environments of the
> wide-character version of the C function isAlpha with respect to
> character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK.
>
> The UNICODE documents seem ambiguous on this point: the General
> Category is "Lm" which, although informative instead of normative,
> would seem to imply that it is alphabetic; likewise
> DerivedCoreProperties-4.0.0.txt indicates that it is alphabetic; but
> PropList-4.0.0.txt contains two records - one indicating that it is
> a diacritic, one that indicates it is an extender.

U+30FC (KATAKANA-HIRAGANA PROLONGED SOUND MARK) is, I would say,
identical in function to U+02D0 (MODIFIER LETTER TRIANGULAR COLON),
which is used to indicate a long vowel in IPA. Both U+30FC and U+02D0
are signs that are appended to a character representing a vowel to
indicate that it is a long vowel sound.

Both U+30FC and U+02D0 have a General Category of "Lm"
(Modifier_Letter), and in PropList.txt both are included under the
Extender property. However, only U+30FC is also included under the
Diacritic property. Likewise, U+1843 (MONGOLIAN LETTER TODO LONG VOWEL
SIGN), which has a similar function to U+30FC, is classified as an
Extender but not as a Diacritic.

The definition of "Extender" in UCD.html is:

"Characters whose principal function is to extend the value or shape
of a preceding alphabetic character. Typical of these are length and
iteration marks."

U+30FC, U+02D0 and U+1843 are indeed all "length marks", and are
rightly classified as Extenders. But why then is U+30FC alone also
classified as a Diacritic (according to UCD.html, "Characters that
linguistically modify the meaning of another character to which they
apply")? As far as I am aware, U+30FC does not "linguistically modify
the meaning of another character" other than to lengthen a preceding
vowel.

Andrew
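
For anyone wanting to check the General Category values discussed above, here is a minimal Python sketch. It is only an illustration: it relies on Python's standard unicodedata module, which ships a later Unicode version than 4.0.0, and that module does not expose the Extender or Diacritic properties from PropList.txt, so only the General Category and Python's own str.isalpha() are shown. The helper name describe is made up for this example.

```python
import unicodedata

# The three length marks discussed in this message.
CODE_POINTS = [0x30FC, 0x02D0, 0x1843]


def describe(cp):
    """Return (label, character name, General Category, str.isalpha() result)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),  # two-letter General Category, e.g. "Lm"
        ch.isalpha(),              # Python's notion of alphabetic, not C's iswalpha
    )


for cp in CODE_POINTS:
    print(*describe(cp))
```

Python's str.isalpha() is based on the General Category alone (Lm, Lt, Lu, Ll, Lo), so it says nothing about how a given C library's wide-character function behaves in a particular locale.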

