Attached is an old email that represents the most authoritative information that I have on the diacritic characters used in dictionaries of the Welsh language. Hope this helps ...
Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

------- Forwarded Message

Date: Tue, 18 Aug 1998 17:10:15 +0100
To: [EMAIL PROTECTED]
From: Andrew Hawke <[EMAIL PROTECTED]>
Subject: Welsh character sets (LONG MESSAGE)

Markus,

You e-mailed [EMAIL PROTECTED] regarding the frequency of certain Welsh letter+accent combinations. He submitted your query to the WELSH-L discussion list. I have replied to the list, but I also felt that I should take the liberty of contacting you directly, as this is something I have strong views on.

Some background: I am Assistant Editor and Systems Manager for the University of Wales Dictionary of Welsh, the standard scholarly dictionary of the language. I also chair the Celtic Texts Specialist Group of the International Association of Literary and Linguistic Computing. The University of Wales has an orthography committee which publishes guidelines for Welsh spelling which are accepted by all Welsh writers and publishers. These notes are based on those guidelines.

Welsh is now legally one of the two official languages of Wales, on an equal legal footing with English. The government has established a body called the Welsh Language Board to promote the use of Welsh. The language is now taught in every school in Wales (and is the main language of instruction in many of them). Some 600 books and many magazines and newspapers are published annually. The use of the language in all spheres, and increasingly in business, public life, the administration of justice, education, government and the media (there is a Welsh-language TV channel) is growing rapidly. Welsh is spoken by approximately 500,000 people in Wales, and by several hundred thousand outside Wales. The number of speakers showed a slight increase at the last census, after nearly a century of continuous decline.
The availability of character sets to represent the language is absolutely essential, and such character sets should be as complete as possible. In the past, the lack of appropriate character sets has been a considerable deterrent to using the language in print and electronically. I would urge you to bear this in mind when considering the following.

Johann van Wingen (of the Netherlands WG on ISO 10646) pushed hard for the inclusion of all the possible Welsh letter/accent combinations, which was eventually accepted by the ISO and subsequently Unicode. Microsoft has also committed to including the 13 additional characters in its OpenType fonts. I have communicated extensively on this point with John Hudson of Tiro Typeworks in Vancouver (www.tiro.com) who has been working on OpenType fonts for Microsoft and for academic purposes. I reproduce below my main comments to him which may be of assistance to you.

===================== COPIED MATERIAL FOLLOWS ================

Modern usage of the diacritics in Welsh is as follows:

(All diacritics are shown following the vowel which is accented, e.g. a^ represents a lower-case a with a circumflex accent.)

Welsh requires the circumflex (^), acute ('), grave (`), and diaeresis (") on all vowels, i.e. a, e, i, o, u, w, y (w being used in Welsh both as a vowel and a semi-vowel). The incidence of these combinations varies very widely.

All diacritics (accents) in Modern standard Welsh are compulsory and are used to differentiate between different pronunciations of otherwise similar- or identical-looking words, either in terms of length (long vs. short) or stress. The stress accent in Welsh always falls on the penultimate syllable, unless an accent (or a hyphen or an inserted h) indicates otherwise.

BECAUSE OF THIS, ALL THE ACCENTED WELSH CHARACTERS ARE REQUIRED, IN BOTH UPPER- AND LOWER-CASE FORMS.
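The full repertoire implied by the paragraph above (seven vowels, four diacritics, two cases) can be enumerated mechanically. The following is a minimal sketch, assuming Python 3 and its standard unicodedata module; the names VOWELS, MARKS, ALL_LETTERS and OUTSIDE_LATIN1 are illustrative only. It builds each letter from a base vowel plus a combining mark, composes it with NFC normalisation, and then picks out the letters that fall outside ISO 8859-1 (Latin-1).

```python
import unicodedata

# The seven Welsh vowels named in the message (w and y included).
VOWELS = "aeiouwy"

# Unicode combining marks for the four diacritics.
MARKS = {
    "circumflex": "\u0302",
    "acute": "\u0301",
    "grave": "\u0300",
    "diaeresis": "\u0308",
}


def welsh_accented_letters():
    """Return every vowel+diacritic combination, in both cases, NFC-composed."""
    letters = []
    for base in VOWELS + VOWELS.upper():
        for mark in MARKS.values():
            letters.append(unicodedata.normalize("NFC", base + mark))
    return letters


ALL_LETTERS = welsh_accented_letters()

# Letters containing any code point above U+00FF cannot be encoded in Latin-1.
OUTSIDE_LATIN1 = [c for c in ALL_LETTERS if any(ord(ch) > 0xFF for ch in c)]

if __name__ == "__main__":
    print(len(ALL_LETTERS), "combinations in total")
    print(len(OUTSIDE_LATIN1), "outside Latin-1:", " ".join(OUTSIDE_LATIN1))
```

This yields 56 combinations, of which 13 lie outside Latin-1 (all eight accented forms of w, plus y with circumflex and grave in both cases, and upper-case Y with diaeresis). Whether these are the same 13 "additional characters" mentioned above is an assumption; the message does not enumerate them.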
The circumflex is used solely to indicate that a vowel is long in a context in which it would normally be expected to be short, e.g.:

  gwa^n `he pierces'          vs.  gwan `weak'
  gwe^n `a smile'             vs.  gwen `white (fem.)'
  pi^n  `pine (wood, tree)'   vs.  pi`n `a pin'
  co^r  `a choir'             vs.  cor  `a dwarf'
  bu^m  `I was (perfect)'     vs.  bum  `five (mutated)'
  tw^r  `a tower'             vs.  twr  `a group'
  y^m   `we are'              vs.  ym   `in (before m)'

The diaeresis is used to separate vowels, as in English:

  prosa"ig `prosaic', cre"wr `creator', copi"o `to copy', tro"edigaeth `conversion', du"wch `blackness', Rebacay"ddiaeth `Rebaccaism', cyw"res `concubine'

The acute accent is used to indicate unexpected stress (i.e. not on the penultimate):

  casa'u `to hate', case't `cassette', ricri'wt `a recruit', paraso'l `a parasol', rebu'wc `a rebuke', caridy'ms `riff-raff', gw'raidd `manly' (this last is on the penult, but is to distinguish it from the word gwraidd `root', which is monosyllabic)

The grave accent is used to indicate that a vowel is short in a context in which it would normally be expected to be long:

  pa`s  `a pass, permit'  vs.  pas `a cough'
  sie`d `a shed'          vs.  sie^d/sied `escheat'
  sgi`l `a skill'         vs.  sgi^l/sgil `following'
  no`d  `a nod'           vs.  nod `a target, an aim'
  cu`l  `a hut'           vs.  cul `narrow'
  mw`g  `a mug'           vs.  mwg `smoke (n.)'
  py`g  `dirty'           vs.  pyg `pitch, tar'

Generally speaking, diacritics in Welsh cannot reasonably be omitted as they are used either to show unusual stress, or to differentiate between pairs of otherwise identical words with different pronunciations. As such they are equally necessary in upper- and lower-case forms.

The commonest diacritic is the circumflex, followed by the acute and diaeresis probably about equally. The grave is rare, but as more and more words are borrowed from English, and new compounds coined for technical terms, its use will undoubtedly increase.
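The ASCII notation used for the examples above (diacritic written after the vowel it belongs to) can be converted to real accented letters mechanically. The following is a minimal sketch, assuming Python 3 and its standard re and unicodedata modules; the function name to_unicode and the table COMBINING are hypothetical names chosen for illustration. It is meant for single words in this notation only: in running text an apostrophe or quotation mark directly after a vowel would be misread as an accent.

```python
import re
import unicodedata

# Map each ASCII marker used in the message to its Unicode combining mark.
COMBINING = {
    "^": "\u0302",  # circumflex
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    '"': "\u0308",  # diaeresis
}

# A Welsh vowel (either case) immediately followed by one of the markers.
PATTERN = re.compile(
    "([aeiouwyAEIOUWY])([" + re.escape("".join(COMBINING)) + "])"
)


def to_unicode(word):
    """Convert one word from vowel+marker notation to precomposed Unicode."""
    def compose(match):
        base, marker = match.group(1), match.group(2)
        return unicodedata.normalize("NFC", base + COMBINING[marker])
    return PATTERN.sub(compose, word)


if __name__ == "__main__":
    for w in ["gwa^n", "tw^r", "pi`n", 'cre"wr', "casa'u", "gw'raidd"]:
        print(w, "->", to_unicode(w))
```

Words without any marker, such as gwan or twr, pass through unchanged.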
To give a very rough indication, according to the headwords in our (unfinished) dictionary (which we estimate will contain about 84,500 entries), the number of accented keywords (extrapolated to the expected finished size of the dictionary) will be roughly:

  circumflex: 2,000; diaeresis: 880; acute: 500; grave: 160

All the above remarks refer to Modern Welsh orthography.

=========================== COPIED MATERIAL ENDS ==========================

From the background information you supplied, I certainly feel that the character set for publishing, etc., MUST include all the possible combinations. The low-end set should ideally be as complete as possible, but I do appreciate that this may cause problems. If a compromise HAS to be made, w+" could be dispensed with first, followed by w+' and y+`. The upper-case versions are less essential than the lower-case ones, but W+^ and Y+^ MUST be retained, even at the expense of the more dispensable lower-case combinations just mentioned (w+", w+`, y+`).

E-mail and Web applications are becoming increasingly important in Wales, and the ability to write correctly is of great benefit.

Thank you for your interest in the language. Please do not hesitate to contact me if you wish to discuss the matter further. If you feel there is someone else to whom I should make representations (such as a UK representative, perhaps), please send me contact details.

With best wishes

Andrew Hawke

------- End of Forwarded Message

_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
