Attached is an old email that represents the most authoritative
information that I have on the diacritic characters used in dictionaries
of the Welsh language. Hope this helps ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

------- Forwarded Message

Date: Tue, 18 Aug 1998 17:10:15 +0100
To: [EMAIL PROTECTED]
From: Andrew Hawke <[EMAIL PROTECTED]>
Subject: Welsh character sets (LONG MESSAGE)

Markus, 
       you e-mailed [EMAIL PROTECTED] regarding the frequency
of certain Welsh letter+accent combinations. He submitted your query
to the WELSH-L discussion list. I have replied to the list, but I also
felt that I should take the liberty of contacting you directly, as this
is something I have strong views on.

Some background:
I am Assistant Editor and Systems Manager for the University of Wales 
Dictionary of Welsh, the standard scholarly dictionary of the language. 
I also chair the Celtic Texts Specialist Group of the International
Association of Literary and Linguistic Computing. The University of
Wales has an orthography committee which publishes guidelines for
Welsh spelling which are accepted by all Welsh writers and
publishers. These notes are based on those guidelines. Welsh is now
legally one of the two official languages of Wales, on an equal
legal footing with English. The government has established a body
called the Welsh Language Board to promote the use of Welsh. The
language is now taught in every school in Wales (and is the main
language of instruction in many of them). Some 600 books and
many magazines and newspapers are published annually. The use of the
language in all spheres, and increasingly in business, public life,
the administration of justice, education, government and the media 
(there is a Welsh-language TV channel) is growing rapidly. Welsh is
spoken by approximately 500,000 people in Wales, and by several hundred
thousand outside Wales. The number of speakers showed a slight increase
at the last census, after nearly a century of continuous decline.

The availability of character sets to represent the language is
absolutely essential, and such character sets should be as complete
as possible. In the past, the lack of appropriate character sets
has been a considerable deterrent to using the language in print and
electronically. I would urge you to bear this in mind when considering
the following.

Johann van Wingen (of the Netherlands WG on ISO 10646) pushed hard for
the inclusion of all the possible Welsh letter/accent combinations,
which was eventually accepted by the ISO and subsequently Unicode.

Microsoft has also committed to including the 13 additional characters
in its OpenType fonts. I have communicated extensively on this point
with John Hudson of Tiro Typeworks in Vancouver (www.tiro.com) who has
been working on OpenType fonts for Microsoft and for academic purposes.
I reproduce below my main comments to him which may be of assistance
to you.

===================== COPIED MATERIAL FOLLOWS ================

Modern usage of the diacritics in Welsh is as follows:

(All diacritics are shown following the vowel which is accented, e.g.
a^ represents a lower-case a with a circumflex accent.)
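
As an illustration of this notation, here is a minimal Python sketch that
converts a single word written this way (a vowel followed by ^, ', ` or ")
into Unicode. The function name to_unicode and the mapping table are
hypothetical names introduced for this example; they are not part of the
original message. The sketch assumes the input is one word at a time,
because the message also uses ` and ' as quotation marks around glosses.

```python
import re
import unicodedata

# ASCII stand-ins used in the message, mapped to Unicode combining marks.
COMBINING = {
    "^": "\u0302",  # combining circumflex accent
    "'": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    '"': "\u0308",  # combining diaeresis
}

# A Welsh vowel (either case) immediately followed by one stand-in mark.
PATTERN = re.compile(
    "([aeiouwyAEIOUWY])([" + re.escape("".join(COMBINING)) + "])"
)


def to_unicode(word: str) -> str:
    """Convert one word in the message's notation (e.g. gwa^n) to Unicode."""
    marked = PATTERN.sub(lambda m: m.group(1) + COMBINING[m.group(2)], word)
    # NFC folds each base letter + combining mark into a precomposed letter.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    for sample in ["gwa^n", 'cre"wr', "casa'u", "pa`s", "tw^r"]:
        print(sample, "->", to_unicode(sample))
```

Going through combining marks and NFC normalisation avoids a hand-written
table of all the letter/accent pairs.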

Welsh requires the circumflex (^), acute ('), grave (`), and diaeresis (")
on all vowels, i.e. a, e, i, o, u, w, y (w being used in Welsh both as a
vowel and a semi-vowel). The incidence of these combinations varies very
widely.

All diacritics (accents) in Modern standard Welsh are compulsory and are
used to differentiate between different pronunciations of otherwise
similar- or identical-looking words, either in terms of length (long vs.
short) or stress. The stress accent in Welsh always falls on the penultimate
syllable, unless an accent (or a hyphen or an inserted h) indicates otherwise.

BECAUSE OF THIS, ALL THE ACCENTED WELSH CHARACTERS ARE REQUIRED, IN BOTH
UPPER- AND LOWER-CASE FORMS.
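
The full repertoire described here (seven vowels, four diacritics, two
cases) can be enumerated mechanically. The Python sketch below builds the
56 combinations and counts how many cannot be encoded in ISO 8859-1
(Latin-1). The comparison with Latin-1 is an assumption made for this
example: the message mentions "13 additional characters" without saying
which baseline they are additional to. The function names are hypothetical.

```python
import unicodedata

VOWELS = "aeiouwy"
MARKS = {
    "circumflex": "\u0302",
    "acute": "\u0301",
    "grave": "\u0300",
    "diaeresis": "\u0308",
}


def welsh_accented_letters():
    """Return every vowel/diacritic pair, lower- and upper-case, in NFC form."""
    letters = []
    for base in VOWELS + VOWELS.upper():
        for combining in MARKS.values():
            letters.append(unicodedata.normalize("NFC", base + combining))
    return letters


def outside_latin1(letters):
    """Return the letters that ISO 8859-1 cannot encode (assumed baseline)."""
    missing = []
    for ch in letters:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


letters = welsh_accented_letters()
missing = outside_latin1(letters)
print(len(letters), "combinations;", len(missing), "not in Latin-1")
print(" ".join(missing))
```

Under that assumption the count of letters missing from Latin-1 comes out
at 13: every accented w, plus the y forms other than y-acute and
lower-case y-diaeresis.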

The circumflex is used solely to indicate that a vowel is long in a context
in which it would normally be expected to be short, e.g.:

        gwa^n `he pierces'      vs.     gwan `weak'
        gwe^n `a smile'         vs.     gwen `white (fem.)'
        pi^n `pine (wood, tree)' vs.    pi`n `a pin'     
        co^r `a choir'          vs.     cor `a dwarf'
        bu^m `I was (perfect)'  vs.     bum `five (mutated)'
        tw^r `a tower'          vs.     twr `a group'
        y^m `we are'            vs.     ym `in (before m)'

The diaeresis is used to separate vowels, as in English:

        prosa"ig `prosaic', cre"wr `creator', copi"o `to copy',
        tro"edigaeth `conversion', du"wch `blackness', Rebacay"ddiaeth
        `Rebaccaism', cyw"res `concubine'

The acute accent is used to indicate unexpected stress (i.e. not on the 
penultimate):

        casa'u `to hate', case't `cassette', ricri'wt `a recruit'
        paraso'l `a parasol', rebu'wc `a rebuke', 
        caridy'ms `riff-raff', gw'raidd `manly' (this last is on the
        penult, but is to distinguish it from the word gwraidd `root',
        which is monosyllabic)

The grave accent is used to indicate that a vowel is short in a context
in which it would normally be expected to be long:

        pa`s `a pass, permit'   vs.     pas `a cough'
        sie`d `a shed'          vs.     sie^d/sied `escheat'
        sgi`l `a skill'         vs.     sgi^l/sgil `following'
        no`d `a nod'            vs.     nod `a target, an aim'
        cu`l `a hut'            vs.     cul `narrow'
        mw`g `a mug'            vs.     mwg `smoke (n.)'
        py`g `dirty'            vs.     pyg `pitch, tar'

Generally speaking, diacritics in Welsh cannot reasonably be omitted as they
are used either to show unusual stress, or to differentiate between pairs of 
otherwise identical words with different pronunciations. As such they are
equally necessary in upper- and lower-case forms.

The commonest diacritic is the circumflex, followed by the acute and diaeresis
probably about equally. The grave is rare, but as more and more words are
borrowed from English, and new compounds coined for technical terms, its
use will undoubtedly increase.

To give a very rough indication, according to the headwords in our
(unfinished) dictionary (which we estimate will contain about 84,500
entries), the number of accented keywords (extrapolated to the expected
finished size of the dictionary) will be roughly:

        circumflex: 2,000; diaeresis: 880; acute: 500; grave: 160
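
To put these rough figures in proportion, the short Python sketch below
works out what share of the estimated 84,500 entries carries a diacritic,
and how the four diacritics compare with each other. It uses only the
numbers quoted in this message. It assumes each accented headword is
counted under exactly one diacritic, which the message does not state.

```python
ESTIMATED_ENTRIES = 84_500

# Extrapolated headword counts quoted in the message.
ACCENTED_HEADWORDS = {
    "circumflex": 2_000,
    "diaeresis": 880,
    "acute": 500,
    "grave": 160,
}

# Assumption: no headword is counted under more than one diacritic.
total_accented = sum(ACCENTED_HEADWORDS.values())
share_of_dictionary = total_accented / ESTIMATED_ENTRIES

print(f"accented headwords: {total_accented} "
      f"({share_of_dictionary:.1%} of all entries)")
for name, count in ACCENTED_HEADWORDS.items():
    print(f"{name:>10}: {count / total_accented:.1%} of accented headwords")
```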

All the above remarks refer to Modern Welsh orthography.

=========================== COPIED MATERIAL ENDS ==========================

From the background information you supplied, I certainly feel that the
character set for publishing, etc., MUST include all the possible
combinations. The low-end set should ideally be as complete as possible,
but I do appreciate that this may cause problems. If a compromise HAS
to be made, w+" could be dispensed with first, followed by w+' and y+`.
The upper-case versions are less essential than the lower-case ones,
but W+^ and Y+^ MUST be retained, even at the expense of the more
dispensable lower-case combinations just mentioned (w+", w+`, y+`).
E-mail and Web applications are becoming increasingly important in
Wales, and the ability to write correctly is of great benefit.

Thank you for your interest in the language. Please do not hesitate
to contact me if you wish to discuss the matter further. If you feel
there is someone else to whom I should make representations (such
as a UK representative, perhaps), please send me contact details.

With best wishes

Andrew Hawke


------- End of Forwarded Message
