Hunspell:
http://extensions.libreoffice.org/extension-center/tira-n-teqbaylit
https://addons.mozilla.org/en-us/firefox/addon/tira-n-teqbaylit/

It doesn't handle all affixes, just the core ones that are written as part of the word, such as verb conjugations and noun inflections. Peripheral affixes separated with "-", such as possessives, pronouns and directional particles, are treated as separate words for the moment, so invalid peripheral affix clusters are not marked as incorrect. That's acceptable for a first spellchecker, as these clusters are often treated more as grammar than morphology in textbooks etc.; the orthography for Moroccan Berber varieties even leaves spaces between the particles instead of using '-'.

I just list each ablauted stem of a verb lemma separately, as well as sound or mixed plurals, though I do indicate in the config file that they belong together (the dictionary file is generated automatically from a lexical database).
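To make the hyphen handling above concrete, here is a minimal Python sketch of the idea, not the actual hunspell setup. The lexicon is a toy assumption: "argaz" and "tafunast" come from this thread, while "is" and "iw" merely stand in for possessive suffixes, and the function name is made up for illustration.

```python
import re

# Toy lexicon: two nouns from this thread plus "is" and "iw", which stand in
# for possessive suffixes here (illustrative assumption, not the real list).
LEXICON = {"argaz", "tafunast", "is", "iw"}


def misspelled(text, lexicon=LEXICON):
    """Return the tokens of `text` that are not in the lexicon.

    Hyphens are treated like spaces, so every peripheral affix is
    checked as an independent word.
    """
    tokens = re.split(r"[\s-]+", text.strip())
    return [t for t in tokens if t and t.lower() not in lexicon]


print(misspelled("argaz-is"))     # noun + one possessive
print(misspelled("argaz-is-iw"))  # stacked possessives, still not flagged
print(misspelled("argazz-is"))    # a misspelled core word is caught
```

Because each hyphen-separated piece is looked up on its own, a cluster like the second one passes even though the combination is not valid; only errors inside a single piece are reported.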
One likely difference between the hunspell configuration and hfst is the default noun prefixes. To make maintenance of the hunspell dictionary easier for non-linguists, I opted to list the lemmas in familiar citation form (e.g. a+rgaz) complete with their default free state prefixes, rather than listing a pseudo-stem (rgaz) and adding a prefix for every inflection including the default (here 'a'). So for the construct state, for plurals, and for feminine adjectives, the default prefix has to be dropped and the appropriate prefix added. I imagine I'll have to list pseudo-stems for hfst so I can chain them together with the affixes.

I should be able to figure out the peripheral affixes, as they always occur in a fixed order but may come before or after a conjugated verb depending on aspect. (The affixes may take slightly different forms depending on where they appear.)

A-B-C-CONJUGATEDVERB
or
CONJUGATEDVERB-A-B-C

Circumfixes might be tricky. For example, feminine nouns with t--t:

tafunast (cow): citation form (with free state prefix)
The prefix reduces in the construct state: tfunast
Affix plural: the prefix changes (but is still reduced in the construct state) AND a suffix is added after the feminine final t is dropped:
tifunasin (free state)
tfunasin (construct state)

A pseudo-stem 'funas' can be made, but it's artificial, so I avoided it for hunspell as I mentioned.

Thanks for the help!
Paul

______________________________________________________________
> From: Francis Tyers <[email protected]>
> To: <[email protected]>
> Date: 01.01.2016 13:03
> Subject: Re: [Apertium-stuff] Berber languages, hfst, circumfixes / prefixes
>
> On 2015-12-30 18:41, [email protected] wrote:
>> Hi all. In 2010 I asked about using Apertium with Berber languages.
>> Since then, it has become clearer how to use hfst.
>> Also, I've released a hunspell-based spellchecker for Kabyle Berber,
>> so I'm much more familiar with the morphology now, more familiar with
>> other Berber languages too, and have promising data sets for them.
>> It's time to pick up Apertium again.
>>
>> I'm looking at how to define my prefixes and circumfixes with hfst.
>> I'm familiar with general programming and with Berber linguistics, but
>> only superficially with transducers etc.
>> In the 2010 discussion, it was mentioned that "a Finnish student did
>> Tamazigh[t] with Xerox tools some years ago" - does anyone have a
>> reference? Is there an example (for any language) that I can look at
>> regarding circumfixes?
>> On the wiki I found a page "Replacement_for_flag_diacritics" with a
>> Turkish example. I have a general idea of Turkish grammar but the
>> intention of the example (specifically use of +/-aor) is not clear to
>> me. Can anyone explain?
>> Should I use flag diacritics or [] symbols for circumfixes?
>> Below is a snippet from the previous discussion regarding the quirks
>> of Berber languages and their likely support in hfst.
>
> Could you give examples of things you would like to treat and we will
> try and explain how to treat them. In terms of Apertium we would suggest
> not using flag diacritics where possible, as it restricts the portability
> of your automata.
>
> Where is the code for your hunspell-based spellchecker? I think Tommi
> might have some conversion scripts.
>
> Fran
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
