--- Jordi Mas <[EMAIL PROTECTED]> wrote:
> Hello guys,

Hi Jordi.

> After some discussion in the list and some IRC
> talking with Dom and others, I have put together an
> implementation proposal for the barbarism detection
> feature.
>
> * What is a barbarism
>
> Barbarism is a problem that mainly concerns
> minority languages, i.e. languages that are
> competing, in the same territory, with a more
> powerful one, called the "roof language", for example
> Welsh, Catalan, Occitan, and others.
>
> When two languages compete in the same territory,
> interferences come up, but they are not symmetric.
> The roof language is weakly affected but the
> minority one can be strongly affected, and can
> disappear (glottophagy). One of these
> interferences is barbarism.
>
> * How we implement it
>
> We have a class called Barbarism that lives in
> 'src\other\spell\xp\'.

We should move the proofing features to a new directory that makes
more sense. Spelling, grammar, and barbarisms can be arranged under it
in whatever system makes most sense. Should it go under XAP or AP?
Spellchecking seems more cross-platform than the other parts...?

> We init the class when the ispell class is created
> and when we do CheckWord and suggestWord we also
> call the Barbarism class.
>
> * How we store them
>
> The file that contains the barbarisms is an XML file
> that lives in the same directory as the
> dictionaries and it has the same name as the
> dictionary file but with a barbarism extension.
>
> For example, for American it will be
> "american.barbarisms"

This is a bad idea. Only ispell dictionaries have these names, and I
for one hate the ispell naming system. Aspell probably uses different
names and lives in different directories. Note that many distros have
ispell hashes in places other than where AbiWord probably prefers
them. I'm doing some work now to make AbiWord use any ispell hash
files it can find. Hopefully I can even make it use a mix of ispell
and aspell (and maybe even myspell) dictionaries.

Making the barbarism files depend just on the old ispell stuff is
ugly - the names have to live in tables and are not easily extended by
non-programmers. I'm hoping to improve this. Please give the barbarism
files logical names based on language tags, such as "ca.barbarisms" or
"ca-ES.barbarisms".

> This is an example file. The attribute "word"
> contains the wrong word, the attribute "suggestion"
> contains the right word to use
>
> <?xml version="1.0" encoding="utf-8"?>

In XML, leaving out the encoding declaration causes it to default to
UTF-8.

> <AbiBarbarism app="AbiWord" ver="1.0"
>   language="ca-ES">
>
>   <Barbarism
>     word="boleto"
>     suggestion1="billet"
>   />
>
>   <Barbarism
>     word="tiro"
>     suggestion1="tret"
>   />
>
>   <Barbarism
>     word="tanteig"
>     suggestion1="tempteig"
>   />
>
>   <Barbarism
>     word="tamany"
>     suggestion1="mida"
>     suggestion2="grandària"
>   />
>
> </AbiBarbarism>

That looks nice. I'm not sure if "suggestion1" and "suggestion2" is
the best XML solution. Shouldn't lists be done with actual tags? I'm
not an XML expert - can an XML expert give an opinion please?

> * Known problems in the design
>
> - We work at word level, not sentence level. We are
> just hacking a spell checker

I think this is the correct way to do it.

> - Words that can be declined have to be coded
> several times (plurals, verb conjugations, etc.). At
> least in Catalan, this is not very common.

Spelling hashes do this anyway. Agglutinative languages may be more
painful, but that's already the case for spelling hashes, so we
probably don't need to worry. If there are some Finnish or Hungarian
speakers here, do you have any ideas on this?

> OK, that's basically it. I would love to hear your
> comments to see how we can define this better than
> it is right now.

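For illustration only, the file format and the language-tag naming
discussed above can be sketched in a few lines of Python. This is not
AbiWord code. The helper names `candidate_filenames` and
`load_barbarisms` are hypothetical, and the fallback from "ca-ES" to
"ca" is an assumption that the thread does not actually specify.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example file quoted above.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<AbiBarbarism app="AbiWord" ver="1.0" language="ca-ES">
  <Barbarism word="boleto" suggestion1="billet"/>
  <Barbarism word="tamany" suggestion1="mida" suggestion2="grandària"/>
</AbiBarbarism>
"""

PREFIX = "suggestion"


def candidate_filenames(lang_tag):
    """List file names to try, most specific first (assumed fallback order)."""
    parts = lang_tag.split("-")
    return ["-".join(parts[:n]) + ".barbarisms"
            for n in range(len(parts), 0, -1)]


def load_barbarisms(xml_bytes):
    """Map each wrong word to its suggestions, ordered by attribute number."""
    root = ET.fromstring(xml_bytes)
    table = {}
    for entry in root.iter("Barbarism"):
        numbered = []
        for name, value in entry.attrib.items():
            suffix = name[len(PREFIX):]
            if name.startswith(PREFIX) and suffix.isdigit():
                numbered.append((int(suffix), value))
        table[entry.get("word")] = [value for _, value in sorted(numbered)]
    return table


if __name__ == "__main__":
    print(candidate_filenames("ca-ES"))
    print(load_barbarisms(SAMPLE.encode("utf-8")))
```

Note that the numbered attributes force the loader to scan attribute
names for a prefix and sort them. If the suggestions were child
elements instead, they could simply be iterated in document order.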
To me it's not so important whether it's part of spelling, part of
grammar, part of style, or its own separate thing, but all of these
need to be moved under a general proofing concept where squiggles and
other common things can be grouped. Also, whether it's part of
spelling or part of grammar, I still want to be able to enable it
separately.

You know, this could be a better way to solve the problems of English
spelling varieties. British, Canadian, Australian, Irish, and US
spellings all differ slightly, and currently each one just decides
whether to use the British or American hash. Nobody has so far wanted
to build a special Australian or Canadian hash from scratch, but it
would be a lot less work to build barbarism files for them, since only
the exceptions need to be listed. In this case an extra XML field
"comment" or "reason" or something better could contain something
useful such as "Americanism" to provide more info in the dialog.

Alan, what do you think about this idea?

Andrew Dunbar.

> Thanks,
>
> --
>
> Jordi Mas
> http://www.softcatala.org

=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl
http://www.abisource.com
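
To make the English-varieties idea above a little more concrete, here
is a minimal Python sketch. Everything in it is an assumption for
illustration: the "en-AU" file content, the attribute name "reason"
(the message only suggests "comment" or "reason" as possibilities),
and the helper names `load_exceptions` and `check_word`.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of an "en-AU.barbarisms" file. Only the
# exceptions to the base hash are listed; "reason" is an assumed
# attribute name, not part of the proposed format.
SAMPLE = b"""<AbiBarbarism app="AbiWord" ver="1.0" language="en-AU">
  <Barbarism word="color" suggestion1="colour" reason="Americanism"/>
  <Barbarism word="center" suggestion1="centre" reason="Americanism"/>
</AbiBarbarism>
"""


def load_exceptions(xml_bytes):
    """Map each listed word to a (first suggestion, reason) pair."""
    root = ET.fromstring(xml_bytes)
    return {
        entry.get("word"): (entry.get("suggestion1"), entry.get("reason", ""))
        for entry in root.iter("Barbarism")
    }


def check_word(word, exceptions):
    """Return a message for the dialog, or None if the word is not listed."""
    hit = exceptions.get(word)
    if hit is None:
        return None
    suggestion, reason = hit
    note = f" ({reason})" if reason else ""
    return f'"{word}"{note}: consider "{suggestion}"'


if __name__ == "__main__":
    exceptions = load_exceptions(SAMPLE)
    print(check_word("color", exceptions))
    print(check_word("colour", exceptions))
```

A word that is absent from the file returns None here, so the normal
spelling hash would still have the final say on it.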
