I've no real idea, but I love a challenge...

It's obviously a code of some sort, but I don't think it's quite as you suspect:

...

abandon/DGS
abandonment
abase/DGS
...
abbey/MS
abbot/MS
Abbott
abbreviate/DGNSX

...

aberrate/NX
...

ability/MS
abject/PY
abjection/S
abjure/DGS
ablate/DGNSV
ablaze
able/RT
ablute/N
...

It's the 'R', 'M', 'X' etc that lead me to suspect that. I would guess that each letter means something, just not anything as direct as 'S' meaning pluralise with 's'.

Some of the listed words don't exist in my dictionaries, the Concise Oxford Dictionary and my American Merriam-Webster (eg aberrate, ablute; though aberration and ablution both exist).

From where did you get this list, and what's the file called? And do you have any idea of the original purpose of the list - ie what sort of app might have processed it?

They may give us something more to search if there is nobody who knows (which I suspect).

This particular one came from a CD-ROM called Bibliotech, a PD collection of all sorts of texts, from classic books to those uniquely internet-style UFO conspiracy texts and all sorts of text files I haven't even gone near! The other, more useful, lists came from some internet sites I discovered via a simple Google search for dictionaries and wordlists.

This file was described as 'ASCII word list' and that's all the info I had!

It might be more orientated towards human users, since the letters don't really imply a FIXED rule. It's also possible that combinations of letters might imply a different way of handling a word (exceptions to a rule).

'aberrate' is probably an example of how these sorts of word lists work: since there is a noun 'aberration' there must be a verb 'to aberrate'! Dictionaries don't usually list them, but English varies so much worldwide that you may find people use the verb 'aberrated' as shorthand for 'suffered an aberration' or whatever. Not strictly correct or recognised officially, but nonetheless used in some places. Languages evolve to survive... I sometimes struggle to understand some of the English words my son uses (took me ages to pick up 'bling' and 'ming') and a few months later I find that the dictionaries start to include them.

Looking at the dozen or so English word lists (and being careful which is American and which British!) I find most of them have words like the example you gave. About 40K to 60K words seems to be the average and preferred size for a spellchecker, but there are some more specialised word lists around. Look at Geoff Wicks's Advanced Cryptics Dictionary (well, it's sold by Geoff) with 239,000 words, or the half million word list from Rich Mellor and Paul Merdinian. Lists of those sizes are a bit silly to use as typing checkers since they often contain names or contrived words.

QTYP has a dictionary editor so you can remove any words you object to and save the revised list. You can also merge new words into the list. As long as you have a base word list of reasonable length to work from, with a bit of time, patience and determination you can develop it.

I might give this particular list a miss unless someone fancies a challenge! It has a base count of about 25,000 entries, but obviously if you add plurals, verb tenses etc it expands, and I just fancied finding out what it would expand to if I could write a little filter to do so. I may just do so with the fairly obvious notes which imply 'add s', 'add ing', 'add able' or whatever. With a bit of help from a grammar book I might be able to write a little filter which makes a good guess at when to add a double letter, drop one of a double letter, drop a vowel from the end before adding a plural etc etc, as some of these grammar rules are straightforward. Or they are to someone whose first language is English, anyway.
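As a rough illustration of the 'when to add a double letter' guess, here is a minimal sketch in SuperBASIC. The function name double_final and its line numbers are made up for this example, and the rule itself is only an assumption: double the final consonant when the root ends consonant-vowel-consonant, the last letter is not w, x or y, and the whole word contains just one vowel (stop, plan, run). Longer words such as 'begin' depend on which syllable is stressed, which cannot be told from the spelling, so they are deliberately left alone, and 'qu' words such as 'quit' are missed too.

```basic
1000 DEFine FuNction double_final(root$)
1010   LOCal n, i, vowels
1020   REMark hypothetical helper: returns 1 if the final consonant
1030   REMark should probably be doubled before 'ing' or 'ed', else 0
1040   n = LEN(root$)
1050   IF n < 3 THEN RETurn 0
1060   REMark last letter must be a consonant other than w, x or y
1070   IF root$(n) INSTR 'aeiouwxy' THEN RETurn 0
1080   REMark last but one must be a vowel
1090   IF NOT (root$(n-1) INSTR 'aeiou') THEN RETurn 0
1100   REMark the letter before that must be a consonant
1110   IF root$(n-2) INSTR 'aeiou' THEN RETurn 0
1120   REMark only trust the rule for words with a single vowel
1130   vowels = 0
1140   FOR i = 1 TO n
1150     IF root$(i) INSTR 'aeiou' THEN vowels = vowels + 1
1160   END FOR i
1170   RETurn vowels = 1
1180 END DEFine double_final
```

In the 'g' branch it could be used along the lines of IF double_final(root$) THEN PRINT #4,root$ & root$(LEN(root$)) & 'ing', with the existing tests kept for everything else.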

If anyone fancies a challenge, I could send them a copy of the word list to go with this little first attempt at a filter! (Filter in terms of going through data, not a QDOS filter per se).

100 CLS: CLS#0:INPUT #0,'Input file > ';ip$
110 INPUT #0,'Output file > ';op$
120 OPEN_IN #3,ip$
130 OPEN_NEW #4,op$
140 no = 0 : REMark number of words
150 CLS : CLS #0
160 REPeat loop
170   IF INKEY$ = CHR$(27) THEN EXIT loop : REMark ESC
180   IF EOF(#3):EXIT loop
190   INPUT #3,word$
200   IF word$ = '' THEN NEXT loop
210   tab = '/' INSTR word$
220   IF tab = 0 THEN
230     PRINT #4,word$ : REMark no change to word
240     no = no + 1
250   ELSE
260     REMark extract 'root' word
270     root$ = word$(1 TO tab-1)
280     PRINT #4,root$ : PRINT root$; : no = no + 1 : REMark count the root word as well
290     :
300     REMark get switch letters, currently only handles /DGSY, others ignored
310     sw$ = word$(tab+1 TO LEN(word$))
320     :
330     IF 'd' INSTR sw$ THEN
340       REMark add 'ed' to root, or just 'd' if root ends in a silent 'e'
350       IF root$(LEN(root$)) == 'e' THEN
360         PRINT #4,root$&'d' : PRINT ! root$&'d';
370       ELSE
380         PRINT #4,root$&'ed' : PRINT ! root$&'ed';
390       END IF
400       no = no + 1
410     END IF
420     :
430     IF 'g' INSTR sw$ THEN
440       REMark add 'ing', drop final silent e at end of word
450       IF root$(LEN(root$)) == 'e' THEN
460         PRINT #4,root$(1 TO LEN(root$)-1)&'ing' : PRINT ! root$(1 TO LEN(root$)-1)&'ing';
470       ELSE
480         PRINT #4,root$&'ing' : PRINT ! root$ & 'ing';
490       END IF
500       no = no + 1
510     END IF
520     :
530     IF 's' INSTR sw$ THEN
540       REMark plural or third person singular 's' or 'es'
550       IF root$(LEN(root$)) == 's' OR root$(LEN(root$)) == 'x' OR root$(LEN(root$)) == 'z' OR root$(LEN(root$)-1 TO) == 'ch' OR root$(LEN(root$)-1 TO) == 'sh' THEN
560         PRINT #4,root$&'es' : PRINT ! root$&'es';
570       ELSE
580         REMark special case is word ending with 'y'
590         REMark which depends if 'y' is preceded by vowel or not
600         IF root$(LEN(root$)) == 'y' THEN
610           IF LEN(root$) > 1 THEN
620             IF root$(LEN(root$)-1) INSTR 'aeiou' THEN
630               PRINT #4,root$&'s' : PRINT !root$&'s';
640             ELSE
650               PRINT #4,root$(1 TO LEN(root$)-1)&'ies' : PRINT ! root$(1 TO LEN(root$)-1)&'ies';
660             END IF
670           ELSE
680             PRINT #4,root$&'s' : PRINT ! root$&'s';
690           END IF
700         ELSE
710           PRINT #4,root$&'s' : PRINT ! root$&'s';
720         END IF
730       END IF
740       no = no + 1
750     END IF
760     :
770     IF 'y' INSTR sw$ THEN PRINT #4,root$&"ly" : PRINT ! root$&"ly"; : no = no + 1
780     :
790     PRINT : REMark new line after all permutations done
800     IF (no MOD 1000) = 0 THEN AT #0,0,0 : PRINT #0,no
810   END IF
820   PAUSE 5 : REMark vary speed as required for viewing
830 END REPeat loop
840 PRINT #0,no;' Words Total'
850 CLOSE #3 : CLOSE #4
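
The listing treats a final 'y' specially for the 's' switch but not for 'd', so a root such as 'carry' (my own example, not taken from the list) would come out as 'carryed'. A minimal sketch of the same vowel test applied to 'ed' follows. The function name add_ed$ and its line numbers are hypothetical, and it makes no attempt at consonant doubling or irregular verbs.

```basic
2000 DEFine FuNction add_ed$(root$)
2010   LOCal n
2020   REMark hypothetical helper: returns root$ with a past tense ending
2030   n = LEN(root$)
2040   IF n = 0 THEN RETurn ''
2050   REMark final silent 'e': just add 'd' (abase -> abased)
2060   IF root$(n) == 'e' THEN RETurn root$ & 'd'
2070   REMark consonant + 'y': change 'y' to 'ied' (carry -> carried)
2080   IF n > 1 THEN
2090     IF root$(n) == 'y' THEN
2100       IF NOT (root$(n-1) INSTR 'aeiou') THEN RETurn root$(1 TO n-1) & 'ied'
2110     END IF
2120   END IF
2130   REMark everything else, including vowel + 'y' (play -> played)
2140   RETurn root$ & 'ed'
2150 END DEFine add_ed$
```

With this in place the 'd' branch would shrink to PRINT #4,add_ed$(root$) : PRINT ! add_ed$(root$); followed by the usual no = no + 1.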



_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm
