Like other things posted here without notices to the contrary, this code is in the public domain.
I recently posted a prototype pie-menu-based input method that included the cryptic message: # some research from mine in June, redone with a better wordlist # in October, suggested 'erlu kxti fgpy ascb vqohm jwznd' # but also found that simple abcd efgh ijlkm nopq rstu vwxyz was # very nearly as good. Both had less than 1.02 strokes per character. # however, some more research in October found that # 'ht ciksz dgn ervx fjmopqu ablwy' was marginally better still, # and also groups together letters that occur together. This refers to the decision of which letters to group together on which pie-menu items. (This will be incomprehensible if you don't understand how the input method works.) Since each menu selection is ambiguous, it's desirable to group together letters that tend not to occur in the same places, in order to minimize the list of candidate words when you're done selecting letters. To shorten the time to enter a word, it's also desirable that subsequent letters be on the same menu item. So, in October, I wrote a program that does hill-climbing keyboard design to maximize the input bit rate, assuming that selecting a "key" (menu item) that's different from the key you selected before takes 1 time unit, while selecting the same key again takes 1/2 time unit. It generally gave the same results, except when I shrank the initial population too small, and it generally took only a few minutes. It thinks its preferred keyboard allows 2.988 bits of input per time unit, which sounds pretty good for a six-key keyboard, but it usually got to 2.7 or better within the first generation. #!/usr/bin/python import random, sys, keybits digraphs = {} for line in file('digraphs.out'): count, digraph = line.split() digraphs[digraph] = int(count) nkeys = 6 def sum(seq): rv = 0 for item in seq: rv += item return rv # time(keys): # time to enter the sample text on the keys # exclusive of disambiguation time. # hitting a key takes 1 time unit # unless it's the same as the previous key # in which case it's 1/2 unit. # Some empirical testing (before distorting the menu in different directions, # with four-letter words) showed that 'bass' (all letters on one key) took # about 2/3 of the time of 'onto', doing each one ten times. # So the time for a particular letter # is the number of occurrences of that letter # minus two thirds of the number of digraphs ending in that letter # and beginning in another letter on that key def timeletter(let, key): basescore = keybits.freq[let] digraphscore = sum([digraphs.get(prevlet + let, 0) for prevlet in key]) return basescore - digraphscore / 2.0 def timekey(key): return sum([timeletter(let, key) for let in key]) def time(keys): return sum([timekey(key) for key in keys]) # print time(['abcdefghijklmnopqrstuvwxyz']) # print time(['abcdefghijklm', 'nopqrstuvwxyz']) # print time('abcdefghijklmnopqrstuvwxyz') def try_some_stuff(maxlosers): bestscore = 0 losers = 0 while 1: kb = genkb() score = scorekb(kb) if score > bestscore: print score, kb sys.stdout.flush() bestscore = score losers = 0 else: losers += 1 if losers > maxlosers: return def scorekb(kb): return keybits.kbbits(kb)/time(kb) def randvec(): return [random.randrange(nkeys) for ii in range(26)] def genkb(): return mkkb(randvec()) def mkkb(vec): rv = ['' for ii in range(nkeys)] for letter, ii in zip('abcdefghijklmnopqrstuvwxyz', vec): rv[ii] += letter return rv class Breeder: def __init__(self, n): self.n = n self.population = [randvec() for ii in range(self.n)] def cull(self): kbs = [(mkkb(vec), vec) for vec in self.population] scores = [(scorekb(kb), kb, vec) for kb, vec in kbs] scores.sort() scores.reverse() print scores[0] self.population = [vec for score, kb, vec in scores[:self.n/2]] def breed(self): while len(self.population) < self.n: new = random.choice(self.population)[:] new[random.randrange(len(new))] = random.randrange(nkeys) self.population.append(new) if __name__ == '__main__': x = Breeder(300) while 1: x.breed() x.cull() This program depends on the 'keybits' module, which follows: #!/usr/bin/python import math totlets = 0.0 freq = {} for line in file('letfreq.out'): count, letter = line.split() freq[letter] = int(count) totlets += freq[letter] def totfreq(akey): thiskey = 0 for letter in akey: thiskey += freq[letter] return thiskey def bits(akey): return math.log(totlets/totfreq(akey)) / math.log(2) def weightedbits(akey): if totfreq(akey) == 0: return 0 return totfreq(akey) * bits(akey) def kbbits(kb): total = 0 for key in kb: total += weightedbits(key) return total The file 'letfreq.out' just says: 51256110 e 38440017 t 33184490 a 31852968 o 30294727 i 28596682 n 26258206 s 25032539 r 22561628 h 16778132 l 15511322 d 12620467 c 11535260 u 10302917 m 9168289 f 8198387 p 8167927 w 8109688 y 7817011 g 6612137 b 4207506 v 2908330 k 803302 x 678152 j 437522 q 237994 z The file 'digraphs.out' is much larger, but it begins: 12106855 th 10791177 he 7585883 in 6643408 er 6415329 an 5789288 re 5154157 on 4428445 en 4426168 nd 4390391 at It was produced from my wordlist (http://pobox.com/~kragen/sw/wordlist) with this program: #!/usr/bin/python # given a file like this: # 6187927 the # 2941790 of # 2682878 and # 2544858 to # 2150885 a # 1849882 in # 1089559 it # 998867 is # 923975 was # 920582 that # with word frequencies, report on the frequency of each pair of letters. # In the above sequence, 'th' appears twice, while a number of other # pairs of letters occur only once. So 'th' gets a frequency of # 6187927 + 920582, while 'he' gets a frequency only of 6187927, 'of' # gets a frequency only of 2941790, 'an' and 'nd' get frequencies only # of 2682878, etc. # This information was once useful for cryptanalysis, but is now # useful to me for keyboard design. # This program runs in 14 seconds on a 109 000 word list computed from # the British National Corpus, part of which is shown above. from __future__ import generators def pairs(astring): ii = iter(astring) first = ii.next() for item in ii: yield first + item first = item digraphs = {} for line in file('wordlist'): count, word = line.split() for digraph in pairs(word): try: digraphs[digraph] += int(count) except KeyError: digraphs[digraph] = int(count) counts = [(count, digraph) for digraph, count in digraphs.iteritems()] counts.sort() counts.reverse() for item in counts: print "%8d %s" % item