Is it possible to put comments in the .dic file? If so, in what format? For example, could the first few lines be treated as comments if they start with a #?
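To make the question concrete, here is a minimal sketch of the behaviour I have in mind. It is only an illustration: `strip_header_comments` is a hypothetical helper, not part of any existing dictionary loader, and I have not confirmed how the current code reads .dic files. It assumes that only the leading block of lines starting with `#` counts as comments, so a later entry that happens to begin with `#` is still kept.

```python
from typing import Iterable, Iterator


def strip_header_comments(lines: Iterable[str], marker: str = "#") -> Iterator[str]:
    """Yield dictionary lines, dropping only the leading block of comment lines.

    Hypothetical helper: assumes comments are allowed solely at the top of the file.
    """
    in_header = True
    for raw in lines:
        line = raw.rstrip("\n")
        if in_header and line.startswith(marker):
            continue  # still inside the leading comment block
        in_header = False  # first non-comment line ends the header
        if line:  # skip blank lines
            yield line


if __name__ == "__main__":
    sample = ["# header comment\n", "# another comment\n", "aap\n", "#noot\n", "mies\n"]
    for entry in strip_header_comments(sample):
        print(entry)
```

Restricting comments to the top of the file would keep the rest of the format unchanged, since every line after the header is passed through as-is.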

Carsten Haitzler (The Rasterman) wrote:
> On Thu, 20 Nov 2008 10:55:02 +0100 (CET) "Pander"
> <[EMAIL PROTECTED]> babbled:
>
> any dictionary should not care about gsm encodings. it should be just a utf8
> dictionary file. it is the job of the sms app to convert normal utf8 unicode
> to whatever encoding used by the network, and back. :)
>
>> Small correction to my text:
>>
>> "Note that more characters" must be "Note that certain special characters
>> are in GSM 03.38 which are not in extended ASCII"
>>
>> Nevertheless, one complete utf-8 dictionary could be used by most
>> applications, also SMS. The conversion I do for GSM 03.38 could also be
>> done later just before sending the SMS.
>>
>> On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
>>> I have no idea... I might only make a new version with utf-8 encoded
>>> characters. :)
>>>
>>> On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
>>>> Hi all,
>>>>
>>>> I intent to generate the following:
>>>> - a full list utf-8 (for 8 bit SMS and regular use, default)
>>>> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
>>>> - c truncated list utf-8 (for 8 bit SMS and regular use)
>>>> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
>>>>
>>>> [1] These utf-8 characters in this list are within the 7-bit range of
>>>> GSM 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
>>>> Note that more characters
>>>>
>>>> a and b will both have 250,000 words
>>>> b will be conversion, remapping and normalisation of a
>>>> c and d are truncations and normalisation of respectively a and b
>>>>
>>>> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
>>>> leave this for now. This could result in two extra files.
>>>>
>>>> Note that nor extended nor non-extended ASCII is available. Is this
>>>> desirable? This can result in four extra files.
>>>>
>>>> So, I can come up with 10 different files. Which are according to you
>>>> the most useful?
>>>>
>>>> Regards,
>>>>
>>>> Pander
>>>>
>>>> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
>>>>> On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
>>>>> wrote:
>>>>>> Pander wrote:
>>>>>>> Of course this particular word list is very long and contains about
>>>>>>> 250,000 words and has a typical loooong tail. Many words or
>>>>>>> compositions or occur seldom in average day use.
>>>>>>>
>>>>>>> What would be a good cut off point in number of words, also in
>>>>>>> terms of performance?
>>>>>>>
>>>>>>> The Portuguese list contains 56,609 words. Is this workable? How
>>>>>>> many does the English contain?
>>>>>> The Italian one can count also 500'000 words (to be short), but I can
>>>>>> get a well working dictionary only using a smaller one (with about
>>>>>> 150'000 words that I've taken counting its google popularity).
>>>>>>
>>>>>> Btw I've written more complete posts about this on the list...
>>>>> Well, since my basis was based on a million words taken from the most
>>>>> printed daily newspaper in Portugal (I didn't count but still I
>>>>> removed a lot of non words like numbers, etc...) already with
>>>>> frequency data, my job was so much easier... :)
>>>>>
>>>>> As for writing SMS/text messages... I haven't found yet a word that
>>>>> wasn't there (in fact my problem is that it so often is the first of
>>>>> several matches so I have to use the menu on the left) but I must
>>>>> confess to not be one of those whose primary use of the phone is
>>>>> SMS/text!
>>>>>
>>>>> Rui
>>>>>
>>>>> --
>>>>> Frink!
>>>>> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
>>>>> + No matter how much you do, you never do enough -- unknown
>>>>> + Whatever you do will be insignificant,
>>>>> | but it is very important that you do it -- Gandhi
>>>>> + So let's do it...?
>>>>>
>>>>> _______________________________________________
>>>>> Openmoko community mailing list
>>>>> [email protected]
>>>>> http://lists.openmoko.org/mailman/listinfo/community
>>> --
>>> You are what you see.
>>> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
>>> + No matter how much you do, you never do enough -- unknown
>>> + Whatever you do will be insignificant,
>>> | but it is very important that you do it -- Gandhi
>>> + So let's do it...?

