On Fri, 28 Nov 2008 00:20:38 +0100 Pander <[EMAIL PROTECTED]> babbled:
> Is it possible to put comments in the .dic file? If so, in what format?
> E.g. only the first couple of lines which start with a #.

no. it doesn't support comments.

> Carsten Haitzler (The Rasterman) wrote:
> > On Thu, 20 Nov 2008 10:55:02 +0100 (CET) "Pander"
> > <[EMAIL PROTECTED]> babbled:
> >
> > any dictionary should not care about gsm encodings. it should be just a utf8
> > dictionary file. it is the job of the sms app to convert normal utf8
> > unicode to whatever encoding used by the network, and back. :)
> >
> >> Small correction to my text:
> >>
> >> "Note that more characters" must be "Note that certain special characters
> >> are in GSM 03.38 which are not in extended ASCII"
> >>
> >> Nevertheless, one complete utf-8 dictionary could be used by most
> >> applications, also SMS. The conversion I do for GSM 03.38 could also be
> >> done later just before sending the SMS.
> >>
> >> On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> >>> I have no idea... I might only make a new version with utf-8 encoded
> >>> characters. :)
> >>>
> >>> On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> >>>> Hi all,
> >>>>
> >>>> I intend to generate the following:
> >>>> - a full list utf-8 (for 8 bit SMS and regular use, default)
> >>>> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> >>>> - c truncated list utf-8 (for 8 bit SMS and regular use)
> >>>> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
> >>>>
> >>>> [1] These utf-8 characters in this list are within the 7-bit range of
> >>>> GSM 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM
> >>>> Note that more characters
> >>>>
> >>>> a and b will both have 250,000 words
> >>>> b will be conversion, remapping and normalisation of a
> >>>> c and d are truncations and normalisation of respectively a and b
> >>>>
> >>>> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> >>>> leave this for now.
> >>>> This could result in two extra files.
> >>>>
> >>>> Note that neither extended nor non-extended ASCII is available. Is this
> >>>> desirable? This can result in four extra files.
> >>>>
> >>>> So, I can come up with 10 different files. Which are according to you
> >>>> the most useful?
> >>>>
> >>>> Regards,
> >>>>
> >>>> Pander
> >>>>
> >>>> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> >>>>> On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
> >>>>> wrote:
> >>>>>> Pander wrote:
> >>>>>>> Of course this particular word list is very long and contains about
> >>>>>>> 250,000 words and has a typical loooong tail. Many words or
> >>>>>>> compositions occur seldom in average day use.
> >>>>>>>
> >>>>>>> What would be a good cut off point in number of words, also in
> >>>>>>> terms of performance?
> >>>>>>>
> >>>>>>> The Portuguese list contains 56,609 words. Is this workable? How
> >>>>>>> many does the English contain?
> >>>>>> The Italian one can count also 500'000 words (to be short), but I can
> >>>>>> get a well working dictionary only using a smaller one (with about
> >>>>>> 150'000 words that I've taken counting its google popularity).
> >>>>>>
> >>>>>> Btw I've written more complete posts about this on the list...
> >>>>> Well, since my basis was based on a million words taken from the most
> >>>>> printed daily newspaper in Portugal (I didn't count but still I
> >>>>> removed a lot of non words like numbers, etc...) already with frequency
> >>>>> data, my job was so much easier... :)
> >>>>>
> >>>>> As for writing SMS/text messages... I haven't found yet a word that
> >>>>> wasn't there (in fact my problem is that it so often is the first of
> >>>>> several matches so I have to use the menu on the left) but I must
> >>>>> confess to not be one of those whose primary use of the phone is
> >>>>> SMS/text!
> >>>>>
> >>>>> Rui
> >>>>>
> >>>>> --
> >>>>> Frink!
> >>>>> Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> >>>>> + No matter how much you do, you never do enough -- unknown
> >>>>> + Whatever you do will be insignificant,
> >>>>> | but it is very important that you do it -- Gandhi
> >>>>> + So let's do it...?
> >>>>>
> >>>>> _______________________________________________
> >>>>> Openmoko community mailing list
> >>>>> [email protected]
> >>>>> http://lists.openmoko.org/mailman/listinfo/community

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
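
For anyone following the thread, here is a rough Python sketch of the two steps discussed above: cutting a frequency-ranked word list down to a fixed size, and checking which words would fit in the GSM 03.38 basic 7-bit alphabet. It is only an illustration under stated assumptions. The names `GSM_BASIC`, `is_gsm_basic` and `truncate_by_frequency` are made up for this example, the extension table (characters such as the euro sign) is left out, and words are simply tested rather than remapped or normalised as Pander describes for list b.

```python
# Assumption: the GSM 03.38 basic character set (default 7-bit alphabet),
# in code-point order, without the extension table.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def is_gsm_basic(word):
    """Return True if every character of `word` is in the basic set."""
    return all(ch in GSM_BASIC for ch in word)


def truncate_by_frequency(entries, limit):
    """Keep the `limit` most frequent words.

    `entries` is an iterable of (word, count) pairs (hypothetical input
    format); the result is a list of words, most frequent first.
    """
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [word for word, _count in ranked[:limit]]
```

As suggested earlier in the thread, the dictionary itself can stay plain utf-8; a check like `is_gsm_basic` would then belong in the SMS application, just before sending.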

