On Thu, 20 Nov 2008 10:55:02 +0100 (CET) "Pander" <[EMAIL PROTECTED]> babbled:
any dictionary should not care about gsm encodings. it should be just a utf8 dictionary file. it is the job of the sms app to convert normal utf8 unicode to whatever encoding is used by the network, and back. :)

> Small correction to my text:
>
> "Note that more characters" must be "Note that certain special characters
> are in GSM 03.38 which are not in extended ASCII"
>
>
> Nevertheless, one complete utf-8 dictionary could be used by most
> applications, also SMS. The conversion I do for GSM 03.38 could also be
> done later, just before sending the SMS.
>
> On Thu, November 20, 2008 10:44, Rui Miguel Silva Seabra wrote:
> > I have no idea... I might only make a new version with utf-8 encoded
> > characters. :)
> >
> >
> > On Thu, Nov 20, 2008 at 10:40:46AM +0100, Pander wrote:
> >> Hi all,
> >>
> >> I intend to generate the following:
> >> - a full list utf-8 (for 8 bit SMS and regular use, default)
> >> - b full list utf-8 GSM 03.38[1] (for 7 bit SMS)
> >> - c truncated list utf-8 (for 8 bit SMS and regular use)
> >> - d truncated list utf-8 GSM 03.38[1] (for 7 bit SMS, default)
> >>
> >> [1] The utf-8 characters in this list are within the 7-bit range of GSM
> >> 03.38, see http://en.wikipedia.org/wiki/Short_message_service#GSM Note
> >> that more characters
> >>
> >> a and b will both have 250,000 words
> >> b will be conversion, remapping and normalisation of a
> >> c and d are truncations and normalisation of respectively a and b
> >>
> >> For utf-16, a simple conversion of the utf-8 files can be used, but I'll
> >> leave this for now. This could result in two extra files.
> >>
> >> Note that neither extended nor non-extended ASCII is available. Is this
> >> desirable? This can result in four extra files.
> >>
> >> So, I can come up with 10 different files. Which, in your opinion, are
> >> the most useful?
> >>
> >> Regards,
> >>
> >> Pander
> >>
> >> On Thu, November 20, 2008 08:58, Rui Miguel Silva Seabra wrote:
> >> > On Thu, Nov 20, 2008 at 03:02:41AM +0100, "Marco Trevisan (Treviño)"
> >> > wrote:
> >> >> Pander wrote:
> >> >> > Of course this particular word list is very long: it contains about
> >> >> > 250,000 words and has a typical loooong tail. Many words or
> >> >> > compositions occur seldom in everyday use.
> >> >> >
> >> >> > What would be a good cut-off point in number of words, also in terms
> >> >> > of performance?
> >> >> >
> >> >> > The Portuguese list contains 56,609 words. Is this workable? How many
> >> >> > does the English one contain?
> >> >>
> >> >> The Italian one can also count 500'000 words (to be short), but I get
> >> >> a well-working dictionary using only a smaller one (about 150'000
> >> >> words, which I selected by their Google popularity).
> >> >>
> >> >> Btw, I've written more complete posts about this on the list...
> >> >
> >> > Well, my list was based on a million words taken from the most
> >> > printed daily newspaper in Portugal, already with frequency data (I
> >> > didn't count, but I still removed a lot of non-words like numbers,
> >> > etc.), so my job was much easier... :)
> >> >
> >> > As for writing SMS/text messages... I haven't yet found a word that
> >> > wasn't there (in fact my problem is that it so often is the first of
> >> > several matches, so I have to use the menu on the left), but I must
> >> > confess that I'm not one of those whose primary use of the phone is
> >> > SMS/text!
> >> >
> >> > Rui
> >> >
> >> > --
> >> > Frink!
> >> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> >> > + No matter how much you do, you never do enough -- unknown
> >> > + Whatever you do will be insignificant,
> >> > | but it is very important that you do it -- Gandhi
> >> > + So let's do it...?
> >> >
> >> > _______________________________________________
> >> > Openmoko community mailing list
> >> > community@lists.openmoko.org
> >> > http://lists.openmoko.org/mailman/listinfo/community
> >> >
> >
> > --
> > You are what you see.
> > Today is Prickle-Prickle, the 32nd day of The Aftermath in the YOLD 3174
> > + No matter how much you do, you never do enough -- unknown
> > + Whatever you do will be insignificant,
> > | but it is very important that you do it -- Gandhi
> > + So let's do it...?
>
-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
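The split argued for in this thread (the dictionary stays plain UTF-8, and the SMS application converts to GSM 03.38 just before sending, and back on receipt) can be sketched in Python. This is a minimal illustration, not code from any Openmoko application. It assumes the default alphabet and extension table as transcribed below from GSM 03.38 / 3GPP TS 23.038 (check them against the spec before relying on them). It produces unpacked septet values, one per list entry. `fold_to_gsm7` is a hypothetical accent-stripping fallback in the spirit of the "remapping and normalisation" planned for list b, not Pander's actual procedure.

```python
import unicodedata

# GSM 03.38 default alphabet, indexed by septet value 0x00-0x7F (transcribed
# for illustration; verify against 3GPP TS 23.038). 0x1B is the escape to the
# extension table and is kept here only as a placeholder.
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
ESC = 0x1B
BASIC_MAP = {ch: i for i, ch in enumerate(BASIC) if i != ESC}
# Extension table: each of these costs two septets (ESC followed by the code).
EXT_MAP = {"\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
           "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65}
BASIC_REV = {i: ch for ch, i in BASIC_MAP.items()}
EXT_REV = {i: ch for ch, i in EXT_MAP.items()}


def encode_gsm7(text: str) -> list[int]:
    """Convert a Unicode string into unpacked GSM 7-bit septet values.

    Raises ValueError for any character with no GSM 03.38 representation.
    """
    septets = []
    for ch in text:
        if ch in BASIC_MAP:
            septets.append(BASIC_MAP[ch])
        elif ch in EXT_MAP:
            septets.extend((ESC, EXT_MAP[ch]))
        else:
            raise ValueError(f"not representable in GSM 03.38: {ch!r}")
    return septets


def decode_gsm7(septets: list[int]) -> str:
    """Convert unpacked septet values back into a Unicode string.

    Malformed input (a trailing ESC or an unknown code) raises
    IndexError or KeyError; a real receiver would need to be more forgiving.
    """
    out, i = [], 0
    while i < len(septets):
        if septets[i] == ESC:
            out.append(EXT_REV[septets[i + 1]])
            i += 2
        else:
            out.append(BASIC_REV[septets[i]])
            i += 1
    return "".join(out)


def fold_to_gsm7(word: str) -> str:
    """Hypothetical fallback: strip accents only from characters GSM lacks.

    Characters GSM already has (e.g. 'é') are kept as-is. The result may
    still contain characters encode_gsm7 rejects, such as 'œ', which has no
    canonical decomposition.
    """
    result = []
    for ch in word:
        if ch in BASIC_MAP or ch in EXT_MAP:
            result.append(ch)
            continue
        # NFD splits e.g. 'ã' into 'a' + combining tilde; drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        result.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(result)
```

For example, `encode_gsm7("a[b]")` yields six septets, because `[` and `]` live in the extension table. `fold_to_gsm7("ação")` gives `"acao"`, while `"café"` is left unchanged. A real sender would still have to pack the septets into octets (8 septets per 7 octets) and split anything longer than 160 septets into several messages.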