Sofia a radix tree builds a tree of all the prefixes common between words and then attaches a leaf node with the new suffix when the word doesn't fit in the existing structure. The upshot of this is as your word count increases the storage cost per word decreases. As for actual performance numbers data is scarce but this Stack Overflow post (http://goo.gl/TpPAK) mentions getting a trie with 1.1 million words down to 139M (implemented in C#).
My understanding is we have around a 280M limit on App Engine instances and I don't know about Portuguese but according to Wikipedia theres only around 600,000 words in the English language, besides you could always just use it to cache however many of the most popular words 60M or something would get you then fall back to the datastore to check on less common/unprocessed words (if you did this then you could fit even more words into a read only dictionary using a DAWG). Check out the Wikipedia articles on Radix Tree, Trie, Patricia Trie - there all variations on the theme. Unfortunately you will probably have to roll your own implementation since all the python libraries I have seen use compiled code for performance, but there far from the most scary thing to implement and if you have access to 'The Art of Programming' theres a chapter on them in there. -regards, Shane On Feb 4, 1:06 am, sofia <[email protected]> wrote: > @Nick I'm in Portugal so for now I don't have access to the prediction api. > Maybe when there's worldwide access i'll switch but for now it's a no go - > any estimates on when it will be opened to the rest of the world? :) > > @Shane not sure how this could be implemented.. there could be > thousands/millions of words classified which have to be checked against the > new ones, so how would i load them all into memory in appengine without > hitting cpu/memory limits? I'm not familiar with radix trees so i could be > missing something. > > @Albert good to know! -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
