I have full-text indexed all of O'Reilly's eBooks (1,600 of them) using my own search engine. You can see how the search works if you have an Android tablet and install eCarrel (the O'Reilly bookstore), which is a free app. To make searching manageable in the context of GAE, I originally partitioned the books into 4 groups, each with its own index. That way searches can be performed in parallel (merging the results when done), the individual per-group indexes are smaller, and search within a group is faster.
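The partition-and-merge (scatter-gather) search described above can be sketched roughly as follows. This is a minimal illustration, not eCarrel's actual code: `GroupIndex` and `Hit` are hypothetical types, relevance is assumed to be a single `double` score that is comparable across groups, and a local `ExecutorService` stands in for the separate search applications, which in the setup described here would be queried over HTTP. Whether plain threads may be created inside a request depends on the App Engine runtime.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearch {

    /** One search hit: which book matched and how well (hypothetical type). */
    public static final class Hit {
        private final String bookId;
        private final double score;

        public Hit(String bookId, double score) {
            this.bookId = bookId;
            this.score = score;
        }

        public String bookId() { return bookId; }
        public double score() { return score; }

        @Override public String toString() { return bookId + ":" + score; }
    }

    /** Hypothetical interface for one book group's index (e.g. 400 books). */
    public interface GroupIndex {
        List<Hit> search(String query, int k) throws Exception;
    }

    /** Queries every group concurrently, then merges the partial results into one top-k list. */
    public static List<Hit> search(List<GroupIndex> groups, String query, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            // Scatter: submit one search task per group.
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (GroupIndex g : groups) {
                futures.add(pool.submit(() -> g.search(query, k)));
            }
            // Gather: wait for each group's partial results.
            List<Hit> all = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                all.addAll(f.get());
            }
            // Merge: highest score first, keep only the best k.
            all.sort(Comparator.comparingDouble(Hit::score).reversed());
            return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        GroupIndex g1 = (q, k) -> List.of(new Hit("book-a", 0.9), new Hit("book-b", 0.4));
        GroupIndex g2 = (q, k) -> List.of(new Hit("book-c", 0.7));
        System.out.println(search(List.of(g1, g2), "jvm", 2)); // prints [book-a:0.9, book-c:0.7]
    }
}
```

If each group returns at most k hits, the merge step only has to sort k × (number of groups) candidates, so adding groups keeps the merge cheap. The main caveat is score comparability: if scores depend on per-group statistics (for example, document frequencies computed within one group), hits from different groups may need normalizing before they can be merged this way.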
An index for 400 books is about 90 MB. To implement the search engine on GAE, I would dedicate 4 applications to the task (e.g. search1.appspot.com through search4....). Each application would run exactly the same code but would have different application files containing the index data. (I wasn't sure whether the index data should be stored in Datastore entities, Blobstore blobs, or application files. When the search engine was first implemented, application files seemed to be the only option, even though they had to be split into 10 MB chunks. 1.5.5 supposedly raises the limit to 32 MB, but I got an error when I attempted that.)

One problem with this approach is that multiple GAE applications are needed to implement the parallel "search servers." Another is that it takes time and resources to read the index from application files into RAM before any search results can be computed. When an instance is killed, all this work is wasted and has to be repeated on the next search. When there were too few groups, and the indexes were therefore too big, I was getting OutOfMemory errors just loading the index data into RAM.

Do you guys think it is a good idea to use application files to store index data? Since each "search server" runs the same code and only accesses different application files, could they all be implemented as a single (versioned?) GAE application? I will run out of applications as I add more search servers, and running the search engine will become more costly.

http://ecarrel.com
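Splitting an index into 10 MB application files and reassembling it at startup might look like the sketch below. It assumes the index is one serialized byte array and that the chunks are named `<base>.000`, `<base>.001`, and so on; the class name, method names, and naming scheme are hypothetical illustrations, not the author's format. The loader sums the chunk sizes first and allocates the final array once, so peak heap use stays near the index size plus one chunk rather than roughly double, which matters when loading already runs close to OutOfMemory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexChunks {

    /** Per-file limit mentioned in the post (10 MB). */
    public static final int MAX_CHUNK_BYTES = 10 * 1024 * 1024;

    /** Build-time step: writes baseName.000, baseName.001, ... each at most chunkSize bytes. */
    public static List<Path> split(byte[] index, Path dir, String baseName, int chunkSize)
            throws IOException {
        List<Path> written = new ArrayList<>();
        for (int off = 0, n = 0; off < index.length; off += chunkSize, n++) {
            int len = Math.min(chunkSize, index.length - off);
            Path p = dir.resolve(String.format("%s.%03d", baseName, n));
            Files.write(p, Arrays.copyOfRange(index, off, off + len));
            written.add(p);
        }
        return written;
    }

    /** Startup step: reads the chunks in order until one is missing and rebuilds the index in RAM. */
    public static byte[] load(Path dir, String baseName) throws IOException {
        List<Path> parts = new ArrayList<>();
        long total = 0;
        for (int n = 0; ; n++) {
            Path p = dir.resolve(String.format("%s.%03d", baseName, n));
            if (!Files.exists(p)) break;
            parts.add(p);
            total += Files.size(p);
        }
        if (total > Integer.MAX_VALUE) throw new IOException("index too large for one array");
        // Allocate the full index once, then copy each chunk into place.
        byte[] index = new byte[(int) total];
        int off = 0;
        for (Path p : parts) {
            byte[] chunk = Files.readAllBytes(p);
            System.arraycopy(chunk, 0, index, off, chunk.length);
            off += chunk.length;
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("idx");
        byte[] fake = new byte[25];
        for (int i = 0; i < fake.length; i++) fake[i] = (byte) i;
        // 10-byte chunks for the demo; a real build would pass MAX_CHUNK_BYTES.
        List<Path> files = split(fake, dir, "group1.idx", 10);
        byte[] back = load(dir, "group1.idx");
        System.out.println(files.size() + " chunks, equal=" + Arrays.equals(fake, back)); // prints 3 chunks, equal=true
    }
}
```

The same code works regardless of which group's chunks sit in the directory, which is consistent with the observation that every search server runs identical code and differs only in its index files.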