I have full-text indexed all of O'Reilly's eBooks (1,600 of them) using
my own search engine. If you have an Android tablet, you can see how
the search works by installing eCarrel (O'Reilly bookstore), a free
app. To make searching manageable in the context of GAE, I originally
partitioned the books into 4 groups, each with its own index. That way
searches can be performed in parallel (merging the results when done),
the individual per-group indexes are smaller, and searching within a
group is faster.
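
A minimal sketch of the fan-out-and-merge idea: query every group at once, then combine the per-group hit lists into one ranked top-k list. The `IndexGroup` interface and `Hit` record are hypothetical stand-ins, not part of any real API. Local threads are used only to keep the sketch self-contained; with one GAE application per group, the fan-out would instead be HTTP requests to each search application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearch {
    /** One hit: a document id and its relevance score (hypothetical shape). */
    public record Hit(String docId, double score) {}

    /** Hypothetical per-group index; stands in for one "search server". */
    public interface IndexGroup {
        List<Hit> search(String query, int k);
    }

    /** Queries every group concurrently, then merges the hits into one top-k list. */
    public static List<Hit> search(List<IndexGroup> groups, String query, int k)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (IndexGroup group : groups) {
                pending.add(pool.submit(() -> group.search(query, k)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // blocks until that group has answered
            }
            // Highest score first; keep only the overall top k.
            merged.sort(Comparator.comparingDouble(Hit::score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Asking each group for its own top k is enough to recover the global top k, but sorting by raw score assumes scores from different groups are directly comparable (for example, that every group uses the same scoring statistics).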

An index for 400 books is about 90 MB in size. To implement the search
engine on GAE, I would dedicate 4 applications to the task (e.g.
search1.appspot.com through search4....). Each application would run
exactly the same code but would have different "application files"
containing its index data. (I wasn't sure whether the index data should
be stored in Datastore entities, Blobstore blobs, or application files.
When the search engine was first implemented, application files seemed
to be the only option, even though it meant splitting them into 10 MB
chunks. App Engine 1.5.5 supposedly raises the limit to 32 MB, but I
got an error when I tried that.)
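
A minimal sketch of the chunking step: split one index file into parts that each fit under the per-file limit, then concatenate them in order when loading. The `.part000`, `.part001`, ... naming scheme and the `IndexChunks` class are hypothetical; the 10 MB figure is the limit described above, assumed here to mean 10 * 1024 * 1024 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexChunks {
    /** Per-file limit from the post, assumed to be 10 * 1024 * 1024 bytes. */
    public static final int MAX_CHUNK = 10 * 1024 * 1024;

    /** Splits data into consecutive chunks of at most chunkSize bytes. */
    public static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize)));
        }
        return chunks;
    }

    /** Writes chunks as base.part000, base.part001, ... (hypothetical naming). */
    public static void writeParts(Path dir, String base, List<byte[]> chunks) throws IOException {
        for (int i = 0; i < chunks.size(); i++) {
            Files.write(dir.resolve(String.format("%s.part%03d", base, i)), chunks.get(i));
        }
    }

    /** Reads parts in numeric order until one is missing and concatenates them. */
    public static byte[] readParts(Path dir, String base) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; ; i++) {
            Path part = dir.resolve(String.format("%s.part%03d", base, i));
            if (!Files.exists(part)) break;
            out.write(Files.readAllBytes(part));
        }
        return out.toByteArray();
    }
}
```

At roughly 90 MB per group, this works out to about nine 10 MB parts per group. Reassembling into a single `byte[]` briefly holds both the stream's buffer and the copy returned by `toByteArray()`, so peak heap use during loading can be around twice the index size. That may matter given the OutOfMemory errors described below.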

One problem with this approach is that multiple GAE applications are
needed to implement the parallel "search servers." Another is that it
takes time and resources to read the index from application files into
RAM before any search results can be computed. When an instance is
killed, all this work goes to waste and has to be repeated on the next
search. When the number of groups was too small, and the indexes
therefore too big, I was getting OutOfMemory errors just loading the
index data into RAM.
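
The reload cost cannot be avoided when an instance dies, but it can at least be limited to once per instance rather than once per request. A common pattern is a lazily initialized holder kept in a static field, e.g. `static final LazyIndex<byte[]> GROUP1 = new LazyIndex<>(...)`. The sketch below assumes a hypothetical `LazyIndex` class and loader; `heapLooksSufficient` is a rough, hypothetical pre-check that compares the requested size with the remaining heap headroom, so a load can be refused up front instead of failing partway with OutOfMemoryError.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Holds an expensive-to-build index for the lifetime of one JVM instance (hypothetical). */
public class LazyIndex<T> {
    private final Supplier<T> loader;
    private final AtomicInteger loads = new AtomicInteger();
    private volatile T index; // volatile makes the double-checked locking below safe

    public LazyIndex(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Loads the index on first use; later calls on this instance reuse it. */
    public T get() {
        T result = index;
        if (result == null) {
            synchronized (this) {
                result = index;
                if (result == null) {
                    loads.incrementAndGet();
                    result = loader.get();
                    index = result;
                }
            }
        }
        return result;
    }

    /** How many times the loader actually ran (useful for logging reloads). */
    public int loadCount() {
        return loads.get();
    }

    /** Rough check: is there more unused heap than bytesNeeded? (assumed size estimate) */
    public static boolean heapLooksSufficient(long bytesNeeded) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used > bytesNeeded;
    }
}
```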

Do you guys think it is a good idea to use application files to store
index data?

Since each "search server" runs the same code (and only accesses
different application files), can they all be implemented as a single
(versioned?) GAE application? Otherwise I will run out of applications
as I add more search servers, and the search engine will become more
costly to run.
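
For the single-application variant, one way the shared code could choose its files is to deploy every group's index under one root and select a directory from a group id carried by the request (for example a hypothetical `?group=2` parameter). The `index/group1` ... `index/groupN` layout and the `GroupRouter` class are assumptions for illustration only. Because any instance could then be asked for any group, each instance should load only the groups it is actually asked for, or memory use grows with the number of groups.

```java
import java.nio.file.Path;

/** Maps a group id to that group's index directory (hypothetical layout: root/groupN). */
public class GroupRouter {
    private final Path indexRoot;
    private final int groupCount;

    public GroupRouter(Path indexRoot, int groupCount) {
        this.indexRoot = indexRoot;
        this.groupCount = groupCount;
    }

    /** Returns root/groupN for a 1-based group id; rejects unknown or non-numeric ids. */
    public Path indexDirFor(String groupParam) {
        int g = Integer.parseInt(groupParam.trim()); // NumberFormatException is an IllegalArgumentException
        if (g < 1 || g > groupCount) {
            throw new IllegalArgumentException("unknown group: " + groupParam);
        }
        return indexRoot.resolve("group" + g);
    }
}
```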

http://ecarrel.com

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine for Java" group.
To post to this group, send email to google-appengine-java@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine-java+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.