Ben van Staveren wrote:

> Well, using mod_proxy to do some reverse proxying would work, but users
> would still be able to more or less 'browse' the document tree if they
> know where to look. No real way around that one ;)

Actually, it's not that hard. Just have the public server proxy the requests to
the doc server and set up the doc server to only allow requests from the public
server.
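
In practice you'd put that restriction in the doc server's own access
configuration, but the logic is nothing more than an address check. A rough
Python sketch, assuming the public server sits at 192.0.2.10 (a made-up
address) and using a hypothetical is_allowed helper:

```python
import ipaddress

# Assumption: the public (proxying) server's address. 192.0.2.10 is a
# documentation-only placeholder, not a real host.
ALLOWED_PROXIES = [ipaddress.ip_network("192.0.2.10/32")]

def is_allowed(remote_addr: str) -> bool:
    """Return True only if the request came from the public proxy."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that doesn't parse as an IP address is refused.
        return False
    return any(addr in net for net in ALLOWED_PROXIES)
```

Anything that fails the check gets refused, so users who guess a URL on the
doc server still can't reach it directly.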

> I suggest setting up MySQL and storing that information in there --

For information that just needs to be searched (indexed) I usually use
something like swish-e (and there are lots of fun Swish-e modules on CPAN).
Then you can update the index independently of the server and you don't have
to deal with another daemon like MySQL if you don't want to. MySQL is really
fast for primary key lookups, but I've been amazed at how well swish-e performs
for full text searches. Plus it's much more configurable than MySQL in how it
indexes content.

> depending on the type of documents you search through, you could
> potentially even put the documents in the database as well, although
> that's not really a 'good' way of doing it.

Avoid putting files in the database if you can. When you do that you lose all
of the niceties that OSes give you when working with files. No more ls, grep,
find, cat, more, less, etc. And try opening a PDF file when it's stored in a
database. You have to programmatically extract it. Ick.

> So in the end, if you store
> those indexes in the database, you get the shared accessibility, and you
> can always use a cronjob to update it.

If you need multiple machines to access swish-e document indexes you have a
few options:

* Store the index over a shared mount (NFS?)
* Use scp or rsync to periodically sync the indexes (works well if combined with
a cronjob that periodically creates the index)
* Use the experimental swished which is a swish-e server (or multiple servers)
to handle really large sets of documents.
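
The rsync option could look roughly like this when driven from a cronjob. This
is only a sketch: the config file name, index directory, and host names are
made up, and build_sync_commands is a hypothetical helper that just assembles
the commands so you can inspect them before running anything.

```python
import shlex

def build_sync_commands(config, index_dir, hosts, remote_dir):
    """Return the commands to rebuild the index and push it to each host."""
    index_file = f"{index_dir}/index.swish-e"
    # Rebuild the index: -c names the config file, -f the index file to write.
    commands = [["swish-e", "-c", config, "-f", index_file]]
    for host in hosts:
        # Copy the whole index directory so any companion files travel along.
        commands.append(["rsync", "-az", f"{index_dir}/", f"{host}:{remote_dir}/"])
    return commands

if __name__ == "__main__":
    # Hypothetical paths and hosts, printed rather than executed.
    for cmd in build_sync_commands("swish.conf", "/var/idx", ["web1", "web2"], "/var/idx"):
        print(shlex.join(cmd))
```

Syncing the directory rather than a single file means you don't have to care
exactly which files swish-e writes next to the index.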

Btw, how many documents are we talking about?

-- 
Michael Peters
Developer
Plus Three, LP
