On Mon, 4 Mar 2002, Dang Nguyen wrote:

> I'd like to know how to load files, such as MS-Word documents or PDF
> documents, into a MySQL database.  I've set up a blob type in a table, but
> where do I go from there?
> 
> The purpose of this is to store files uploaded from a web page and processed
> with Java servlets.  Then, the files should be retrievable (displayed or
> downloaded) back to a client browser.  My environment: Apache Web server
> 1.3.x on Solaris 2.8 with Java servlets environment.

MS-Word and PDF files are essentially binary files (very much like
images). I suggest leaving them as ordinary files in a separate directory
rather than loading them into a BLOB column.

The MS-Word and PDF files can be converted to plain ASCII with filters
(mswordview, pdftotext). Load the resulting text into a table that has one
column of type TEXT plus a VARCHAR column holding the filename, then
create a fulltext index on the TEXT column.

A fulltext search for some keywords will then return a list of filenames
sorted by relevance, and those files can be sent back to the client.
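
As a rough sketch of the table and the search described above, the
shell functions below only print SQL, so the statements can be inspected
or piped into the mysql client. The table name `docs`, the column names
`filename` and `body`, and the function names are placeholders chosen for
illustration; none of them come from the setup described here.

```shell
#!/bin/sh
# Sketch only: table, column and function names are illustrative assumptions.

# Print the DDL for a table holding the extracted text plus the filename.
# (At the time of writing, FULLTEXT indexes need a MyISAM table.)
schema_sql() {
    cat <<'EOF'
CREATE TABLE docs (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    body     TEXT,
    FULLTEXT (body)
);
EOF
}

# Read text on stdin; escape backslashes and single quotes so the result
# is safe inside a '...' SQL string literal.
sql_escape() {
    sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# Print a query that returns matching filenames, best match first.
search_sql() {
    kw=$(printf '%s' "$1" | sql_escape)
    cat <<EOF
SELECT filename, MATCH (body) AGAINST ('$kw') AS score
FROM docs
WHERE MATCH (body) AGAINST ('$kw')
ORDER BY score DESC;
EOF
}
```

With a database called, say, `mydb` (again a placeholder), this would be
used as `schema_sql | mysql mydb` once, and then
`search_sql 'some keywords' | mysql mydb` for each search. Note that a
TEXT column holds at most 64 KB, so longer documents would need
MEDIUMTEXT instead.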

This scenario works well for me on a collection of some 15'000 HTML
documents (I used w3m as a filter to convert to ASCII). All conversions
are done with a simple shell script calling the filters.
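
A conversion script of this kind might look like the sketch below. It is
an illustration under stated assumptions, not the script referred to
above: the function names and the `docs` table with its `filename` and
`body` columns are hypothetical. It picks a filter by file extension and
prints one INSERT statement per file. Only the pdftotext and w3m calls are
shown; a filter for MS-Word files would be added as another branch of the
`case` in the same way.

```shell
#!/bin/sh
# Sketch only: function names, the docs table and its columns are
# illustrative assumptions.

# Convert one file to plain text on stdout, choosing a filter by extension.
to_text() {
    case "$1" in
        *.pdf)         pdftotext "$1" - ;;   # "-" sends the text to stdout
        *.html|*.htm)  w3m -dump "$1" ;;     # render the HTML as plain text
        *.txt)         cat "$1" ;;           # already plain text
        *)             echo "no filter for $1" >&2; return 1 ;;
    esac
}

# Read text on stdin; escape backslashes and single quotes so the result
# is safe inside a '...' SQL string literal.
sql_escape() {
    sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

# Print one INSERT statement for a file; print nothing if conversion fails.
insert_sql() {
    raw=$(to_text "$1") || return 1
    text=$(printf '%s' "$raw" | sql_escape)
    name=$(printf '%s' "$1" | sql_escape)
    printf "INSERT INTO docs (filename, body) VALUES ('%s', '%s');\n" \
        "$name" "$text"
}

# Emit INSERT statements for every file named on the command line.
for f in "$@"; do
    insert_sql "$f"
done
```

Saved under a name such as `load.sh` (hypothetical), it could be run as
`sh load.sh /some/dir/*.pdf | mysql mydb`. Printing SQL rather than
calling mysql directly makes it easy to check the output before loading.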

Thomas Spahni

