Hi Dick,

You may need to turn to some external tools.
Something similar to this was discussed before, and some tools were suggested. See: http://www.ruby-forum.com/topic/103374

Assuming the text is stored as single-byte ASCII, you could fall back on the "strings" command as a last resort. It should already be installed on modern GNU/Linux distros; try Cygwin on Windows. It reads in any data and outputs all "printable character sequences".

John.

On Wed, 2007-04-25 at 19:14 +0200, Dick Monahan wrote:
> The documents we want to index come in many formats; e.g., HTML, PDF,
> RTF, Word, Excel, etc., etc., etc. I've been searching to find parsers
> that will translate each of these formats to indexable text, but have
> had little success. Any help will be appreciated.
>
--
http://johnleach.co.uk
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
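To make the "strings" fallback above concrete, here is a rough Python sketch of the same idea: pull out runs of printable ASCII bytes from arbitrary data. It is only an approximation of what the real tool does, not its implementation. The function names `printable_runs` and `extract_text` and the minimum run length of 4 are assumptions chosen for illustration.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes found in `data`.

    "Printable" here means space through tilde, plus tab. This is an
    assumption made for the sketch, not a guarantee about the real tool.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def extract_text(path, min_len=4):
    """Read a file as raw bytes and join its printable runs, one per line."""
    with open(path, "rb") as f:
        return "\n".join(printable_runs(f.read(), min_len))


sample = b"\x00\x01Hello, world\x00\xffab\x00index me\x02"
print(printable_runs(sample))  # ['Hello, world', 'index me']
```

As with the command itself, this only helps when the text is stored as plain single-byte characters; compressed or encoded formats will mostly yield noise.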

