> The better approach is to implement converters that convert
> these formats to plain text, either a String or a Reader. Then
> you can use the same analyzer for documents in different formats.
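Here is a minimal sketch of the converter idea quoted above. It assumes a hypothetical `TextConverter` interface (one implementation per format) and a hypothetical `ConverterRegistry` that picks a converter by file extension. Neither is an existing library API. Each converter hands back a plain-text `Reader`, so the indexer can feed every format through the same analyzer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: each format-specific converter turns raw bytes into plain text.
interface TextConverter {
    Reader toReader(InputStream in) throws IOException;
}

// Trivial converter for files that are already plain text (assumed to be UTF-8).
class PlainTextConverter implements TextConverter {
    @Override
    public Reader toReader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }
}

// Hypothetical registry: chooses a converter by file extension so the
// indexing code never needs to know which format it is reading.
class ConverterRegistry {
    private final Map<String, TextConverter> byExtension = new HashMap<>();

    void register(String extension, TextConverter converter) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), converter);
    }

    TextConverter forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextConverter converter = byExtension.get(ext);
        if (converter == null) {
            throw new IllegalArgumentException("No converter registered for: " + fileName);
        }
        return converter;
    }
}
```

Adding PDF or Word support would then mean registering one more `TextConverter`. The analyzer and the rest of the indexing code stay unchanged.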
Has anyone tried using third-party open source utilities for this? xpdf (www.foolabs.com/xpdf) converts PDF to text, and catdoc (http://www.ice.ru/~vitus/catdoc/ver-0.9.html) converts MS Word documents to text. Maybe these could be used to produce the plain text for the index.

I look forward to seeing PDF and Word indexing added to this solution.

My best,
J
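One way to wire in the tools mentioned above is to run them as external processes and read their standard output as a `Reader`. The sketch below makes several assumptions. `pdftotext` (from xpdf) and `catdoc` are assumed to be installed and on the PATH. `pdftotext <file> -` is assumed to write its text to stdout, and `catdoc <file>` to write to stdout by default. The class name `ExternalToolConverter` is hypothetical. The caller supplies the output charset because the tools' encodings depend on their options and locale.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that shells out to pdftotext or catdoc and exposes
// their standard output as a Reader. Both tools are assumed to be on the PATH.
class ExternalToolConverter {

    // Builds the command line for a file based on its extension.
    // "pdftotext <file> -" sends the extracted text to stdout;
    // "catdoc <file>" writes to stdout by default.
    static List<String> commandFor(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        List<String> cmd = new ArrayList<>();
        if (name.endsWith(".pdf")) {
            cmd.add("pdftotext");
            cmd.add(file.getPath());
            cmd.add("-");
        } else if (name.endsWith(".doc")) {
            cmd.add("catdoc");
            cmd.add(file.getPath());
        } else {
            throw new IllegalArgumentException("Unsupported format: " + file);
        }
        return cmd;
    }

    // Starts the tool and returns a Reader over its stdout. The caller must
    // close the Reader. stderr is discarded so the process cannot block on a
    // full error pipe. The exit status is not checked in this sketch.
    static Reader toReader(File file, Charset outputCharset) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(commandFor(file));
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        return new InputStreamReader(process.getInputStream(), outputCharset);
    }
}
```

Streaming the tool's output this way avoids writing temporary text files. A production version would probably also call `Process.waitFor()` after reading and skip documents whose conversion failed.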
