Hi, We have binary documents that we want to index (e.g. Word, Excel, PowerPoint, PDF, etc.), and we know we can index them with Verity using CFINDEX with type="file". However, we want to index their content together with other content from our database. So we are looking at extracting the indexable text from these binary documents, putting it into a query along with the other text we want to index, and then indexing THAT query with CFINDEX.
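To make the plan concrete, here is a rough sketch of what we have in mind. The datasource, table, column, and collection names are made up for illustration, and extractText() is only a placeholder for the conversion step this question is about; it returns an empty string here so the rest of the page can run.

```cfm
<!--- Sketch only: "dsn", "docs", "mixed_content" and extractText() are hypothetical names --->

<!--- Placeholder for the binary-to-text conversion; the real implementation is the open question --->
<cffunction name="extractText" returntype="string" output="false">
    <cfargument name="path" type="string" required="true">
    <cfreturn "">
</cffunction>

<!--- Pull the database text plus the path of each associated binary document --->
<cfquery name="dbRows" datasource="dsn">
    SELECT id, title, body_text, file_path
    FROM docs
</cfquery>

<!--- Build a second query that combines the database text with the extracted file text --->
<cfset toIndex = QueryNew("id,title,body")>
<cfloop query="dbRows">
    <cfset QueryAddRow(toIndex)>
    <cfset QuerySetCell(toIndex, "id", dbRows.id)>
    <cfset QuerySetCell(toIndex, "title", dbRows.title)>
    <cfset QuerySetCell(toIndex, "body", dbRows.body_text & " " & extractText(dbRows.file_path))>
</cfloop>

<!--- Index the combined query instead of the files themselves --->
<cfindex collection="mixed_content" action="update" type="custom"
         query="toIndex" key="id" title="title" body="body">
```

With type="custom", CFINDEX treats the key, title, and body attributes as column names in the query, so each row becomes one searchable record.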
Does anyone know the best way to convert these binary documents to text using ColdFusion? Does it require a third-party tool, or can it be done with native CF tags/functions? FYI - I have tried using CFFILE action="readbinary" and then converting that value with toString(). This gives me some text, but I also get a lot of junk along with it (unreadable characters, etc.), which I assume is part of the file format for Word, or whatever binary document I'm converting. I'm not sure whether I can include this 'junk' in my index without harming its searchability, nor am I sure that I'm getting ALL of the text, ALL the time, so I would prefer to extract JUST the text so it can be indexed properly.

