Dave Phillips wrote:
> Hi,
>
> We have binary documents that we want to index (e.g. Word, Excel,
> PowerPoint, PDF, etc.), and we know we can index them with Verity using
> the cfindex type="file" attribute. However, we want to index their content
> along with other content from our database. So we are looking at extracting
> the indexable text from these binary documents, putting it into a query
> along with the other text we want to index, and then indexing THAT query
> with CFINDEX.
>
> Does anyone know how best to convert these binary documents to text using
> ColdFusion? Does it require a third-party tool, or can it be done with
> native CF tags/functions?
>
You can use some Java libraries to accomplish this. Look at Jakarta POI (http://jakarta.apache.org/poi/) to access Office files, and JPedal (http://www.jpedal.org/) or Xpdf (http://www.foolabs.com/xpdf/) for reading PDF files.

-Ryan
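
For the PDF side, here is a rough sketch of one way to go about it, not a tested recipe. It shells out to Xpdf's pdftotext command-line tool from Java and captures the text it writes to stdout. It assumes pdftotext is installed and on the PATH; the class and method names (PdfTextExtractor, buildCommand, readAll, extractText) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: extracts plain text from a PDF by running Xpdf's pdftotext.
final class PdfTextExtractor {

    // Builds the command line. Passing "-" as the output file makes
    // pdftotext write the text to stdout; "-enc UTF-8" sets the output encoding.
    static List<String> buildCommand(Path pdf) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Reads a stream to the end and decodes it as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Runs pdftotext (assumed to be on the PATH) and returns the extracted text.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdf));
        // Send the tool's warnings to our stderr so they do not mix with the text.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        String text;
        try (InputStream out = process.getInputStream()) {
            text = readAll(out);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java PdfTextExtractor <file.pdf>");
            return;
        }
        System.out.print(extractText(Paths.get(args[0])));
    }
}
```

The string that comes back could then go into a column of the query you hand to CFINDEX, next to the text from your database. For the Office formats you would do the equivalent with POI's classes instead of an external process.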