Dave Phillips wrote:
> Hi,
>
> We have binary documents that we want to index (Word, Excel, PowerPoint, 
> PDF, etc.) and we know we can index them with Verity using the cfindex 
> type=file attribute.  However, we want to index their content along with 
> other content from our database.  So we are looking at extracting the 
> indexable text from these binary documents, putting it into a query along 
> with the other text we want to index, and then indexing THAT query with 
> CFINDEX.
>
> Does anyone know how best to convert these binary documents to text using 
> ColdFusion?  If so, does it require a third-party tool, or can it be 
> done with native CF tags/functions?
>
>   

ColdFusion runs on Java, so you can call Java libraries to do this.  Look at 
Jakarta POI (http://jakarta.apache.org/poi/) for reading Office files, and 
JPedal (http://www.jpedal.org/) for reading PDF files.  Xpdf 
(http://www.foolabs.com/xpdf/) is not a Java library, but its command-line 
tools can also extract text from PDFs.
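
As a rough illustration of the PDF side, here is a minimal sketch that shells out to Xpdf's pdftotext tool and captures the extracted text as a string.  It assumes pdftotext is installed and on the server's PATH.  The class and method names (PdfTextExtractor, buildCommand, extractText) are made up for this example and are not part of Xpdf or ColdFusion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: wraps the Xpdf "pdftotext" command-line tool.
final class PdfTextExtractor {

    // Builds the command line. "-enc UTF-8" sets the output encoding and the
    // trailing "-" tells pdftotext to write the text to stdout, not a file.
    static List<String> buildCommand(String pdfPath) {
        return List.of("pdftotext", "-enc", "UTF-8", pdfPath, "-");
    }

    // Runs pdftotext (assumed to be on the PATH) and returns the extracted text.
    static String extractText(String pdfPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdfPath))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore warnings on stderr
                .start();
        String text;
        try (InputStream out = process.getInputStream()) {
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with code " + exitCode + " for " + pdfPath);
        }
        return text;
    }
}
```

If a compiled class like this were on the ColdFusion classpath, it could presumably be loaded with createObject("java", "PdfTextExtractor") and the returned string written into a query column next to the database text before handing the query to CFINDEX.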

-Ryan


Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:260536