Since I seem to be on a string of questions that don't seem to have any answers from the list, I have found a solution for this problem. Hopefully, it will be useful to someone else, as it took me the better part of a day of brainstorming to come up with this solution.
To recap, the problem is that I have two Verity collections, one for document data, the other for database data. If a document exists, it can also have a record in the database, but a record in the database doesn't necessarily have a document associated with it. The problem is that there are going to be duplicate results where a record exists in both collections. The solution (with code): First, index a document: <cfindex action="UPDATE" collection="tmp_doc_coll" key="C:\Inetpub\wwwroot\lessoncenter\user_files\pruckelshaus\Unit 1 - History of the Internet.doc" urlpath="/user_files/pruckelshaus/" custom1="1" category="content" type="FILE"> Note that I set a value in "custom1" that is the PK of the documents table Now, index a database query: <cfquery name="getRecordForIndexing" datasource="#application.dsn#"> SELECT * FROM tblLessons WHERE intLessonID = 1 </cfquery> <cfindex action="update" collection="lessonplan_coll" type="CUSTOM" key="intLessonID" title="vchLessonTitle" query="getRecordForIndexing" urlpath="/user_files/pruckelshaus/" custom1="intLessonID" body="vchLessonTitle, vchLessonDescription, txtLessonContent" category="content"> This is a test table, so there's only one record in it, with an ID of 1, which is also the value for "custom1" Now, search both collections: <cfsearch collection="lessonplan_coll" name="search"> <cfsearch collection="tmp_doc_coll" name="search_doc"> Since it's possible for a search term to show up in the document collection that doesn't show up in the database (and vice-versa), I created a list of custom1 values from the document collection to make sure there were no duplicate custom1 values in the final result: <cfset containedList = valueList(search_doc.custom1)> Now, using a UNION inside a Query of Queries, create a unified resultset that doesn't have any duplicate custom1 values: <cfquery name="qoqtest" dbtype="query"> SELECT [custom1], [category], [key], [score], [size], [summary], [title], [type], [url] FROM search_doc UNION SELECT [custom1], [category], [key], [score], [size], [summary], [title], [type], [url] FROM search WHERE custom1 NOT IN (<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#containedList#" list="Yes" separator=",">) </cfquery> And that's it. Keep in mind that this is test code and has hardcoded values where there would normally be variables, and things like that, but this solved my original problem. From this resultset, I can create a result page that will link to the document's database page, which will also contain a link to the physical document itself. I hope this helps somebody out there, and I hope the code wasn't too difficult to follow. Considering the tortured path that this issue had me traveling down, I'm surprised at how simple the end solution actually is. Thanks Pete On 10/5/07, Pete Ruckelshaus <[EMAIL PROTECTED]> wrote: > I'm building a document management application where I want to be able to > index BOTH binary document data (word, excel, powerpoint, rtf, PDF, > etc.) AND database data in the same collection; simply, a user can upload a > document, and when they upload the document, they can also save a variety of > data (title, author, keywords, etc.) to a related database table. However, > not all of the records in the database will have corresponding files (all > files will have database records). > > The issue is that when indexing database content, you generally use the PK > for the table as the "key" field in the collection when you use an "update" > action on the cfindex. However, when documenting files, the key gets set as > the physical file location that was indexed. This means that if I were to > add both document and database content to a collection, all of the document > records would have duplicate records (the database portion). I would like > to avoid this. > > Has anyone dealt with this problem, and if so, how did you proceed? What > I am thinking of doing is actually having two collections, a "temp" > collection that indexes ONLY the files, and the "real" collection that > indexes the databse; when a new record is created, I create the DB record in > the index from a DB resultset, then I index the file in the temp collection, > then I retrieve that files' record from the temp collection and update the > DB collection using the database primary key. Any flaws in this approach? > > Thanks > > Pete > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| Get the answers you are looking for on the ColdFusion Labs Forum direct from active programmers and developers. http://www.adobe.com/cfusion/webforums/forum/categories.cfm?forumid-72&catid=648 Archive: http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:290476 Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4

