Re: Complex Verity collection...

Pete Ruckelshaus Sun, 07 Oct 2007 06:43:33 -0700

Since I seem to be on a string of questions that don't seem to have any
answers from the list, I have found a solution for this problem.  Hopefully,
it will be useful to someone else, as it took me the better part of a day of
brainstorming to come up with this solution.

To recap, the problem is that I have two Verity collections, one for
document data, the other for database data.  If a document exists, it can
also have a record in the database, but a record in the database doesn't
necessarily have a document associated with it.  The problem is that there
are going to be duplicate results where a record exists in both
collections.  The solution (with code):

First, index a document:

<cfindex action="UPDATE" collection="tmp_doc_coll"
key="C:\Inetpub\wwwroot\lessoncenter\user_files\pruckelshaus\Unit 1 -
History of the Internet.doc" urlpath="/user_files/pruckelshaus/" custom1="1"
category="content" type="FILE">

Note that I set a value in "custom1" that is the PK of the documents table

Now, index a database query:

<cfquery name="getRecordForIndexing" datasource="#application.dsn#">
 SELECT  *
 FROM tblLessons
 WHERE intLessonID = 1
</cfquery>

<cfindex action="update" collection="lessonplan_coll" type="CUSTOM"
key="intLessonID" title="vchLessonTitle" query="getRecordForIndexing"
urlpath="/user_files/pruckelshaus/" custom1="intLessonID"
body="vchLessonTitle, vchLessonDescription, txtLessonContent"
category="content">

This is a test table, so there's only one record in it, with an ID of 1,
which is also the value for "custom1"

Now, search both collections:

<cfsearch collection="lessonplan_coll" name="search">
<cfsearch collection="tmp_doc_coll" name="search_doc">

Since it's possible for a search term to show up in the document collection
that doesn't show up in the database (and vice-versa), I created a list of
custom1 values from the document collection to make sure there were no
duplicate custom1 values in the final result:

<cfset containedList = valueList(search_doc.custom1)>

Now, using a UNION inside a Query of Queries, create a unified resultset
that doesn't have any duplicate custom1 values:

<cfquery name="qoqtest" dbtype="query">
 SELECT  [custom1], [category], [key], [score], [size], [summary], [title],
[type], [url]
 FROM search_doc
  UNION
 SELECT  [custom1], [category], [key], [score], [size], [summary], [title],
[type], [url]
 FROM search
 WHERE custom1 NOT IN (<cfqueryparam cfsqltype="CF_SQL_VARCHAR"
value="#containedList#" list="Yes" separator=",">)
</cfquery>
And that's it.  Keep in mind that this is test code and has hardcoded values
where there would normally be variables, and things like that, but this
solved my original problem.  From this resultset, I can create a result page
that will link to the document's database page, which will also contain a
link to the physical document itself.

I hope this helps somebody out there, and I hope the code wasn't too
difficult to follow.  Considering the tortured path that this issue had me
traveling down, I'm surprised at how simple the end solution actually is.

Thanks

Pete

On 10/5/07, Pete Ruckelshaus <[EMAIL PROTECTED]> wrote:

> I'm building a document management application where I want to be able to
> index BOTH binary document data (word, excel, powerpoint, rtf, PDF,
> etc.) AND database data in the same collection; simply, a user can upload a
> document, and when they upload the document, they can also save a variety of
> data (title, author, keywords, etc.) to a related database table.  However,
> not all of the records in the database will have corresponding files (all
> files will have database records).
>
> The issue is that when indexing database content, you generally use the PK
> for the table as the "key" field in the collection when you use an "update"
> action on the cfindex.  However, when documenting files, the key gets set as
> the physical file location that was indexed.  This means that if I were to
> add both document and database content to a collection, all of the document
> records would have duplicate records (the database portion).  I would like
> to avoid this.
>
> Has anyone dealt with this problem, and if so, how did you proceed?  What
> I am thinking of doing is actually having two collections, a "temp"
> collection that indexes ONLY the files, and the "real" collection that
> indexes the databse; when a new record is created, I create the DB record in
> the index from a DB resultset, then I index the file in the temp collection,
> then I retrieve that files' record from the temp collection and update the
> DB collection using the database primary key.  Any flaws in this approach?
>
> Thanks
>
> Pete
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Get the answers you are looking for on the ColdFusion Labs
Forum direct from active programmers and developers.
http://www.adobe.com/cfusion/webforums/forum/categories.cfm?forumid-72&catid=648

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:290476
Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4

Re: Complex Verity collection...

Reply via email to