It sounds like what you want is probabilistic (ranked) searching rather than Boolean 'yes/no' matching. In my experience this is not really built into Excel or similar tools. I once started a project to implement it with VBA and macros in MS Access/Excel; it is possible, but I think it would involve a lot of development.
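To give a rough idea of what a percentage-type match could look like, here is a minimal sketch in Python rather than VBA. It uses the standard-library difflib module, which is only one of several ways to score string similarity and is not what Excel or Xapian do internally. The field names (id, description, runtime), the sample records, the 0.8 threshold and the 5-second runtime tolerance are all assumptions for illustration; runtimes are assumed to be in seconds.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text):
    # Lower-case and collapse whitespace so layout differences count for less.
    return " ".join(text.lower().split())


def similarity(a, b):
    # Score between 0.0 and 1.0; 1.0 means the normalised strings are identical.
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_duplicates(records, threshold=0.8, runtime_tolerance=5):
    # Compare every pair of records and keep those whose runtimes are close
    # and whose descriptions score at or above the threshold.
    matches = []
    for a, b in combinations(records, 2):
        if abs(a["runtime"] - b["runtime"]) > runtime_tolerance:
            continue
        score = similarity(a["description"], b["description"])
        if score >= threshold:
            matches.append((a["id"], b["id"], round(score, 2)))
    # Best-scoring pairs first.
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical sample rows standing in for the exported spreadsheet.
records = [
    {"id": 1, "description": "Interview with J. Smith, master tape", "runtime": 1805},
    {"id": 2, "description": "interview with J Smith - master tape", "runtime": 1808},
    {"id": 3, "description": "Concert footage, reel 2", "runtime": 3600},
]

for id_a, id_b, score in candidate_duplicates(records):
    print(id_a, id_b, score)
```

Bear in mind that comparing every pair in a collection of 50,000 items means roughly 1.25 billion comparisons, so in practice you would probably want to group records by runtime or date first and only compare within each group. The output is best treated as a list of candidates for a person to check, not a final answer.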
I have come across (but not used) an open-source system called Xapian (http://www.xapian.org/features.php), which should let you index your Excel documents/database in a sophisticated manner with much lower development requirements. The developers of Xapian are somehow connected to a company called Lemur Consulting, which produces a product called Bamboo, designed to mine data in academic collections. There is a free evaluation of Bamboo Personal Edition; contact Lemur Consulting for it. You may be able to use this or something like it to return percentage-type results.

An interesting problem. I hope this offers some help.

on 11 May 2007 20:37 sarah johnson wrote:

> I recently became involved with a project attempting to sort through
> a rather large (50,000 or so) collection of multimedia assets
> (videotapes, audiotapes, cds, dvds), looking for duplicate copies. I
> would like to at least get a start on this programmatically, using
> the database they were recorded in (the database has been exported to
> an excel file).
>
> However, I am having some trouble coming up with a method to query
> and sort the data.
>
> Each piece was logged separately, with no mention of whether it was
> actually a clone or viewing copy of a previously entered piece.
>
> For the most part, all of the information on the label was typed into
> a single 'description' field. Runtimes and dates are both in
> separate fields, as is title (although the separate title field was
> somewhat rarely used).
> Unfortunately, the labels did not always read exactly alike even if
> the videos were, nor did the layout of the entries always match.
> Runtimes are also often off by a few seconds.
>
> I really need a method of automatically pulling up all assets with
> about an 80% match between fields. Is there a way to do this? Or am
> I approaching this problem all wrong? Any help at all would be
> greatly, greatly appreciated!
>
> Sarah Johnson
> sarah at dvs.com

Eike Friedrich
Hamilton Kerr Institute
tel: +44 (0)1223 832 040
fax: +44 (0)1223 837 595
web: www-hki.fitzmuseum.cam.ac.uk
