It sounds like what you want is probabilistic (ranked) searching rather than
boolean 'yes/no' matching. In my experience this is not really built into
Excel or similar tools. I once started a project to implement it with VBA
macros in MS Access/Excel; it is possible, but I think it would involve a
great deal of development.

I have come across (but not used) an open-source system called Xapian
(http://www.xapian.org/features.php), which should allow you to index your
Excel documents/database in a sophisticated manner with less development
effort. The developers of Xapian are connected in some way to a company
called Lemur Consulting, which produces a product called Bamboo that is
designed to mine data in academic collections. There is a free evaluation
of Bamboo Personal Edition; contact Lemur Consulting for it. You may be
able to use this, or something like it, to return percentage-type results.
It is an interesting problem, and I hope this offers some help.
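
To make the idea of a percentage-type match concrete, here is a minimal
sketch in Python using only the standard difflib module. It is not Xapian,
Bamboo, or an Excel feature, and it is not tested against real catalogue
data. The record keys ('id', 'description', 'runtime' in seconds), the 0.8
threshold and the 5-second runtime tolerance are all assumptions chosen to
mirror the question below.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0.0-1.0 similarity score for two label descriptions."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_duplicates(records, threshold=0.8, runtime_tolerance=5):
    """Yield (id_a, id_b, score) for pairs that look like the same item.

    Each record is a dict with the hypothetical keys 'id', 'description'
    and 'runtime' (in seconds).
    """
    for a, b in combinations(records, 2):
        if abs(a["runtime"] - b["runtime"]) > runtime_tolerance:
            continue  # runtimes differ by more than a few seconds
        score = similarity(a["description"], b["description"])
        if score >= threshold:
            yield a["id"], b["id"], score


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"id": 1, "description": "Interview with J. Smith, Tape 1", "runtime": 1800},
        {"id": 2, "description": "INTERVIEW WITH J SMITH - TAPE 1 (copy)", "runtime": 1803},
        {"id": 3, "description": "Board meeting 1998", "runtime": 1800},
    ]
    for id_a, id_b, score in likely_duplicates(sample):
        print(id_a, id_b, round(score, 2))
```

Comparing every pair is only practical for small batches: 50,000 records
give roughly 1.25 billion pairs, so in practice one would probably first
group records by something cheap (for example, runtime rounded to the
nearest minute) and only compare within each group.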

on 11 May 2007 20:37 sarah johnson wrote:

> I recently became involved with a project attempting to sort through
> a rather large (50,000 or so) collection of multimedia assets
> (videotapes, audiotapes, cds, dvds), looking for duplicate copies.  I
> would like to at least get a start on this programmatically, using
> the database they were recorded in (the database has been exported to
> an excel file).     
> 
> However, I am having some trouble coming up with a method to query
> and sort the data. 
> 
> Each piece was logged separately, with no mention of whether it was
> actually a clone or viewing copy of a previously entered piece. 
> 
> For the most part, all of the information on the label was typed into
> a single 'description' field.  Runtimes and dates are both in
> separate fields, as is title (although the separate title field was
> somewhat rarely used).   
> Unfortunately, the labels did not always read exactly alike even if
> the videos were, nor did the layout of the entries always match. 
> Runtimes are also often off by a few seconds.  
> 
> I really need a method of automatically pulling up all assets with
> about an 80% match between fields.  Is there a way to do this?  Or am
> I approaching this problem all wrong?  Any help at all would be
> greatly, greatly appreciated!   
> 
> Sarah Johnson
> sarah at dvs.com
> 

  Eike Friedrich
  Hamilton Kerr Institute

  tel:   +44 (0)1223 832 040
  fax:   +44 (0)1223 837 595
  web:   www-hki.fitzmuseum.cam.ac.uk

