Hi Sarah

I don't know about finding an 80% match, but I use DigDB
(www.digdb.com/) for a lot of spreadsheet work with Excel.  It has a
function to filter uniques: go to Advanced Filter, then Filter Uniques,
then select the fields you want to match on.  There are also functions
to split fields by delimiter or by number of characters, and a lot of
other functions that I couldn't do without.  It's not freeware, but you
get a free 15 day trial.  I suspect you will have to try a variety of
methods to gradually sort out your duplicates, as it sounds as if your
data is not very consistent.  You might need to work on separating key
data out into different columns first.  Good luck!
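
On the 80% match itself: one possible route outside Excel is a short
script that scores how similar two descriptions are.  The following is
only a rough sketch, assuming the spreadsheet has been read into a list
of records.  The field names ('id', 'description', 'runtime' in
seconds), the 5-second runtime tolerance and the sample rows are all
made up for illustration; the comparison uses SequenceMatcher from
Python's standard difflib module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(text):
    """Lower-case and collapse whitespace so layout differences matter less."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a 0.0-1.0 similarity ratio between two label descriptions."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(records, min_ratio=0.8, max_runtime_gap=5):
    """Return (id_a, id_b, ratio) for pairs that look like copies.

    records: list of dicts with the hypothetical keys 'id',
    'description' and 'runtime' (in seconds).
    """
    pairs = []
    for a, b in combinations(records, 2):
        # Skip pairs whose runtimes differ by more than the tolerance.
        if abs(a["runtime"] - b["runtime"]) > max_runtime_gap:
            continue
        ratio = similarity(a["description"], b["description"])
        if ratio >= min_ratio:
            pairs.append((a["id"], b["id"], round(ratio, 2)))
    return pairs


# Made-up sample rows, for illustration only.
sample = [
    {"id": 1, "description": "Annual Gala 1998 - Master Tape", "runtime": 3600},
    {"id": 2, "description": "annual gala 1998  master tape", "runtime": 3603},
    {"id": 3, "description": "Staff Training Video Part 2", "runtime": 1200},
]
print(likely_duplicates(sample))
```

Comparing every pair gets slow on 50,000 records, so it would probably
help to sort or group by runtime first and only compare within each
group.  The 0.8 threshold is a starting point to tune by eye, not a
tested value.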

Kate Muirhead
Collections Manager - Documentation
Canterbury Museum 
Rolleston Avenue, Christchurch 8013 
NEW ZEALAND 
Telephone 64 3 366 5000 
Direct Dial 64 3 366 9429 extn 875 
Facsimile 64 3 366 5622 
Email kmuirhead at canterburymuseum.com 
www.canterburymuseum.com 




-----Original Message-----
From: mcn-l-bounces at mcn.edu On Behalf Of sarah johnson
Sent: Saturday, 12 May 2007 7:37 a.m.
Subject: [MCN-L] database sorting questions


I recently became involved with a project attempting to sort through a
rather large (50,000 or so) collection of multimedia assets (videotapes,
audiotapes, CDs, DVDs), looking for duplicate copies.  I would like to
at least get a start on this programmatically, using the database they
were recorded in (the database has been exported to an Excel file).

However, I am having some trouble coming up with a method to query and
sort the data.

Each piece was logged separately, with no mention of whether it was
actually a clone or viewing copy of a previously entered piece.

For the most part, all of the information on the label was typed into a
single 'description' field.  Runtimes and dates are both in separate
fields, as is title (although the separate title field was somewhat
rarely used).  Unfortunately, the labels did not always read exactly
alike even if the videos were, nor did the layout of the entries always
match.  Runtimes are also often off by a few seconds.

I really need a method of automatically pulling up all assets with about
an 80% match between fields.  Is there a way to do this?  Or am I
approaching this problem all wrong?  Any help at all would be greatly,
greatly appreciated!

Sarah Johnson
sarah at dvs.com
