----- Original Message ----
From: Mario Minati <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, March 2, 2007 10:42:29 AM
Subject: [Dbix-class] Maybe OT - How to create a result set based on 
'similarity'?

Hello @all,

I'm looking for a solution to find out if there is already some data in 
my dataset that is similar to a new entry.

Example: company names
I would like to find out whether my address book (DB) already contains
companies whose names are similar to a given name, to avoid duplicate
entries.

How to measure similarity:
I'm thinking of something like the Hamming distance (strictly speaking an
edit distance, since the names can differ in length). That means the
distance between Linux and Linus is 1, as one letter is different. The
distance between Linux and Lisa is 3, as there is one letter more and two
are different.
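
The measure described above, counting letters that differ or are extra, corresponds to what is usually called the Levenshtein (edit) distance; the Hamming distance is defined only for strings of equal length. A minimal sketch in Python, shown here only to illustrate the arithmetic (the function name `edit_distance` is made up for this example and is not part of DBIx::Class or any database):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    # previous[j] = distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        previous = current
    return previous[-1]


print(edit_distance("Linux", "Linus"))  # 1
print(edit_distance("Linux", "Lisa"))   # 3
```

Both results agree with the distances given in the example above. The comparison is case-sensitive; for company names it may make sense to lowercase and trim both strings first.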

Does anyone have an idea how to implement that?
Can this be done with code running on the database (PL/SQL or something
similar), or is there a way of doing it with DBIx::Class (drawback: all
data would have to be read before processing)?

Thank you for any hint.

Greets,
Mario Minati

Mario,

Seems more like something you'd want to do in a search engine. PostgreSQL has
done some work in this area; you might want to check their site. I think doing
this in plain SQL would be prohibitive. I can imagine building a SQL statement
that returns all rows in a table where a given column has a value that differs
by one or two letters in the way you mentioned, but for anything bigger than
that you'd end up with quite a large SQL statement. I'd try to do this using
some built-in capabilities of the database if I could. If the dataset is
small, then doing it in Perl would be easy as well, but you are going to
generate lots of database traffic. If that's not an issue (say, the job runs
on a scheduler during a low-activity period), you could cache the resultset
out to disk to avoid filling all your memory.

good luck!
--john
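
The idea of letting the database do the comparison, rather than pulling every row into the client, can be sketched as follows. This is an illustration under stated assumptions, not the approach anyone in the thread actually used: it uses Python's standard `sqlite3` module with an in-memory database, and the table `company`, the SQL function name `edit_distance` and the helper `similar_companies` are all hypothetical. On PostgreSQL the `fuzzystrmatch` extension provides a server-side `levenshtein()` function that would play the same role and is worth checking.

```python
import sqlite3


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL statements can call it.
conn.create_function("edit_distance", 2, edit_distance)

# Hypothetical address-book table with a few sample rows.
conn.execute("CREATE TABLE company (name TEXT)")
conn.executemany("INSERT INTO company (name) VALUES (?)",
                 [("Linux",), ("Linus",), ("Lisa",), ("Acme",)])


def similar_companies(conn, candidate, max_distance=2):
    """Return (name, distance) for rows within max_distance of candidate."""
    rows = conn.execute(
        "SELECT name, edit_distance(name, ?) AS distance "
        "FROM company "
        "WHERE edit_distance(name, ?) <= ? "
        "ORDER BY distance, name",
        (candidate, candidate, max_distance),
    )
    return rows.fetchall()


print(similar_companies(conn, "Linux"))  # [('Linux', 0), ('Linus', 1)]
```

The query stays short whatever the threshold is, but the database still evaluates the function for every row, so this avoids the traffic of shipping the whole table to the client without avoiding the full scan.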

_______________________________________________
List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
Wiki: http://dbix-class.shadowcatsystems.co.uk/
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
Searchable Archive: http://www.mail-archive.com/[email protected]/





 