John Napiorkowski wrote:
----- Original Message ----
From: Mario Minati <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, March 2, 2007 10:42:29 AM
Subject: [Dbix-class] Maybe OT - How to create a result set based on 
'similarity'?

Hello @all,

I'm looking for a solution to find out if there is already some data in my dataset that is similar to a new entry.

Example:
Company names
I would like to find out whether there are already companies in my address book (DB) whose names are similar to a given name, to avoid duplicate entries.

How to measure similarity:
I'm thinking of the Hamming distance. That means the difference between Linux and Linus is 1, as one letter is different. The distance between Linux and Lisa is 3, as there is one letter more and two are different.
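
The measure described above can be sketched in a few lines. Strictly speaking, the Hamming distance is only defined for strings of equal length; the "one letter more" case is what the Levenshtein (edit) distance covers, so this sketch computes that instead. The function name `edit_distance` is made up for illustration, and Python is used only to show the idea.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    # previous[j] = distance between the prefix of a handled so far
    # and the first j characters of b (starts with the empty prefix of a)
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("Linux", "Linus"))  # 1
print(edit_distance("Linux", "Lisa"))   # 3
```

The comparison is case-sensitive as written; lowercasing both strings first is probably what one wants for company names.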

Does anyone have an idea how to implement that?
Can this be done with code running on the database (PL/SQL or something), or is there a way of doing it with DBIx::Class (drawback: all data would have to be read before processing)?

Thank you for any hint.

Greets,
Mario Minati

Mario,

Seems more like something you'd want to do in a search engine.  PostgreSQL has 
done some work in this area; you might want to check their site.  I think using 
plain SQL to do this would be prohibitive.  I can imagine building a SQL statement 
that would return all rows in a table where a given column had a value that 
differed by one or two letters in the way you mentioned, but anything bigger than 
that and you'd end up with quite a large SQL statement.  I'd try to do this using 
some built-in capabilities of the database if I could.  If the dataset were 
small, then doing it in Perl would be easy as well, but you are going to 
generate lots of database traffic.  If that's not an issue (say, this job is running 
on a scheduler during low-activity time), you could cache the resultset out to 
disk to avoid filling all your memory.
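
A rough sketch of the client-side approach mentioned above, written in Python rather than Perl purely for illustration. It assumes the stored names arrive as any iterable (for example a cursor that streams rows), so the whole table never has to sit in memory at once. The names `similar_names` and `edit_distance` and the default threshold are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the usual row-by-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similar_names(candidate: str,
                  names: Iterable[str],
                  max_distance: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (name, distance) for every stored name within max_distance
    of the candidate, comparing case-insensitively."""
    wanted = candidate.lower()
    for name in names:  # rows are consumed one at a time, never all at once
        distance = edit_distance(wanted, name.lower())
        if distance <= max_distance:
            yield name, distance


stored = ["Linus", "Lisa", "Linux", "Debian"]
print(list(similar_names("Linux", stored, max_distance=1)))
# [('Linus', 1), ('Linux', 0)]
```

Every row still has to cross the wire, which is the database traffic concern raised above; the sketch only keeps memory use flat.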

good luck!
--john
As I just answered Jason, I'll use the Postgres add-on for Levenshtein to solve my problem.
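
For readers wondering what the database-side variant might look like: the sketch below assumes the add-on in question is PostgreSQL's fuzzystrmatch module, which provides a `levenshtein(text, text)` function. The table `company`, its columns `id` and `name`, and the helper `build_similarity_query` are hypothetical, and the `%s` placeholders assume a DB-API driver that uses that parameter style. It only builds the statement; executing it is left to whatever database layer is in use.

```python
from typing import Tuple


def build_similarity_query(name: str,
                           max_distance: int = 2) -> Tuple[str, tuple]:
    """Return (sql, params) selecting companies whose name is within
    max_distance edits of the given name.

    Assumes the fuzzystrmatch module is installed in the database and a
    hypothetical table company(id, name).
    """
    sql = (
        "SELECT id, name, "
        "levenshtein(lower(name), lower(%s)) AS distance "
        "FROM company "
        "WHERE levenshtein(lower(name), lower(%s)) <= %s "
        "ORDER BY distance"
    )
    # The name is bound twice: once for the output column, once for the filter.
    return sql, (name, name, max_distance)


sql, params = build_similarity_query("Linux", max_distance=1)
print(sql)
print(params)
```

Binding the value as a parameter instead of interpolating it keeps user-supplied company names out of the SQL text. The distance is still computed for every row, so this may be slow on a large table.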

Thank you for your thoughts; they were helpful.

Greets,
Mario

_______________________________________________
List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
Wiki: http://dbix-class.shadowcatsystems.co.uk/
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
Searchable Archive: http://www.mail-archive.com/[email protected]/
