>-----Original Message-----
>From: Johan De Meersman [mailto:vegiv...@tuxera.be]
>Sent: Friday, April 29, 2011 5:56 AM
>To: Jerry Schwartz
>Cc: mysql mailing list
>Subject: Re: Join based upon LIKE
>
>
>----- Original Message -----
>> From: "Jerry Schwartz" <je...@gii.co.jp>
>>
>> [JS] This isn't the only place I have to deal with fuzzy data. :-(
>> Discretion prohibits further comment.
>
>Heh. What you *really* need, is a LART. Preferably one of the spiked variety.
>
[JS] Unless a LART is a demon of some kind, I don't know what it is.

>> A full-text index would work if I were only looking for one title at
>> a time, but I don't know if that would be a good idea if I have a list of
>> 10000 titles. That would pretty much require either 10000 separate queries
>> or a very, very long WHERE clause.
>
>Yes, unfortunately. You should see if you can introduce a form of data
>normalisation - say, shadow fields with corrected entries, or functionality in
>the application that suggests correct entries based on what the user typed.
>
[JS] Except for obvious misspellings and non-ASCII characters, I do not have 
the freedom to muck with the text. If the data were created in-house, I could 
correct it on the way in; but it comes from myriad other companies.
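
One way to reconcile the "shadow field" suggestion with the constraint that the publishers' text must stay untouched is to derive a separate comparison key from each title and match on the key alone. The following is only a rough sketch under assumptions that are not from the thread: the names title_key and match_titles are hypothetical, and the folding rules (strip accents, lower-case, collapse punctuation and whitespace) are one possible choice that would need tuning against the real data.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a comparison key; the original title is never modified."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lower-case, and collapse every run of non-alphanumerics to one space.
    return re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).strip()


def match_titles(catalog, incoming):
    """Pair each incoming title with the catalog titles sharing its key."""
    by_key = {}
    for title in catalog:
        by_key.setdefault(title_key(title), []).append(title)
    # One dictionary lookup per incoming title, instead of one query each.
    return {title: by_key.get(title_key(title), []) for title in incoming}
```

In a database, the same key could presumably live in its own indexed column next to the untouched title, so that a list of 10000 titles loaded into a second table is matched with a plain equality join rather than LIKE. Titles that differ by more than case, accents or punctuation would still need a fuzzier comparison.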

>Or, if the money's there, you could have a look at Amazon Mechanical Turk
>(yes, really) for cheap-ish data correction.
>
[JS] Again, I can't change the data. The titles are assigned by the 
publishers. Think what would happen if Amazon decided to "fix" the titles of 
books. "Ain't Misbehavin" would, at best, turn into "I am not misbehaving".

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com



>--
>Bier met grenadyn
>Is als mosterd by den wyn
>Sy die't drinkt, is eene kwezel
>Hy die't drinkt, is ras een ezel




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org
