Remind me to read my posts before pressing the send button just one more
time, would you?

On Thu, 2003-05-29 at 09:05, Dirk Koopman wrote:
> On Tue, 2003-05-27 at 19:44, Nik Butler wrote:
> > Here's a problem for the Perl ancients among you...
> > 
> > One of our customers (I say "our" since, like the Borg, I've joined a
> > collective) requires a regular deduplication of list information
> > (mostly CSV) against an existing database (SQL Server 2k).
> > 
> > Now I'm fairly sure that this is exactly what Perl was designed for.
> > However, when searching for tools, and for advice on utilising those
> > tools, I do tend to come up a little nonplussed.
> 
> 
> The trouble is that people are not very consistent at writing their
> addresses, nor do they spell terribly accurately.  You can use one or
> more of the fuzzy match algorithms, some clever sorting, together with
> agrep and friends, but it will only go so far. At the end of the day
> there is no substitute for human intervention and eyeball pattern
> matching...
> 
> Unfortunately, to do this properly requires fuzzy logic and some
> intelligent human interaction. Basically, Perl is your friend for doing
> the obvious, simple stuff, i.e. the addresses that are identical, and
> for generating the list of 'possibles' that you will need to scan by eye.
> 
> The snail mailing list specialists keep this sort of software close to
> their chests because it is what gives them their edge, viz. "clean"
> (deduped) lists, which fetch top dollar.
> 
> Best of luck...
> 
> Dirk
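
For what it's worth, the two-pass idea above (drop the identical addresses
automatically, then produce a short list of 'possibles' for a human to
eyeball) can be sketched roughly as below. This is only an illustration, and
it is in Python rather than Perl purely to keep it short. The function names,
the 0.85 threshold and the sample addresses are all my own assumptions, and
difflib's similarity ratio is just one stand-in for "one of the fuzzy match
algorithms". It also compares every address with every other, so it would
need the "clever sorting" before being let loose on a big list.

```python
import re
from difflib import SequenceMatcher


def normalise(address):
    """Lower-case, turn punctuation into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", address.lower())
    return " ".join(cleaned.split())


def dedupe(addresses, threshold=0.85):
    """Return (unique, possibles).

    unique    -- first-seen original for each distinct normalised address
    possibles -- (kept, candidate, score) tuples for a human to review;
                 nothing in this list is removed automatically
    The threshold is an arbitrary starting point, not a recommendation.
    """
    seen = {}        # normalised form -> first original spelling
    possibles = []
    for addr in addresses:
        key = normalise(addr)
        if key in seen:
            continue  # identical after normalisation: safe to drop
        for other_key, other in seen.items():
            score = SequenceMatcher(None, key, other_key).ratio()
            if score >= threshold:
                possibles.append((other, addr, round(score, 2)))
        seen[key] = addr
    return list(seen.values()), possibles


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        "12 High Street, Norwich",
        "12 high street  norwich",
        "12 High Stret, Norwich",
        "4 Mill Lane, Dereham",
    ]
    unique, possibles = dedupe(sample)
    print(unique)
    for kept, candidate, score in possibles:
        print(f"check by eye: {kept!r} vs {candidate!r} ({score})")
```

The second sample line differs from the first only in case and punctuation, so
it is dropped without fuss. The misspelt "Stret" line is kept but flagged for
review, which is where the eyeball pattern matching comes in.
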
-- 
Please Note: Some Quantum Physics Theories Suggest That When the
Consumer Is Not Directly Observing This Product, It May Cease to
Exist or Will Exist Only in a Vague and Undetermined State.


