What I have is an ImportContact model which is simply a table
containing basic information about a person (name, email, address,
etc). This table gets populated either from a standard website
"Contact Us" form or by my client manually importing a CSV file. Now,
once this ImportContact model has new records in it, I want to display
a list of *potential* duplicates by various criteria. The "master"
contacts model is simply called Contact and contains 30,000+ records.
The Contact model does NOT include addresses, there is a separate
Address model for this (again, with 30,000+ records).

The reason I can't have the database do the searching for me is
because the criteria I want to search against to find potential
duplicates doesn't have a direct one-to-one correlation to the
database table fields.

For example this "duplicate search" criteria works great using Cake:

- Show potential duplicates based on field `email1`
- Cake will generate a SELECT...IN query, listing ALL email addresses
from the ImportContact model and returning all records from the
Contact model containing anyone of the listed emails

However, this example does NOT work:

- Show potential duplicates based on fields `fn` and `ln` (a person's
full name)
- Since I have to CONCAT(`fn`, `ln`) AS `full_name`, the SELECT...IN
syntax no longer works

So what I did was query ALL records from both ImportContact and
Contact models and store each of these result sets in their own
arrays. Then I perform the search using PHP array functions and store
the potential duplicates in a third array. This is working great, but
it's just on the slow side (sometimes 3+ minutes). It's not just the
array searching that's taking a while either, it's actually the
database query retrieving 30,000+ records that's slowing things down
as well.

I'm just giving a brief overview here and can provide specific code if
you like. Thanks for the quick replies, maybe I can get this thing to
run more efficiently but I'm not quite sure how :-/

- Kevin


On Mar 26, 4:14 pm, jonknee <[EMAIL PROTECTED]> wrote:
> > I have a CakePHP CRM application with 30,000+ contact records. Each
> > Contact hasMany Address records. One feature of my app is to allow my
> > client to search for duplicate entries within the contacts table
> > before importing new contacts. In short, I am querying the entire
> > Contact table and LEFT JOINing with the Address table. I store the
> > result set into an array and then use PHP's array functions to filter
> > potential duplicates from a "pending contacts" table.
>
> What exactly are you doing? It can probably be handled pretty quickly
> by the database. You shouldn't have to manually fish out dupes.
>
> I always thought the please wait while we search messages were either
> a [poor] attempt to seem extra high-tech or a sign that they have a
> really inefficient system. Google searches the whole web--not just a
> day's commercial flights--and comes back in a flash.
>
> I have used a PHP app with a valid reason for it, it was a mailing
> list program and it accomplished sending large amounts of email out by
> batching (the page would keep reloading with incrementing GET
> variables, such as start=0&num=100). It wouldn't work in your case.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to