What I have is an ImportContact model which is simply a table containing basic information about a person (name, email, address, etc). This table gets populated either from a standard website "Contact Us" form or by my client manually importing a CSV file. Now, once this ImportContact model has new records in it, I want to display a list of *potential* duplicates by various criteria. The "master" contacts model is simply called Contact and contains 30,000+ records. The Contact model does NOT include addresses, there is a separate Address model for this (again, with 30,000+ records).
The reason I can't have the database do the searching for me is because the criteria I want to search against to find potential duplicates doesn't have a direct one-to-one correlation to the database table fields. For example this "duplicate search" criteria works great using Cake: - Show potential duplicates based on field `email1` - Cake will generate a SELECT...IN query, listing ALL email addresses from the ImportContact model and returning all records from the Contact model containing anyone of the listed emails However, this example does NOT work: - Show potential duplicates based on fields `fn` and `ln` (a person's full name) - Since I have to CONCAT(`fn`, `ln`) AS `full_name`, the SELECT...IN syntax no longer works So what I did was query ALL records from both ImportContact and Contact models and store each of these result sets in their own arrays. Then I perform the search using PHP array functions and store the potential duplicates in a third array. This is working great, but it's just on the slow side (sometimes 3+ minutes). It's not just the array searching that's taking a while either, it's actually the database query retrieving 30,000+ records that's slowing things down as well. I'm just giving a brief overview here and can provide specific code if you like. Thanks for the quick replies, maybe I can get this thing to run more efficiently but I'm not quite sure how :-/ - Kevin On Mar 26, 4:14 pm, jonknee <[EMAIL PROTECTED]> wrote: > > I have a CakePHP CRM application with 30,000+ contact records. Each > > Contact hasMany Address records. One feature of my app is to allow my > > client to search for duplicate entries within the contacts table > > before importing new contacts. In short, I am querying the entire > > Contact table and LEFT JOINing with the Address table. I store the > > result set into an array and then use PHP's array functions to filter > > potential duplicates from a "pending contacts" table. > > What exactly are you doing? It can probably be handled pretty quickly > by the database. You shouldn't have to manually fish out dupes. > > I always thought the please wait while we search messages were either > a [poor] attempt to seem extra high-tech or a sign that they have a > really inefficient system. Google searches the whole web--not just a > day's commercial flights--and comes back in a flash. > > I have used a PHP app with a valid reason for it, it was a mailing > list program and it accomplished sending large amounts of email out by > batching (the page would keep reloading with incrementing GET > variables, such as start=0&num=100). It wouldn't work in your case. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Cake PHP" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/cake-php?hl=en -~----------~----~----~----~------~----~------~--~---
