If you want to stay in Cake-land as much as possible, try making use of
the 'joins' key in your options for find().
Nate wrote a nice article showing how to get going:
http://bakery.cakephp.org/articles/view/quick-tip-doing-ad-hoc-joins-in-model-find
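For the curious, an ad-hoc join along those lines looks roughly like
this (just a sketch; the table and column names here are invented for
illustration, not taken from your app):

```php
// Sketch of an ad-hoc join via the 'joins' key in find() options.
// Table/column names are assumptions for illustration only.
$users = $this->User->find('all', array(
    'fields' => array('DISTINCT User.id', 'User.name'),
    'joins' => array(
        array(
            'table' => 'user_degrees',
            'alias' => 'UserDegree',
            'type' => 'INNER',
            // ties the joined table back to the main model
            'conditions' => 'UserDegree.user_id = User.id',
        ),
    ),
    // regular conditions can then reference the joined alias
    'conditions' => array(
        'UserDegree.name' => array('Computer Science', 'Mathematics'),
    ),
));
```

The nice part is you keep the normal find() return format, so the rest
of your code doesn't have to know a raw-ish join happened.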

General advice (apart from your big search function) is:
- Use Containable everywhere :)
- Paginate any of these big datasets. Well, any dataset with a
reasonable expectation of ever returning more than 25 records. So that
would be most models. When you are doing that tricky joining query,
using the paginator might be tricky, but don't be lazy. Make your own
paginator for that action. You can probably skip the ordering magic
from paginator, but simple paging of the results can be done quickly
the bad old way.
- Depending on the data you might get good results caching it. I
usually don't but that is the fault of the type of data I most often
deal with. If you can get away with the results sometimes being a few
minutes "out of date" then caching could be for you.
- I have also done my own per-request caching when the same data is
likely to be fetched multiple times (yes, special cases). I got a huge
leap in speed compared to the built-in query caching by doing a
special trick for my special situation.
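Per-request caching can be as simple as memoizing results in a model
property, something like the sketch below (method and field names are
made up here, and Cake 1.x-era PHP 4 style is assumed):

```php
// Sketch: per-request memoization inside a model.
// Everything named here is invented for illustration.
class User extends AppModel {
    var $name = 'User';
    var $_matchCache = array();

    function findMatches($criteria) {
        // one cache slot per distinct set of conditions
        $key = md5(serialize($criteria));
        if (!isset($this->_matchCache[$key])) {
            $this->_matchCache[$key] = $this->find('all', array(
                'conditions' => $criteria,
                'contain' => false, // only pull related data you need
            ));
        }
        return $this->_matchCache[$key];
    }
}
```

Within a single request, repeated calls with the same criteria then hit
the array instead of MySQL.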

I use a combination of these techniques to crawl through datasets that
are 50 times bigger (and more) than your 20'000, and they still go like
a rocket. But I am no genius. I was in your shoes last year when we went
from 50 to 400'000 unique records in 3 days... man, was that a fun
summer? :) I learned a lot by transferring the live data to my dev-
setup and (very quickly) optimizing sloppy queries and limiting
results the right way and so on.
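For example, the multi-criteria search described further down this
thread can usually be collapsed into one joined query instead of
filtering with ever-growing id arrays. A rough sketch (every table and
column name below is a guess, adjust to your actual schema):

```php
// Sketch: one round-trip instead of several id-array filters.
// Table/column names are invented; adapt to your schema.
$sql = "SELECT DISTINCT User.id
        FROM users AS User
        INNER JOIN user_degrees AS UserDegree
            ON UserDegree.user_id = User.id
        INNER JOIN skills AS Skill
            ON Skill.user_id = User.id
        WHERE User.dob BETWEEN '1982-01-01' AND '1987-12-31'
          AND UserDegree.name IN ('Computer Science', 'Mathematics', 'Physics')
          AND Skill.name = 'Marketing'";
$matches = $this->User->query($sql); // Model::query() runs raw SQL
```

Let MySQL do the intersecting; with indexes on the foreign keys and the
filtered columns this is one fast query instead of N slow ones.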


Keep asking specific questions if you have them. I'd be happy to help.
If for nothing else...at least we can help show that Cake is not the
slow-coach among php frameworks.

/Martin


On Jun 16, 12:42 pm, "Adam Royle" <[email protected]> wrote:
> It's possible to use a combination of cake's magic and raw sql together, 
> which I use occasionally when I need something a bit out of the ordinary but 
> still want to use the cake methods (to make use of model binding, etc).
>
> eg.
>
> function getHomepageArtists() {
>     $query = array(
>         'fields' => array('id', 'name', 'slug'),
>         'conditions' => 'Artist.id IN (SELECT artists_events.artist_id
>             FROM artists_images
>             INNER JOIN artists_events
>                 ON artists_events.artist_id = artists_images.artist_id)',
>         'contain' => array('Image'),
>         'order' => 'popularity, RAND()',
>     );
>     return $this->find('all', $query);
> }
>
> Hope that helps,
> Adam
>
>
>
> ----- Original Message -----
>   From: Femi Taiwo
>   To: [email protected]
>   Sent: Tuesday, June 16, 2009 8:32 PM
>   Subject: Re: Problem with Large Datasets
>
>   Thanks for replying.
>
>   Actually, I've got everything properly indexed and closely followed 
> standards on that. I honestly just wanted to avoid writing a custom query 
> (lazy me). Was only hoping that there was a way to avoid custom queries. Will 
> give that a try then.
>
>   Thanks!
>
>   Femi
>
>   Adam Royle wrote:
>     Hi Femi,
>
>     You should really just use one query for that with a number of joins and 
> let the database do all the work. My advice is to get someone familiar with 
> sql and construct the raw query and get it running at a decent speed inside 
> the mysql console, and then find a way to get that sql into cake.
>
>     I'm guessing you don't have any indexes on your tables either, so best to 
> get that same person to look over them. 20,000 records is not a lot and mysql 
> can handle that easily if you know what you are doing.
>
>     Best of luck!
>
>     Cheers,
>     Adam
>
>     ----- Original Message -----
>       From: Femi Taiwo
>       To: Cake PHP
>       Sent: Tuesday, June 16, 2009 7:05 PM
>       Subject: Problem with Large Datasets
>
>       Hi all,
>
>       I'm having a scalability problem, and urgently need to refactor. The 
> project is just 2 months old and already has over 20,000 users.
>       How do you handle passing an array with 5000+ values as a condition
>       e.g. $conditions = array('User.id'=>$bigArray);
>       As the records grow bigger, it takes longer.
>
>       I'm not looking to paginate because what happens is that an alert is 
> sent to the users in the final result set. Browsing through the result set is 
> NOT necessary.
>
>       Models?
>       I have a User model  with the following key relationships
>
>           hasMany UserDegree
>           hasMany Certification
>           hasMany Skill
>           hasMany Language
>           belongsTo City
>
>       The User model has a few columns as well - dob, marital status  and 
> gender.
>
>       Herein lies the problem - a user can set different criteria for 
> searching through the entire user database using any of the attributes above.
>       She can say, for example: I want users in Kansas, aged between 22 and 
> 27, who have degrees in either Computer Science, Mathematics or Physics, have 
> Marketing skills
>       and can speak a bit of French.
>
>       What I do right now is select from the users table those aged between 
> 22 and 27, then amongst these, those with the relevant degrees => then 
> get from this set the ones that have marketing skills. So I keep filtering 
> using their user_ids, passing them to the relevant model using conditions 
> like $conditions = array('UserDegree.user_id'=>$bigArray);
>
>       Initially, this worked well with a few thousand records. But now, with 
> 20,000, it takes 3-5 seconds and is really CPU-intensive and hard on the MySQL 
> server.
>
>       What can I do?
>
>       Thanks!!
>
>       Femi
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"CakePHP" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~----------~----~----~----~------~----~------~--~---
