On Mon, Jul 27, 2009 at 3:04 PM, DyingToLearn <[email protected]> wrote:
> I am trying to figure out the best way to handle full-text indexing on
> my app. The docs (http://docs.heroku.com/full-text-indexing) provide a
> great starting point, but I would like some more details.
>
> As far as I know, I have a "small dataset":
> * 3 tables
> * < 3000 rows in each table
> * total size of the index on my local machine is 4.4MB (using a copy
>   of my production data)

Yes, I'd consider that small enough for using Ferret, although what
really matters is the time it takes to build your indexes, as discussed
below.

> So Heroku recommends acts_as_ferret with the caveat that it "will
> require rebuilding the indexes every time a new app server is
> launched."
>
> I think this is the point that I need clarified. I know an app server
> is launched when I deploy changes (git push heroku master), but when
> else is a new app server launched?

Those docs were written prior to our invention of the term "dyno."
"App server" as it's used here and "dyno" are synonymous: a single web
process running on Heroku, serving your app. So it will be restarted:

- on git push
- whenever your app idles out and is later awakened
- whenever you type "heroku restart"
- once every 12 - 24 hours (we cycle dynos automatically to help manage
  memory size)

The first three of these restart all dynos on the app; the last
restarts only one dyno at a time (the automatic cycling is staggered).

> On my development machine, my indexes take about 20-40 seconds to
> rebuild. On Heroku they seem to take 40-60 seconds. This is obviously
> too long for a normal request because Heroku times out after 30
> seconds or so (with an error screen I can't customize). (And of course
> I never want to keep an actual user waiting 60 seconds for a normal
> request.)

40 - 60 seconds is probably pushing the edge of what you can do using
Ferret. A build time that long will make your dyno start/restart
process less agile, which limits the system's ability to manage your
app effectively. So unless you can find a way to get that time down
(maybe tweak some options or index less data?), you'll probably need
to look at switching to a full-sized text indexing solution like Solr.

> (Side question, if I
> am using multiple Dynos, will they share the index? I don't see how
> they could since the index is stored in the temporary directory.
> This implies I need to rebuild my indexes separately for each Dyno.)

Right, each dyno has its own index. When the files are small and your
build times are (say) less than 10 seconds, this is fine. Your app is
at the point where that ceases to work very well.

> Idea 1. Add an initializer which starts a delayed job to rebuild the
> indexes (if required).
> Idea 2. Add a timeout (set to a shorter time than Heroku's timeout) in
> my search controller which gives the user a message saying that the
> indexes are being rebuilt, the page will refresh and show him results
> within 30 seconds.

I think both of these together would be a sweet workaround if you want
to avoid the extra work of going to a heavier-duty solution.

> Idea 3. None of this will work, I need to use Solr.

Yeah, depending on what your users are willing to put up with (whether
it is a commercial app, for example), Solr may be in your future. But
hopefully the info I've provided here will help you decide whether to
stretch Ferret a little further.

Adam
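
P.S. For what it's worth, Ideas 1 and 2 together might look roughly like
the sketch below. It is plain Ruby rather than Rails, and every name in
it (IndexState, rebuild_in_background, guarded_search) is made up for
illustration; none of it is acts_as_ferret or delayed_job API. A plain
thread stands in for the delayed job purely to keep the example
self-contained, and the time limit is something you would set below the
30-second request timeout.

```ruby
require "timeout"

# Hypothetical per-dyno record of whether the local index is usable yet.
class IndexState
  def initialize
    @ready = false
    @lock = Mutex.new
  end

  def ready?
    @lock.synchronize { @ready }
  end

  # Idea 1: run the rebuild off the request path.
  # Assumption: a thread stands in for the delayed job here.
  def rebuild_in_background(&rebuild)
    Thread.new do
      rebuild.call
      @lock.synchronize { @ready = true }
    end
  end
end

# Idea 2: answer quickly with a "rebuilding" status instead of letting
# the request run into the platform timeout.
def guarded_search(state, limit_seconds)
  rebuilding = { status: :rebuilding, results: [] }
  return rebuilding unless state.ready?

  results = Timeout.timeout(limit_seconds) { yield }
  { status: :ok, results: results }
rescue Timeout::Error
  rebuilding
end
```

A search action would call guarded_search with the real query in the
block, then render either the results or a "we're rebuilding the index,
this page will refresh shortly" message depending on :status.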
