I have a large number of pages stored in my app's database that, under normal circumstances, are only accessible via the search engine. This is a perfectly ordinary use case for a sitemap, so that is what I've built.
I have a few million of these pages, so I dynamically generate a sitemap index that in turn points to individual sitemaps. This was easy, because I could just pull out the id of the last page and divide it by 50,000 (the maximum number of URLs the sitemap protocol allows per file) to determine how many sitemaps to link to. Then, because each page has a URL slug in the database that has to be read, each sitemap is generated by calling find_each with the id restricted to a block of 50,000 ids starting at 50,000 multiplied by the sitemap number.
While this is a simple idea in theory, it produces an enormous amount of database churn in practice. The search engines would sometimes spike the database for up to 6 seconds with a single query, even with find_each. I have since reduced the 50,000 to 10,000, and now I seem to have constant database churn from the search engines, with queries taking about 500ms to 1s. This doesn't strike me as a very good situation.
My first instinct is that I need to cache the sitemaps somehow, perhaps by pre-generating them and updating them once a week or so, to minimize the strain on the database. However, this seems fairly difficult on Heroku, since we don't have any local storage on the web servers that can hold the generated (somewhat large) sitemap files.
I suppose I could rig something up to upload the weekly sitemap cache to Amazon S3 or something like that, but I've never heard of anyone storing their sitemap off site before. Will it cause any issues for the search engines if they have to step outside a domain to get that domain's sitemap? Overall, that seems like a somewhat awkward arrangement as well.
What do people think? How would you handle this?
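For concreteness, here is a minimal sketch of the id-range arithmetic described above, written in plain Ruby so it runs without Rails. It is an illustration under stated assumptions, not the code from the app: the names SITEMAP_SIZE, sitemap_count, id_range_for and render_sitemap are hypothetical, and http://example.com/pages is a placeholder base URL.

```ruby
require "cgi"

# Hypothetical constant: URLs per sitemap file (the protocol caps this at 50,000).
SITEMAP_SIZE = 50_000

# Number of sitemap files needed so that every id up to last_id falls in some
# block. Block n covers ids n * size through n * size + size - 1.
def sitemap_count(last_id, size = SITEMAP_SIZE)
  last_id / size + 1
end

# Inclusive id range covered by the zero-based sitemap number.
def id_range_for(number, size = SITEMAP_SIZE)
  first = number * size
  first..(first + size - 1)
end

# Render one sitemap document from a list of slugs.
# In a Rails app the slugs might come from something like
#   Page.where(id: id_range_for(n)).pluck(:slug)
# where Page is an assumed model name; reading only the slug column avoids
# building a full model object per row.
def render_sitemap(slugs, base_url = "http://example.com/pages")
  urls = slugs.map do |slug|
    "  <url><loc>#{CGI.escapeHTML("#{base_url}/#{slug}")}</loc></url>"
  end
  [
    %(<?xml version="1.0" encoding="UTF-8"?>),
    %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">),
    *urls,
    "</urlset>"
  ].join("\n")
end
```

Because render_sitemap returns a plain string, the same output could be served directly from a controller or written out by a scheduled job to whatever storage ends up holding the cached copies.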
