Let me clarify something (I am writing on a BlackBerry, so I couldn't read your post and write at the same time) that I mistakenly implied. I said "update your live data," and what I meant to say was this:
Copy your live data as you merge it with the new dataset into the new tables you have created. So basically:

* bring the app down
* push the migrations for processing
* bring the app back online (shouldn't take but a minute)
* push the new dataset up while the app is live; none of the current
  code base can see that table, so it should be fine
* load the queue with the records that need to be processed (all records)
* start the background job
  ** the job looks in the queue for a record to process
  ** the job grabs the record, merges it with the new dataset, and
     saves the result into the new table
  ** repeat until the queue is down to approximately the number of
     records that are updated daily
* put the app in maintenance mode
* process the remaining records as quickly as possible
* ensure data integrity
* push the latest code
* migrate the new tables to their final resting spot
* come out of maintenance mode

I think that is more clear than my last attempt :-)

On 3/12/10, Carl Fyffe <[email protected]> wrote:
> This is just an idea:
>
> Instead of bringing the data down and turning your app off for
> multiple days, you could leave the app up and do all of the
> processing on Heroku. Create a branch from your current production
> code, and in that branch create the migrations for the new tables.
> Then create a background job that goes through your live data and
> updates it. You should have a table with a list of the data that
> still needs to be updated: on first load it will contain every record
> in the db, and as your background job works through the data it
> removes each record. The best part is that whenever your live
> application makes a change, an after filter can put an entry back
> into the queue table for that data to be migrated (if it isn't
> already in there because it hadn't been processed yet). When the job
> gets close to completion (say an hour's worth of work remains), take
> the system down, let the background job finish, and do a check of the
> data.
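The queue-draining worker in the steps above might look roughly like this. This is only a plain-Ruby sketch, not the real app: `drain_queue`, `DAILY_CHURN`, and the in-memory hashes are all hypothetical stand-ins for the ActiveRecord models, queue table, and background-job library a real Rails/Heroku setup would use.

```ruby
# Sketch of the queue-draining worker. In the real app these would be
# ActiveRecord models and a Delayed::Job-style worker; plain hashes
# and arrays stand in here so the control flow is easy to follow.

DAILY_CHURN = 2  # approximate number of records updated per day

def drain_queue(queue, new_dataset, merged_table)
  # Keep processing until only the daily-churn remainder is left;
  # that tail gets handled during the short maintenance window.
  while queue.size > DAILY_CHURN
    record = queue.shift                     # grab the next queued record
    extra  = new_dataset[record[:id]] || {}  # look up the matching new data
    merged_table[record[:id]] = record.merge(extra)  # merge and "save"
  end
  merged_table
end

queue       = [{ id: 1, name: "a" }, { id: 2, name: "b" }, { id: 3, name: "c" }]
new_dataset = { 1 => { size: 10 }, 2 => { size: 20 }, 3 => { size: 30 } }
merged      = drain_queue(queue, new_dataset, {})
# merged now holds one processed record; the remaining two (the
# approximate daily churn) stay queued for the maintenance window.
```

The key design point is the stopping condition: the worker deliberately leaves the daily-churn tail unprocessed, so the final maintenance window only has to cover a small, predictable number of records.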
> You will have all of the old data and the new data at the same time.
> Then you can push all of the other changes to the code base and turn
> your system back on!
>
> The benefits are:
>
> * your system stays live while the work is being done
> * you don't have to worry about bringing data down and then back up
> * the update can take as long as you need to get it right
> * you can watch the progress by keeping an eye on the queue
> * you know all of the data will be migrated because of the queue
>
> The downsides:
>
> * it will probably cost more, because you should bump your db size on
>   Heroku to the max so users don't notice the impact
> * it might take longer, because you don't want it running as fast as
>   possible for fear of impacting your live system
> * lots of moving parts
>
> Just an idea. Good luck!
>
>
> On 3/12/10, Mike <[email protected]> wrote:
>> I'm going to be adding a number of discrete but enormous (maybe many
>> gigs each) datasets to my Heroku app's database. In many ways I'm in
>> a situation similar to the one Tobin faces in another current post,
>> but with a different question:
>> http://groups.google.com/group/heroku/browse_thread/thread/141c3ef84b22fc18
>>
>> Right now I still haven't merged the datasets into my database.
>> What's the best way for me to approach this?
>>
>> The inability to push individual tables with taps suggests I'll want
>> to do this as a one-shot deal, rather than doing each dataset
>> sequentially and testing it before proceeding to the next. I'm
>> thinking of doing a db:pull to get the current state of my database,
>> then shutting my application down in maintenance mode, running a
>> local merge of the datasets (maybe taking days, I'm guessing, just
>> to process the enormous things), doing some exhaustive local testing
>> on the result, and then pushing back to Heroku (maybe taking days
>> again) before reactivating my app.
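The "after filter" re-queue idea from Carl's message earlier in the thread can be sketched with the same caveat: this is plain Ruby with hypothetical names (`MigrationQueue`, `enqueue`), standing in for what would really be an `after_save` callback writing rows to a queue table.

```ruby
# Sketch of re-enqueueing a record whenever the live app changes it,
# skipping records that are already waiting in the queue. A Set gives
# a fast membership check alongside the FIFO list of pending ids.
require 'set'

class MigrationQueue
  def initialize(ids = [])
    @ids     = ids.dup        # FIFO order of pending record ids
    @pending = Set.new(ids)   # fast membership check for idempotency
  end

  def size
    @ids.size
  end

  # Enqueue only if the record isn't already waiting; this is what
  # keeps the live app's writes from creating duplicate queue entries.
  def enqueue(record_id)
    return false if @pending.include?(record_id)
    @ids << record_id
    @pending << record_id
    true
  end
end

queue = MigrationQueue.new([1, 2, 3])

queue.enqueue(2)   # record 2 is still queued, so this is a no-op
queue.enqueue(42)  # record 42 was already processed, so re-queue it
# queue now holds the original three ids plus the re-queued 42
```

Making `enqueue` idempotent is what lets the live app fire the callback on every write without the queue table filling up with duplicates.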
>> Because of their massive size, it seems that after I've done one,
>> any further db:pulls will be basically impossible. Just the idea of
>> making a mistake in merging the datasets that I don't catch until
>> after it's been pushed to the site gives me the shivers. Overall, I
>> wonder if there's a better way that I'm overlooking.
>>
>> One possible alternative I thought of: would it be possible to
>> create a local bundle from my database using YamlDB? But then I'm
>> not sure how to get the bundle back onto the server and restore from
>> it; the Heroku documentation doesn't really seem to cover that
>> possibility.
>>
>> Also, in my case this data is integral to the application, so I'm
>> not going to be able to split it out into a separate Heroku
>> application as in Tobin's case. Is there going to be any practical
>> way for me to pull just the non-dataset data from the server for use
>> on a development machine?
>>
>> Does anyone have any ideas on how they would approach this problem?
>> If so, I'd be filled with gratitude.
>>
>> Mike
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Heroku" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/heroku?hl=en.
