Re: Best way to add a huge dataset?

Mike Sun, 14 Mar 2010 17:08:02 -0700

Are you saying that Taps slows down more than linearly as the amount
of data grows? How slow are we talking about for 500 megs?


So in the second paragraph, are you saying that you broke your
database into CSV files, committed them to your git repository one
batch at a time, and pushed each batch up to Heroku until all your data
was entered? What size chunks did you use? I see that my slug size is
currently 6.5/20 MB, but I recall that's a soft limit. Still, I assume
there must be a hard upper limit on the size of the source on Heroku.
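Neither message says what chunk size Jim used, so the following is only a minimal sketch of the splitting step, assuming the removed data starts out as one large UTF-8 CSV file with a header row. The `split_csv` function, the `_0000.csv` naming scheme and the `max_bytes` cap are hypothetical choices for illustration, not anything taps or Heroku provides. Each chunk repeats the header so it can be imported on its own.

```python
import csv
import io
from pathlib import Path


def _encode_row(row: list[str]) -> bytes:
    """Serialize one row exactly as csv.writer would write it to a file."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def _write_chunk(out_dir: Path, stem: str, index: int,
                 header: bytes, rows: list[bytes]) -> Path:
    """Write one chunk file: the header row followed by its data rows."""
    path = out_dir / f"{stem}_{index:04d}.csv"
    path.write_bytes(header + b"".join(rows))
    return path


def split_csv(src: Path, out_dir: Path, max_bytes: int) -> list[Path]:
    """Split src into numbered chunk files (hypothetical helper).

    Each chunk is at most max_bytes, unless a single row is larger than
    that, in which case the row gets a chunk of its own.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []

    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = _encode_row(next(reader))
        pending: list[bytes] = []
        size = len(header)

        for row in reader:
            data = _encode_row(row)
            # Close the current chunk if this row would push it over the cap.
            if pending and size + len(data) > max_bytes:
                chunks.append(_write_chunk(out_dir, src.stem, len(chunks),
                                           header, pending))
                pending, size = [], len(header)
            pending.append(data)
            size += len(data)

        if pending:
            chunks.append(_write_chunk(out_dir, src.stem, len(chunks),
                                       header, pending))
    return chunks
```

For example, `split_csv(Path("removed_rows.csv"), Path("db/chunks"), max_bytes=5_000_000)` (paths hypothetical) would write `removed_rows_0000.csv`, `removed_rows_0001.csv`, and so on. If every chunk is committed to the app's repository, the combined size of all chunks, not the size of any single one, is what would count toward the slug size.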

On Mar 12, 8:42 pm, Jim Gilliam <[email protected]> wrote:
> I just wanted to let you know that with taps, as soon as you get above about
> 500 megs, it gets REALLLLLLLY slow.  I don't see how you could ever import a
> gigabyte of data with taps.   It would be really helpful if taps could do
> one table at a time.
>
> I tried to hack up my own one-table-at-a-time version with YamlDb, but then
> I ran into severe UTF-8 issues that I could never work through. I ended up
> removing enough data from the initial database to get it small enough to
> import with taps, then I added various CSV files with the removed data to
> the app's source and wrote a script to import them bit by bit.
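Jim's import script isn't shown in the thread. A minimal sketch of the "bit by bit" idea, assuming each chunk is a UTF-8 CSV file with a header row whose column names match the target table, and using Python's built-in `sqlite3` as a stand-in for the app's PostgreSQL database. The function names and the `records` table in the usage note are hypothetical. Each batch is committed in its own transaction, so an interrupted run leaves whole batches behind rather than a half-written one.

```python
import csv
import sqlite3
from itertools import islice
from pathlib import Path


def import_csv_in_batches(conn: sqlite3.Connection, csv_path: Path,
                          table: str, batch_size: int = 1000) -> int:
    """Insert the rows of one CSV file into `table`, one transaction per batch.

    Column names are taken from the CSV header. `table` and the header names
    are interpolated into the SQL text, so they must come from a trusted file.
    Returns the number of rows inserted.
    """
    # Strict UTF-8 decoding raises UnicodeDecodeError on invalid bytes
    # instead of silently storing mangled text.
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        sql = (f"INSERT INTO {table} ({', '.join(header)}) "
               f"VALUES ({', '.join('?' for _ in header)})")
        total = 0
        while batch := list(islice(reader, batch_size)):
            with conn:  # commits the batch, or rolls it back on error
                conn.executemany(sql, batch)
            total += len(batch)
    return total


def import_chunks(conn: sqlite3.Connection, chunk_dir: Path, table: str,
                  batch_size: int = 1000) -> int:
    """Import every *.csv file in chunk_dir, in file-name order."""
    return sum(import_csv_in_batches(conn, p, table, batch_size)
               for p in sorted(chunk_dir.glob("*.csv")))
```

A call such as `import_chunks(sqlite3.connect("dev.db"), Path("db/chunks"), "records")` would load every chunk in order. Rerunning after a failure would insert the already-committed batches again, so a real script would also need to record which files or batches it has finished.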
>
> On Fri, Mar 12, 2010 at 4:46 PM, Mike <[email protected]> wrote:
> > Those are really good ideas. Would there be any way you can think of
> > to push data back up to the server?
>
> > On Mar 12, 9:43 am, Daniele <[email protected]> wrote:
> > > For the import part, if it doesn't have to be "atomic", you could create
> > > another app and use it to feed the data into the production database
> > > via a web service while the main app is live.
>
> > > To download only the main dataset, create a bundle, animate it on a
> > > new application, delete the huge data with a migration or with a SQL
> > > query in the console, and then pull the database locally.
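Daniele's recipe (restore a copy, delete the huge data, pull the rest) also addresses Mike's question about getting only the non-dataset data onto a development machine. A minimal local sketch of the same idea, using SQLite's `ATTACH DATABASE` in place of Heroku bundles: every table is recreated in a fresh database, but the tables named in `exclude` get their schema and no rows, so the app still finds them. The function name and table names are hypothetical, and indexes, views and triggers are deliberately left out to keep it short.

```python
import sqlite3
from pathlib import Path


def copy_without_tables(src_path: Path, dst_path: Path, exclude: set[str]) -> None:
    """Copy every table from src_path into a new database at dst_path.

    Tables named in `exclude` are created with their schema but no rows.
    Indexes, views and triggers are not copied in this sketch, and
    dst_path must not already contain tables with the same names.
    """
    dst = sqlite3.connect(dst_path)
    try:
        # Make the source database visible to this connection as "src".
        dst.execute("ATTACH DATABASE ? AS src", (str(src_path),))
        tables = dst.execute(
            "SELECT name, sql FROM src.sqlite_master WHERE type = 'table'"
        ).fetchall()
        for name, ddl in tables:
            if name.startswith("sqlite_"):
                continue  # skip internal tables such as sqlite_sequence
            dst.execute(ddl)  # an unqualified CREATE TABLE lands in "main"
            if name not in exclude:
                dst.execute(f'INSERT INTO main."{name}" SELECT * FROM src."{name}"')
        dst.commit()
    finally:
        dst.close()
```

For example, `copy_without_tables(Path("prod.db"), Path("dev.db"), exclude={"dataset"})` would give a development copy with all regular data and an empty `dataset` table. Copying only what is wanted avoids Daniele's intermediate step of restoring everything and then deleting the huge tables.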
>
> > > By the way, I'm investigating how to work with big databases too. Coming
> > > from a standard server world, I love Heroku, but some steps are a
> > > bit less flexible. Nothing is perfect :)
>
> > > On 12 Mar, 07:29, Mike <[email protected]> wrote:
>
> > > > I'm going to be adding a number of discrete but enormous (maybe many
> > > > gigs each) datasets to my Heroku app's database. In many ways, I'm
> > > > in a situation similar to the one Tobin faces in another current
> > > > post, but with a different question:
> >http://groups.google.com/group/heroku/browse_thread/thread/141c3ef84b...
>
> > > > Right now I still haven't merged the datasets into my database.
> > > > What's the best way for me to approach this?
>
> > > > Because taps can't push individual tables, I suspect I'll want to
> > > > do this as a one-shot deal rather than merging each dataset
> > > > sequentially and testing it before proceeding to the next. I'm
> > > > thinking about doing a db:pull to get the current state of my
> > > > database, putting my application into maintenance mode, running a
> > > > local merge of the datasets (which I'm guessing could take days just
> > > > to process the enormous things), doing exhaustive local testing on
> > > > the result, and then doing a push back to Heroku (maybe taking days
> > > > again) before reactivating my app. Because of their massive size, it
> > > > seems like any further db:pulls after the first merge will be
> > > > basically impossible. Just the idea of having made a mistake in
> > > > merging the datasets that I don't catch until after it's been pushed
> > > > to the site gives me the shivers. Overall, I wonder if there's a
> > > > better way that I'm overlooking.
>
> > > > One alternative I thought of: would it be possible to do something
> > > > involving a local bundle created from my database with YamlDB? But
> > > > then I'm not sure how I would get the bundle back onto the server
> > > > and restore from it. Heroku's documentation doesn't seem to cover
> > > > that possibility.
>
> > > > Also, in my case this data is integral to the application, so I
> > > > won't be able to split it out into a separate Heroku application as
> > > > in Tobin's case. Is there any practical way for me to pull just the
> > > > non-dataset data from the server to use on a development machine?
>
> > > > Does anyone have any ideas on how they would approach this problem?
> > > > If so, I'd be filled with gratitude.
>
> > > > Mike
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Heroku" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> >http://groups.google.com/group/heroku?hl=en.
>
>
