Last question for a while (I think :-). Is there a straightforward way to purge what I've loaded so far? I've seen how to use the data loader to do deletes, and I can certainly do that, but if there's a quicker way to just purge my database from the dashboard, that's probably better for my quota.
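For reference, if the purge ends up going through the remote API rather than the dashboard, the usual shape is to fetch keys and delete them in fixed-size batches. A minimal sketch of just the batching logic, with `delete_batch` standing in for a bulk-delete call such as `db.delete()`; the 500-key batch size is an assumed per-call datastore limit, not something stated in this thread:

```python
def chunks(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def purge(keys, delete_batch, batch_size=500):
    """Delete every key, `batch_size` at a time.

    `delete_batch` is a stand-in for a bulk-delete call like
    db.delete(); in real use `keys` would come from a keys-only query.
    """
    deleted = 0
    for batch in chunks(keys, batch_size):
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

Deleting this way still burns CPU quota per entity, so for a 500K-row load the dashboard route, if one exists, would indeed be cheaper.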
Thanks,
Matt

On Jun 1, 11:31 am, "Nick Johnson (Google)" <[email protected]> wrote:
> Hi Matt,
>
> Yes, you're right - if you want to avoid the high CPU cap, and your
> bulkload is sufficiently large, you'll need to spread it out over
> multiple days.
>
> -Nick Johnson
>
> On Mon, Jun 1, 2009 at 8:27 AM, RainbowCrane <[email protected]> wrote:
>
> > Denormalizing is a good idea for performance of queries as well,
> > actually. I can't think of a reason now that I'd want to query by
> > nutrient, unless it's something like "search for low carb foods",
> > "search for low fat foods", etc. In that case it may be easiest/best
> > to pull out those specific fields into separate columns and index
> > those columns.
> >
> > I think I'll step back my approach slightly and forget about loading
> > all nutrients (such as riboflavin), and only load the nutrients I care
> > about (such as fat, carbs, etc). That would likely greatly reduce the
> > number of rows in the nutrients table. I think there are 7K or so
> > rows in the food table, and 500K in the nutrient:food link table. 80
> > rows of data per food is a lot.
> >
> > A question on the CPU cap: even spreading this out over a longer time
> > frame would still hit the CPU cap, wouldn't it, unless I spread it
> > over multiple days? I'm assuming the CPU hrs to load 1 row are
> > relatively constant, so 500K rows takes approx. the same total CPU hrs
> > regardless of the speed/number of threads I use to load?
> >
> > Thanks for the suggestion.
> >
> > Matt
> >
> > On Jun 1, 11:15 am, "Nick Johnson (Google)" <[email protected]>
> > wrote:
> >> Hi Matt,
> >>
> >> First, you might want to give some thought to denormalizing the data.
> >> For example, I presume the list of nutrients for each food is fairly
> >> small; you could merge the join table into the food entity, and
> >> represent it as a ListProperty.
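The merge Nick suggests here, collapsing the nutrient:food join table into a list carried on each food entity, can be sketched in plain Python. The field names are invented for illustration (not the actual USDA schema); on App Engine the merged list is what a ListProperty would store:

```python
def denormalize(foods, join_rows):
    """Fold a (food_id, nutrient_id, amount) join table into a
    per-food nutrient list - the shape a ListProperty would hold.

    `foods` maps food_id -> food name; all names here are
    illustrative, not taken from the real dataset.
    """
    merged = {fid: {"name": name, "nutrients": []}
              for fid, name in foods.items()}
    for fid, nid, amount in join_rows:
        merged[fid]["nutrients"].append((nid, amount))
    return merged
```

With roughly 80 nutrient rows per food, each food entity would carry an 80-element list, which is small enough to be plausible for this approach.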
> >> Whether or not you can do this depends
> >> on the sort of queries you expect to execute - for example, if the
> >> join table has an amount, and you want to do queries like "every food
> >> with at least 10% RDA Niacin", then this approach may not be best. If
> >> you want specific advice, you could link us to the dataset and
> >> describe the sort of queries you expect to make over it.
> >>
> >> As far as bulk loading goes, doing it slower so you don't go over your
> >> CPU cap is probably the best bet. A few hours to load a dataset that
> >> you'll use for an extended period isn't too bad a ratio, after all.
> >> Your other option, as you point out, is to increase your cap just for
> >> this. You can always reduce the cap or entirely disable billing later
> >> if you wish.
> >>
> >> -Nick Johnson
> >>
> >> On Mon, Jun 1, 2009 at 3:50 AM, RainbowCrane <[email protected]> wrote:
> >>
> >> > Hi,
> >> >
> >> > I've searched this and other app engine groups as well as general
> >> > googling and I haven't found a solution, so posting here. I'm writing
> >> > an app to provide a web service API on top of the free USDA
> >> > nutrition database, and it's a fairly large data set - a few of the
> >> > tables have 500K rows due to the many-to-many relationships between
> >> > food and nutrients. Any suggestions for a more efficient way to get
> >> > this data into the database than the vanilla bulk loader? That's
> >> > taking hours to complete and running up against my CPU limit. The
> >> > data is separated into CSV files by table, and the relationships
> >> > between tables in the CSVs are foreign key strings that make it
> >> > straightforward to generate a db.Key for the relationship.
> >> >
> >> > I know I can buy more CPU, but that seems a little goofy since this is
> >> > just the initial data load, and, unless my app becomes extremely
> >> > popular, I'm likely not going to hit the limit again.
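The CSV wiring Matt describes, foreign-key strings that map directly to datastore keys, might look like the sketch below. A `(kind, key_name)` tuple stands in for the real `db.Key.from_path(kind, key_name)` call so the example runs standalone, and the column names are guesses at the dataset's layout, not the actual USDA headers:

```python
import csv
import io

def load_links(csv_text):
    """Parse nutrient:food link rows from CSV and attach pseudo-keys.

    Each (kind, key_name) tuple is a stand-in for
    db.Key.from_path(kind, key_name) in the App Engine Python SDK.
    """
    links = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        links.append({
            "food": ("Food", row["food_id"]),
            "nutrient": ("Nutrient", row["nutrient_id"]),
            "amount": float(row["amount"]),
        })
    return links
```

Because the keys can be built purely from the CSV strings, no lookup queries are needed during the load, which keeps the per-row CPU cost down.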
> >> > If nothing
> >> > else, I suppose I could split the data set and just do this over
> >> > multiple days, though if I ever have to load the data again due to a
> >> > schema change or something, that's a serious annoyance.
> >> >
> >> > I do want to use Python for this. It's been long enough since I've
> >> > used Java that there'd be a learning curve to start back up with it,
> >> > and I like Python.
> >> >
> >> > Thanks,
> >> > Matt

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
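Matt's assumption upthread, that CPU cost per row is roughly constant, reduces the day-splitting question to simple arithmetic: total CPU is fixed regardless of load speed, so the only lever is how many daily quota windows the load spans. The per-row cost and daily cap below are invented numbers; only the shape of the calculation matters:

```python
import math

def days_needed(total_rows, cpu_sec_per_row, daily_cap_cpu_hours):
    """Quota windows needed to load everything, assuming a
    constant CPU cost per row (all inputs are illustrative)."""
    total_cpu_hours = total_rows * cpu_sec_per_row / 3600.0
    return math.ceil(total_cpu_hours / daily_cap_cpu_hours)
```

For example, 500K rows at a hypothetical 0.05 CPU-seconds per row is about 6.9 CPU-hours in total, so even a slightly smaller daily cap would force the load across a second day.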
