Hi Matt,

First, you might want to give some thought to denormalizing the data.
For example, I presume the list of nutrients for each food is fairly
small; you could merge the join table into the food entity and
represent the nutrients as a ListProperty of keys. Whether you can do
this depends
on the sort of queries you expect to execute - for example, if the
join table has an amount, and you want to do queries like "every food
with at least 10% RDA Niacin", then this approach may not be best. If
you want specific advice, you could link us to the dataset and
describe the sort of queries you expect to make over it.
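If you do collapse the join table, the merge itself can happen offline,
before loading. A rough sketch in plain Python - the `food_id` and
`nutrient_id` column names are hypothetical; adjust them to the actual
USDA CSV layout:

```python
import csv
from collections import defaultdict

def nutrients_by_food(join_csv_path):
    """Collapse the food<->nutrient join CSV into one list per food,
    suitable for storing as a ListProperty on the food entity.

    Assumes the join file has 'food_id' and 'nutrient_id' columns;
    rename these to match the real dataset."""
    grouped = defaultdict(list)
    with open(join_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            grouped[row["food_id"]].append(row["nutrient_id"])
    return dict(grouped)
```

Each food then loads as a single entity carrying its nutrient key list,
instead of one row per join-table entry.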

As far as bulk loading goes, loading more slowly so you don't go over
your CPU cap is probably your best bet. A few hours to load a dataset
that you'll use for an extended period isn't a bad trade-off, after all.
Your other option, as you point out, is to increase your cap just for
this. You can always reduce the cap or entirely disable billing later
if you wish.
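One crude way to pace the load is to batch the writes and sleep between
batches. A minimal sketch - `put_batch` is a placeholder for whatever
callable actually writes a batch of entities (the bulkloader, remote_api,
etc.), and the batch size and pause are numbers you'd tune against your
quota, not recommendations:

```python
import time

def upload_throttled(rows, put_batch, batch_size=100, pause_seconds=2.0):
    """Write rows in small batches, sleeping between batches so the
    sustained CPU usage stays under the daily cap.

    put_batch is a placeholder callable that performs the actual write;
    it is not part of any SDK."""
    for start in range(0, len(rows), batch_size):
        put_batch(rows[start:start + batch_size])
        time.sleep(pause_seconds)
```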

-Nick Johnson

On Mon, Jun 1, 2009 at 3:50 AM, RainbowCrane <[email protected]> wrote:
>
> Hi,
>
> I've searched this and other app engine groups as well as general
> googling and I haven't found a solution, so posting here.  I'm writing
> an app to provide a web service API on top of the free USDA
> nutrition database, and it's a fairly large data set - a few of the
> tables have 500K rows due to the many-to-many relationships between
> food and nutrients.  Any suggestions for a more efficient way to get
> this data into the database than the vanilla bulk loader?  That's
> taking hours to complete and running up against my CPU limit.  The
> data is separated into CSV files by table, and the relationships
> between tables in the CSVs are foreign key strings that make it
> straightforward to generate a db.Key for the relationship.
>
> I know I can buy more CPU, that seems a little goofy since this is
> just the initial data load, and, unless my app becomes extremely
> popular, I'm likely not going to hit the limit again.  If nothing
> else, I suppose I could split the data set and just do this over
> multiple days, though if I ever have to load the data again due to a
> schema change or something that's a serious annoyance.
>
> I do want to use Python for this.  It's been long enough since I've
> used Java that there'd be a learning curve to start back up with it,
> and I like Python.
>
> Thanks,
> Matt
>
> 

You received this message because you are subscribed to the Google Groups "Google App Engine" group.
