Amar Pai wrote:
> So if I shard my graph into 3 parts, my main class can store them as
> class variables and they'll persist across requests?  That would be
> great.  But will I run into quota problems holding 3M data in memory?
> (Assuming low overhead otherwise, and only 100ish requests per day)

Yes and no.  The total amount of memory your app can use appears to be
pretty high, and while there appears to be a limit on how big a single
Python object can be, your dict of dicts would actually be made up of
lots of little objects.  The problem is that in order to unpickle your
dict of dicts you'd need to construct a single 3M string object
containing the pickled data, and that could very well result in a
MemoryError.
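To make the problem concrete, here's a toy sketch (the coordinate keys
and values are made up): unpickling always materializes the whole
serialized blob as one object before it starts rebuilding the many
small ones.

```python
import pickle

# Toy stand-in for the real 3M graph: a dict keyed by
# coordinate-pair tuples.
graph = {((0, 0), (4, 2)): "path-a", ((0, 0), (4, 7)): "path-b"}

data = pickle.dumps(graph)

# pickle.loads() needs the entire serialized blob in memory as one
# object before it can reconstruct the dict -- with 3M of data that
# single allocation is where a MemoryError would strike.
restored = pickle.loads(data)
assert restored == graph
```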

Instead of using pickle you can use plain Python to store your dict.
Write a Python program to generate a file that looks like:

    S1 = "..."
    S2 = "..."
    ...
    G = { ((0,0), (4,2)): S1,
          ((0,0), (4,7)): S2,
          ((0,0), (5,1)): S1,
          ((0,0), (5,3)): S3,
          ...
        }

There are two things to note about this example.  One is that it
avoids storing duplicate strings in the dict by using the S#
variables.  This is something pickle would handle for you, but which
you'd have to do yourself if you want to go this route; otherwise a
new string object would be created for each element of the dict.  The
second is that it's just a dict, not a dict of dicts.  The two
separate coordinate keys are merged into one key that's a tuple of two
coordinates, so you'd look up G[(pt1, pt2)] instead of G[pt1][pt2].
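A sketch of what the generator script could look like (the file name
and the source "graph" dict-of-dicts here are made-up placeholders
for your real data):

```python
# Sketch of a generator: dedupe the string values into S# variables,
# then emit the flattened dict literal as importable Python source.
# "graph" is a hypothetical stand-in for your real dict of dicts.
graph = {
    (0, 0): {(4, 2): "path-a", (4, 7): "path-b"},
    (1, 0): {(4, 7): "path-a"},
}

def generate(graph, out_path):
    # Assign one S# variable per distinct string value.
    names = {}
    for inner in graph.values():
        for value in inner.values():
            if value not in names:
                names[value] = "S%d" % (len(names) + 1)
    with open(out_path, "w") as f:
        for value, name in sorted(names.items(), key=lambda kv: kv[1]):
            f.write("%s = %r\n" % (name, value))
        f.write("G = {\n")
        for pt1, inner in graph.items():
            for pt2, value in inner.items():
                # Flatten the two-level lookup into a tuple key and
                # reference the shared S# variable, not the literal.
                f.write("    (%r, %r): %s,\n" % (pt1, pt2, names[value]))
        f.write("}\n")

generate(graph, "graph1.py")
```

Because every occurrence of the same string references the same S#
variable, importing the generated module shares one string object
across all the dict entries that use it.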

If generating the dict this way results in a file bigger than 1M,
then you'll need to split the dict, generating separate Python files
and then joining them up again.  Something like:

     import graph1, graph2, graph3, graph4
     G = {}
     G.update(graph1.G)
     G.update(graph2.G)
     G.update(graph3.G)
     G.update(graph4.G)

> Also, if I did it that way, is there a good way to export the sharded
> graph data  to BigTable for backup/inspection purposes?

Using a single-key lookup, you could store the graph in the
datastore, but I don't see the point.  You can inspect the Python dict
much easier and faster and you'll have a backup of your data on your
computer and anywhere else you want to copy it.  The only way to
change the graph is to upload a new version of the app.  If you try to
change the graph during a request, the change to the cached dict won't
be visible to other instances of your app running on a different
server.

> I guess I
> could schedule a cron job that hits webapp periodically to trigger
> export of x items, where x is however many you're allowed to write at
> a time.  Seems inelegant though.

Some sort of small batched upload like the bulk loader is the only way
to store bulk data in the datastore.
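If you do go the batched route, the slicing side of it is plain
Python; the datastore write itself is shown below as a hypothetical
put_batch() placeholder, since the exact call depends on your model:

```python
# Split the flattened graph dict into batches of x items, so each
# cron-triggered request only writes a datastore-safe amount.
# Feeding each batch to the datastore (e.g. via db.put() on a list of
# entities) is left as the hypothetical put_batch() step.
def iter_batches(graph, x):
    batch = []
    for key, value in graph.items():
        batch.append((key, value))
        if len(batch) == x:
            yield batch
            batch = []
    if batch:
        yield batch  # final short batch, if any

G = {((0, 0), (4, 2)): "S1", ((0, 0), (4, 7)): "S2",
     ((1, 0), (4, 7)): "S1"}

batches = list(iter_batches(G, 2))
# Three items in batches of two -> two batches, the last one short.
```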

                                    Ross Ridge
