Nick,
First, thanks for the prompt reply (especially since my question is
essentially about how to keep my app engine bill down!).
So the names are serialized with each record? In my example above,
"int1" requires an additional 8 bytes * [number of records] for
storage? (Assuming UTF-16 here.) Wow. (If it's not obvious, I have an
application with a very large number of very small records, so fixed
per-record overhead is fairly costly.)
I suspected indices were the principle cost (and that's entirely
reasonable). But if you don't mind, I have a couple of more in-depth
questions.
First, background (this is closer to my actual application). Assume I
have a database of genome subsequences, where each record contains a
sequence, a position, and a chromosome number. My current schema looks
like this:
class Record(db.Model):
seq = db.StringProperty()
location = db.IntegerProperty(indexed=True)
chromosome = db.IntegerProperty(indexed=True)
plusminus = db.BooleanProperty(indexed=True)
Now, I query on location and chromosome, so both of those are
currently indexed. But chromosome is only one of, say, five values.
So my question is, are indices equivalent--in terms of space--to a
table of the property being indexed and the key of the record?
To expand, is the above (current) schema less space efficient than
creating five separate tables, implicitly encoding the chromosome
value in the table name?
e.g.:
class RecordChromosome1(db.Model):
seq = db.StringProperty()
location = db.IntegerProperty()
In fact, it's entirely feasible to manually "unroll" (like loop
unrolling, I guess?) the query combinatorics of the chromosome and
plusminus properties and create ten different tables, wherein each
record contains only the sequence and the location (but not the
plusminus and chromosome properties, which are now implicitly encoded
in the record's kind).
Would doing this save space by obviating the indices for "plusminus"
and "location"?
And, while I'm at it, is the kind name also encoded in each record
instance, or only the property names?
Thanks again,
Calvin
On Jul 27, 7:12 am, "Nick Johnson (Google)" <[email protected]>
wrote:
> Hi Calvin,
>
> On Sun, Jul 26, 2009 at 6:36 PM, Calvin Slayden
> <[email protected]>wrote:
>
>
>
>
>
>
>
> > Hello,
>
> > After noticing my app using far more of the datastore quota than I
> > expected, I conducted a little experiment with a toy app. The toy
> > creates records of the format:
>
> > class Record(db.Model):
> > int1 = db.IntegerProperty()
> > # ... repeated three more times
>
> > According to the App Engine documentation, IntegerProperty types in
> > the datastore are stored as 64-bit integers, so each record consists
> > of 8 * 4 = 32 bytes of "payload", plus overhead (key size * number of
> > records, etc). I was hoping, with this experiment, to gain some
> > insight into what that overhead costs.
>
> > I uploaded 141,725 (sorry it's not a round number) of these records
> > and waited a day for my quota to update; it now shows me using 0.09GB
> > of my stored data quota. (In comparison, the amount of "payload" data
> > in the store is only 141,725 x 32 = 4,535,200 bytes, or ~0.004GB.)
>
> > I'm not sure whether it's best to compute overhead-per-property or
> > overhead-per-record (there is obviously a nontrivial cost to the
> > former, of course), and determining which is actually using the extra
> > quota would probably take more effort than it's worth, so I figured
> > I'd ask here.
>
> > Anyway, in terms of overhead, I'm observing (in this case)
> > approximately 600 bytes per record (in addition to the size of the
> > record's payload), or 150 bytes per property. In my particular case
> > (though I doubt this ratio is anything resembling constant), I'm
> > seeing 19 bytes of quota used for every byte of payload.
>
> > So I guess I have two big questions.
>
> > First, quota obviously measures something resembling the total size of
> > the datastore tables created by the user (their size on disk?), which,
> > due to indexing and such, is obviously going to be larger than the
> > size of the payload alone. Right?
>
> More or less, yes. The total size includes the size of the serialized Entity
> Protocol Buffer (which includes all the fields' values, their names ('int1'
> in your case), plus some encoding overhead), as well as several built-in
> indexes, plus any custom indexes you've built.
>
>
>
> > Second, how can I minimize this? Is the primary cost the added
> > indices, such that if I disable indexing for properties I won't be
> > querying on, I'll save space? (Again, this is something I could
> > experiment with, but at this point, why not just ask?) What's the
> > overhead per record, regardless of indexing? Are there any other steps
> > I can take to minimize my datastore quota usage?
>
> Disabling indexing for fields you don't require to be indexed is the
> simplest and most effective step. You can also use shorter field names, to
> reduce redundant storage requirements there, if you wish. To make things
> more user friendly, you can specify a name in your model:
>
> class MyModel(db.Model):
> areallylongpropertyname = db.IntegerProperty(name='a')
>
> But bear in mind that you'll have to use the shortened names when executing
> queries.
>
> -Nick Johnson
>
>
>
> > (The application I'm actually developing stores a few hundreds of
> > megabytes of data in three tables--two of them only necessary because
> > of the limitations on the types of queries datastore can handle--and
> > while when I was developing the application I estimated that we would
> > be able to stay well within an affordable amount of quota usage, it
> > now appears that we would be using something like 200GB of
> > datastore.)
>
> > Thanks,
> > Calvin
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---