Hi Donovan,
  Even on pretty simple entities I often see puts take 100 API CPU ms
or more.  Your numbers work out to 48 CPU hours spread over about
3,500,000 entities (1,000 documents x 3,500 keys each), so it sounds
like you're seeing somewhere around 50ms / entity, which is pretty good.
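
  As a quick check of that estimate, here is the arithmetic written
out.  The inputs are the figures from your message; the variable names
are just mine for illustration.

```python
# Figures taken from Donovan's message (quoted below).
datastore_cpu_hours = 48      # datastore CPU per 1,000 documents
documents = 1000
keys_per_document = 3500      # average keys generated per document

entities = documents * keys_per_document          # 3,500,000 entities
cpu_ms_total = datastore_cpu_hours * 3600 * 1000  # hours -> milliseconds
ms_per_entity = cpu_ms_total / float(entities)

print("%.1f ms per entity" % ms_per_entity)       # 49.4 ms per entity
```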

  Your code looks pretty tight to me, but I've got two suggestions.
First, if you're not using Appstats yet, try it; maybe you'll spot some
other area you can optimize a bit (though relative to a get / put of
3,500 entities I doubt it will save much).  Second, try to find a more
efficient way to batch operations and reduce the number of writes
you're doing.  Perhaps you can process several documents at once:
tokenize them, then insert tasks to handle the comparisons.
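
  On the first suggestion: a minimal sketch of turning Appstats on,
assuming the Python runtime with a WSGI app.  This would go in
appengine_config.py at the root of the app; I haven't seen your setup,
so treat it as a starting point.

```python
# appengine_config.py
# Wraps the WSGI application so Appstats records every RPC
# (including datastore gets and puts) made while handling a request.

def webapp_add_wsgi_middleware(app):
    # Imported lazily so the module loads even outside the SDK.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```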

  Some type of batching seems to me like your best hope.  If you're
lucky, maybe you can reduce your gets and puts by several percent.
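
  To make the batching idea concrete, here is a rough sketch.  It is
not your code: a plain dict stands in for the datastore, and
update_index_batched is a name I made up.  The point is that keys
shared by several documents in a batch get fetched and written once
instead of once per document.

```python
from array import array
from collections import defaultdict


def update_index_batched(store, documents):
    """Apply several documents to the index with one get and one put.

    store     -- dict standing in for the datastore (key -> array of ids)
    documents -- iterable of (document_id, keys) pairs
    Returns (entities_read, entities_written).
    """
    # Group document ids by index key so each key is touched once.
    pending = defaultdict(list)
    for document_id, keys in documents:
        for key in keys:
            pending[key].append(document_id)

    keys = list(pending)
    # Stand-in for a single batch db.get(keys).
    fetched = [store.get(key) for key in keys]

    upserts = {}
    for key, ids in zip(keys, fetched):
        if ids is None:
            ids = array('I')
        changed = False
        for document_id in pending[key]:
            if document_id not in ids:
                ids.append(document_id)
                changed = True
        if changed:
            upserts[key] = ids

    # Stand-in for a single batch db.put(upserts).
    store.update(upserts)
    return len(keys), len(upserts)


store = {}
docs = [(1, ['apple', 'pear']), (2, ['apple', 'plum']), (3, ['apple'])]
reads, writes = update_index_batched(store, docs)
print("%d reads, %d writes" % (reads, writes))
```

  Handling those three documents one at a time would read and write 5
entities; batched, it is 3.  How much you actually save depends on how
many keys the documents in a batch have in common.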


Robert






On Thu, Jan 6, 2011 at 14:36, Donovan <[email protected]> wrote:
> Hi,
>
> I'm using a very simple model to store arrays of document ids for an
> inverted index based on 3 million documents.
>
> class I(db.Model):
>    v=ArrayProperty(typecode="I",required=True)
>
> which uses:
>
> http://appengine-cookbook.appspot.com/recipe/store-arrays-of-numeric-values-efficiently-in-the-datastore/
>
> I have a simple task queue that includes the following piece of logic
> which loops 3,000 times a day, for new incoming documents which
> generate on average 3,500 keys each, to update the index:
>
> # Builds a list of db.Key instances based on the document
> keys = gen_keys(document)
> indexes=db.get(keys)
> upserts=[]
> for i,index in enumerate(indexes):
>    if index is None:
>        upserts.append(I(key=keys[i],v=array('I',[document_id])))
>    elif document_id not in index.v:
>        index.v.append(document_id)
>        upserts.append(index)
> db.put(upserts)
>
> This loop leads to datastore CPU usage of 48 hours per 1000 documents
> which means a daily spend of $16.80 just for the datastore updates,
> which seems quite expensive given how something like Kyoto Cabinet
> running on conventional hosting could easily deal with this load. Does
> anyone have any ideas for minimizing the datastore CPU usage? My hunch
> is that the datastore CPU usage is a bit overpriced :(
>
> Cheers,
> Donovan.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.
>
>
