I was just toying around with this idea yesterday, Brett. :D I did
some profiling, and it would reduce the write cost per subscriber to
about 24ms-40ms (depending on the number of subscribers you have; more
subscribers means a lower average cost each), down from 100-150ms. These
are rough numbers with the entities I was using; I have to do some more
accurate profiling.

When I first thought about doing this, I was thinking ":o I'll reduce
write cost by a factor of hundreds!", but as it turns out, the extra
index update time for an entity with a large number of list property
entries eats into that saving significantly.

But it's still a saving. Funnily enough, the per-subscriber saving
increases (to a point) the more subscribers you have.

I'm not sure if there's anything one can do to optimise index creation
time with large lists. I'm also going to do some more work to see
if there's an optimum 'batch size' for grouping subscribers
together. At first blush, as mentioned above, it seems the larger the
better (up to the per-entity property/index cap, of course).
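To make the batching idea above concrete, here's a minimal sketch of splitting subscribers into list-property-sized batches, one Datastore entity per batch. Everything here is illustrative: the batch size, the function names, and the plain-dict payloads (which stand in for a `db.Model` with a `ListProperty`) are all assumptions, not actual code from the app.

```python
# Sketch: fan out one update to many subscribers by packing subscriber
# IDs into list properties, instead of writing one entity per subscriber.
# BATCH_SIZE is an assumed value; the per-entity index cap (~5000 index
# entries) bounds how large each list can grow.

BATCH_SIZE = 2000

def chunk_subscribers(subscriber_ids, batch_size=BATCH_SIZE):
    """Split the subscriber list into batches, one Datastore entity each."""
    return [subscriber_ids[i:i + batch_size]
            for i in range(0, len(subscriber_ids), batch_size)]

def fanout_entities(update_id, subscriber_ids):
    """Build the payloads that would be written, one per batch.

    On App Engine this would be a db.Model with a ListProperty of
    subscriber IDs; plain dicts here just show the shape of each write.
    """
    return [
        {"update_id": update_id, "subscribers": batch}
        for batch in chunk_subscribers(subscriber_ids)
    ]

# One write then covers up to BATCH_SIZE subscribers. A subscriber finds
# its updates with an equality filter on the list property, e.g.:
#   FanoutEntry.all().filter('subscribers =', subscriber_id)
```

This also answers the over-the-cap case: with more subscribers than fit in one list, you simply get multiple entities, each holding a different slice of the IDs.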

Thanks also for the insight on pubsubhubbub. I eagerly await updates on
that front :) Thank you!

On Mar 13, 8:05 am, Paul Kinlan <[email protected]> wrote:
> Just Curious,
>
> For other pub/sub-style systems where you want to write to the
> Datastore, the trick is to use list properties to track the
> subscribers you've published to. So for instance, instead of writing a
> single entity per subscriber, you write one entity with 1000-2000
> subscriber IDs in a list. Then all queries for that list with an
> equals filter for the subscriber will show the entity. This lets you
> pack a lot of information into a single entity write, thus minimizing
> Datastore overhead, cost, etc. Does that make sense?
>
> So if you have more than the 5000-entry limit in subscribers, would you
> write the entity twice, each with different subscriber IDs?
>
> Paul
>
> 2009/3/13 Brett Slatkin <[email protected]>
>
>
>
> > Heyo,
>
> > Good finds, peterk!
>
> > pubsubhubbub uses some of the same techniques that Jaiku uses for
> > doing one-to-many fan-out of status message updates. The migration is
> > underway as we speak
> > (http://www.jaiku.com/blog/2009/03/11/upcoming-service-break/). I
> > believe the code should be available very soon.
>
> > 2009/3/11 peterk <[email protected]>:
>
> > > The app is actually live here:
>
> > >http://pubsubhubbub.appspot.com/
> > >http://pubsubhubbub-subscriber.appspot.com/
>
> > > (pubsubhubbub-publisher isn't there, but it's trivial to upload your
> > > own.)
>
> > > This suggests it's working on App Engine as it is now. I've been
> > > looking through the source, and I'm not entirely clear on how the
> > > 'background workers' are actually working. There are two: one for
> > > pulling updates to feeds from publishers, and one for propagating
> > > updates to subscribers in batches.
>
> > > But like I say, I can't see how they're actually started and running
> > > constantly.  There is a video here of a live demonstration:
>
> > >http://www.veodia.com/player.php?vid=fCNU1qQ1oSs
>
> > > The background workers seem to be behaving as desired there, but I'm
> > > not sure if they were just constantly polling some urls to keep the
> > > workers live for the purposes of that demo, or if they're actually
> > > running somehow constantly on their own. I can't actually get the
> > > live app at the URLs above to work, but I'm not sure if it's because
> > > the background workers aren't really working, or because I'm feeding
> > > it incorrect URLs/configuration, etc.
>
> > Ah sorry yeah I still have the old version of the source running on
> > pubsubhubbub.appspot.com; I need to update that with a more recent
> > build. Sorry for the trouble! It's still not quite ready for
> > widespread use, but it should be soon.
>
> > The way pubsubhubbub does fan-out, there's no need to write an entity
> > for each subscriber of a feed. Instead, each time it consumes a task
> > from the work queue it will update the current iterator position in
> > the query result of subscribers for a URL. Subsequent work requests
> > will offset into the subscribers starting at the iterator position.
> > This works well in this case because it's using urlfetch to actually
> > notify subscribers, instead of writing to the Datastore.
>
> > For other pub/sub-style systems where you want to write to the
> > Datastore, the trick is to use list properties to track the
> > subscribers you've published to. So for instance, instead of writing a
> > single entity per subscriber, you write one entity with 1000-2000
> > subscriber IDs in a list. Then all queries for that list with an
> > equals filter for the subscriber will show the entity. This lets you
> > pack a lot of information into a single entity write, thus minimizing
> > Datastore overhead, cost, etc. Does that make sense?
>
> > @bFlood: Indeed, the async_apiproxy.py code is interesting. Not much
> > to say about that at this time, besides the fact that it works. =)
>
> > -Brett
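
Brett's iterator-position fan-out could be sketched roughly like this. All names and the batch size are assumptions on my part; in the real app each work item comes off the task queue and the notification is an urlfetch call, which a plain callable stands in for here:

```python
# Sketch of iterator-position fan-out: each work item notifies one batch
# of subscribers and records how far it got, so the next work item
# resumes from that offset instead of restarting from the beginning.

WORK_BATCH = 3  # subscribers notified per work item (assumed value)

def run_work_item(subscribers, position, notify):
    """Notify one batch starting at `position`; return the new position."""
    batch = subscribers[position:position + WORK_BATCH]
    for url in batch:
        notify(url)  # stands in for urlfetch.fetch(url, ...) on App Engine
    return position + len(batch)

def fan_out(subscribers, notify):
    """Drain the subscriber list one work item at a time."""
    position = 0
    while position < len(subscribers):
        position = run_work_item(subscribers, position, notify)
    return position
```

The key point, as Brett says, is that no entity is written per subscriber: the only state persisted between work items is the iterator position.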
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
