@peterk - if you don't need to query by the subscriber, you could
alternatively pack the list of subscribers for a feed into a
TextProperty so it isn't indexed. I use TextProperty a lot to store
large lists of geometry data, and it works out pretty well
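A minimal sketch of that packing approach in plain Python (on App Engine the blob would be stored in a `db.TextProperty`, which is never indexed; the function names here are just placeholders):

```python
import json

# Pack a feed's subscriber list into one text blob. Stored in a
# TextProperty, the blob is never indexed, so writing it avoids the
# per-element index update cost of a list property.
def pack_subscribers(subscriber_ids):
    return json.dumps(subscriber_ids)

def unpack_subscribers(blob):
    return json.loads(blob)

blob = pack_subscribers(['alice', 'bob', 'carol'])
print(unpack_subscribers(blob))  # ['alice', 'bob', 'carol']
```

The trade-off is exactly the one named above: with no index, you can no longer query entities by subscriber.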

@brett - async! looking forward to it in future GAE builds. thanks

cheers
brian

On Mar 13, 5:37 am, peterk <[email protected]> wrote:
> I was just toying around with this idea yesterday Brett.. :D I did
> some profiling, and it would reduce the write cost per subscriber to
> about 24ms-40ms (more subscribers means a lower average cost per
> subscriber), from 100-150ms. These are rough numbers with the
> entities I was using; I still have to do some more accurate
> profiling..
>
> When I first thought about doing this, I was thinking ":o I'll reduce
> write cost by a factor of hundreds!", but as it turns out, the extra
> index update time for an entity with a large number of list property
> entries eats into that saving significantly.
>
> But it still is a saving. Funnily enough, the per-subscriber saving
> increases (to a point) the more subscribers you have.
>
> I'm not sure if there's anything one can do to optimise index
> creation time with large lists.. I'm going to do some more work as
> well to see if there's an optimum 'batch size' for grouping
> subscribers together.. at first blush, as mentioned above, it seems
> the larger the better (up to the per-entity property/index cap, of
> course).
>
> Thanks also for the insight on pubsubhubub..I eagerly await updates on
> that front :) Thank you!!
>
> On Mar 13, 8:05 am, Paul Kinlan <[email protected]> wrote:
>
> > Just curious,
>
> > For other pub/sub-style systems where you want to write to the
> > Datastore, the trick is to use list properties to track the
> > subscribers you've published to. So for instance, instead of writing a
> > single entity per subscriber, you write one entity with 1000-2000
> > subscriber IDs in a list. Then all queries for that list with an
> > equals filter for the subscriber will show the entity. This lets you
> > pack a lot of information into a single entity write, thus minimizing
> > Datastore overhead, cost, etc. Does that make sense?
>
> > So if you have more than the 5000 limit in subscribers, would you
> > write the entity twice, each with different subscriber IDs?
>
> > Paul
>
> > 2009/3/13 Brett Slatkin <[email protected]>
>
> > > Heyo,
>
> > > Good finds, peterk!
>
> > > pubsubhubbub uses some of the same techniques that Jaiku uses for
> > > doing one-to-many fan-out of status message updates. The migration is
> > > underway as we speak
> > > (http://www.jaiku.com/blog/2009/03/11/upcoming-service-break/). I
> > > believe the code should be available very soon.
>
> > > 2009/3/11 peterk <[email protected]>:
>
> > > > The app is actually live here:
>
> > > >http://pubsubhubbub.appspot.com/
> > > >http://pubsubhubbub-subscriber.appspot.com/
>
> > > > (pubsubhubbub-publisher isn't there, but it's trivial to upload your
> > > > own.)
>
> > > > This suggests it's working on App Engine as it is now. I've been
> > > > looking through the source, and I'm not entirely clear on how the
> > > > 'background workers' are actually working.. there are two: one for
> > > > pulling updates to feeds from publishers, and one for propagating
> > > > updates to subscribers in batches.
>
> > > > But like I say, I can't see how they're actually started and running
> > > > constantly.  There is a video here of a live demonstration:
>
> > > >http://www.veodia.com/player.php?vid=fCNU1qQ1oSs
>
> > > > The background workers seem to be behaving as desired there, but I'm
> > > > not sure if they were just constantly polling some URLs to keep the
> > > > workers live for the purposes of that demo, or if they're actually
> > > > running constantly on their own.. I can't actually get the live app
> > > > at the URLs above to work, but I'm not sure if that's because the
> > > > background workers aren't really working, or because I'm feeding it
> > > > incorrect URLs/configuration etc.
>
> > > Ah sorry yeah I still have the old version of the source running on
> > > pubsubhubbub.appspot.com; I need to update that with a more recent
> > > build. Sorry for the trouble! It's still not quite ready for
> > > widespread use, but it should be soon.
>
> > > The way pubsubhubbub does fan-out, there's no need to write an entity
> > > for each subscriber of a feed. Instead, each time it consumes a task
> > > from the work queue it will update the current iterator position in
> > > the query result of subscribers for a URL. Subsequent work requests
> > > will offset into the subscribers starting at the iterator position.
> > > This works well in this case because it's using urlfetch to actually
> > > notify subscribers, instead of writing to the Datastore.
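The iterator-position scheme Brett describes can be sketched as a toy simulation in plain Python (this is not the actual pubsubhubbub code; in the real app the saved offset would live in a Datastore entity and `notify` would be a urlfetch call):

```python
# Each work-queue task notifies the next batch of subscribers and
# saves the iterator position, so no per-subscriber entity is written.
BATCH_SIZE = 3

def run_fanout_task(subscribers, state, notify):
    """Process one task; return True while more work remains."""
    start = state.get('offset', 0)
    batch = subscribers[start:start + BATCH_SIZE]
    for sub in batch:
        notify(sub)  # stand-in for a urlfetch notification
    state['offset'] = start + len(batch)  # persist the new position
    return state['offset'] < len(subscribers)

notified = []
state = {}
subs = ['s%d' % i for i in range(7)]
while run_fanout_task(subs, state, notified.append):
    pass
# all seven subscribers notified across three task runs
```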
>
> > > For other pub/sub-style systems where you want to write to the
> > > Datastore, the trick is to use list properties to track the
> > > subscribers you've published to. So for instance, instead of writing a
> > > single entity per subscriber, you write one entity with 1000-2000
> > > subscriber IDs in a list. Then all queries for that list with an
> > > equals filter for the subscriber will show the entity. This lets you
> > > pack a lot of information into a single entity write, thus minimizing
> > > Datastore overhead, cost, etc. Does that make sense?
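A rough illustration of that list-property trick, with plain Python standing in for the Datastore (on App Engine the list would be a `db.ListProperty` and the lookup an equality filter, which matches an entity when any list element equals the value; the class and function names here are invented for the sketch):

```python
class FanoutRecord(object):
    """One entity write covering many subscribers of a feed."""
    def __init__(self, feed_url, subscriber_ids):
        self.feed_url = feed_url
        self.subscriber_ids = subscriber_ids  # e.g. 1000-2000 IDs

# Stand-in for: FanoutRecord.all().filter('subscriber_ids =', sub_id)
def records_for_subscriber(records, sub_id):
    return [r for r in records if sub_id in r.subscriber_ids]

records = [FanoutRecord('http://example.com/feed',
                        ['sub-%d' % i for i in range(2000)])]
print(len(records_for_subscriber(records, 'sub-42')))  # 1
```

One write covers thousands of subscribers, yet each subscriber can still find their records with a single equality query.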
>
> > > @bFlood: Indeed, the async_apiproxy.py code is interesting. Not much
> > > to say about that at this time, besides the fact that it works. =)
>
> > > -Brett
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
