The string-day entities are only needed for the current day, as a way to 
parallelize concurrent writes. In fact, you will probably want to delete 
the string-day entities once they are no longer needed. The digested 
DayCount entity I also described is what you run queries against, which 
means fetching one entity per day in the requested range. Remember to 
store the results you generate in the datastore for re-use.

The model I laid out can be made to work for the numbers you mention, and 
the more instantaneous consistency you're willing to sacrifice, the easier 
it gets.

   - You can push the string-writes to asynchronous tasks on the TaskQueue 
   and then have the task operate on the string-day counter. Generally these 
   are called within 100 ms of being added.
   - At the end of the day, have a cron job process each of the string-day 
   entities, updating the DayCount entity for that day and then deleting 
   the string-day entity. You won't be able to process all of these in one 
   transaction, so either do 4 string-day entities per transaction (hitting 
   the maximum of 5 entity groups for an XG transaction, including the 
   DayCount entity), or just stick to one per transaction.
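
To make the batching in the second bullet concrete, here is a minimal in-memory sketch in Python. It does not touch the datastore API at all: plain dicts stand in for the entities, and the names roll_up_day, batches and BATCH_SIZE are made up for illustration. I am also assuming DayCount holds one count per string, which you may model differently.

```python
from typing import Dict, Iterator, List

MAX_XG_GROUPS = 5                 # XG transaction limit mentioned above
BATCH_SIZE = MAX_XG_GROUPS - 1    # keep one slot for the DayCount entity


def batches(keys: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def roll_up_day(string_days: Dict[str, int], day_count: Dict[str, int]) -> int:
    """Fold every string-day counter into day_count and delete it.

    Each loop iteration stands in for one transaction (hypothetical here,
    nothing is actually transactional). Returns the number of batches run.
    """
    transactions = 0
    # sorted() copies the keys, so deleting from the dict below is safe.
    for batch in batches(sorted(string_days), BATCH_SIZE):
        for key in batch:
            day_count[key] = day_count.get(key, 0) + string_days[key]
        for key in batch:
            del string_days[key]   # the string-day entity is no longer needed
        transactions += 1
    return transactions
```

With 9 string-day entities this runs 3 batches (4, 4 and 1), each one staying within the 5-group limit once the DayCount entity is counted.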


Notice that this means today's data will not show up in your queries, since 
the DayCount entities are only updated at the end of the day. We can fix 
that too.

   - After incrementing a string-day's counter, set a property called 
   updated to the current date and time before writing the entity and 
   committing the transaction. As an optimization, skip this step if updated 
   was already set within the last few minutes.
   - Have a cron job run every X minutes (as often as you want the current 
   day's DayCount updated) to fetch today's DayCount and read its 
   update-time, then query for string-day entities where updated > 
   dayCountUpdated.
   - Pull all of the counts from these string-day entities into your 
   DayCount entity and set its update-time to the highest observed update-time 
   in the list of string-day entities you got. Commit your changes to the 
   current DayCount.
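
Roughly, the three steps above could look like the Python sketch below. It is an in-memory stand-in, not datastore code: StringDay, DayCount and refresh_day_count are hypothetical names, and the list comprehension plays the role of the "updated > dayCountUpdated" query. One assumption of mine: the sketch overwrites the per-string count in DayCount rather than adding to it, so that a string-day picked up twice is not counted twice.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class StringDay:
    text: str            # the string being counted
    count: int           # today's running total for this string
    updated: datetime    # set when the counter was last incremented


@dataclass
class DayCount:
    counts: Dict[str, int] = field(default_factory=dict)
    updated: datetime = datetime.min   # nothing has been pulled in yet


def refresh_day_count(day_count: DayCount, string_days: List[StringDay]) -> int:
    """Copy counts from string-days changed since the last refresh.

    Returns how many string-day entities were picked up.
    """
    # Stand-in for the query: updated > dayCountUpdated.
    changed = [e for e in string_days if e.updated > day_count.updated]
    for entity in changed:
        # Overwrite instead of add, so re-reading an entity is harmless.
        day_count.counts[entity.text] = entity.count
    if changed:
        # Highest update-time observed in this batch.
        day_count.updated = max(e.updated for e in changed)
    return len(changed)
```

Running it twice in a row with no new writes picks up nothing the second time, which is what keeps the cron job cheap.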


Be smart about reusing old query results.

   - At the end of the week, generate weekly count aggregations
   - Same for end of month
   - Identify the largest aggregate result, or group thereof, that fits 
   inside the requested date range, and use it as the base of your 
   calculation
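
As a sketch of that last bullet, the Python below splits a date range into the largest stored aggregates that fit. The assumptions are mine: monthly aggregates cover calendar months, weekly aggregates run Monday to Sunday, and plan_aggregates is a made-up name. It is a simple greedy walk from the start date, so it is not guaranteed to find the smallest possible set of pieces.

```python
import calendar
from datetime import date, timedelta
from typing import List, Tuple


def plan_aggregates(start: date, end: date) -> List[Tuple[str, date]]:
    """Cover [start, end] (inclusive) with month, week and day pieces.

    Returns (kind, first_day) pairs, where kind is "month", "week" or "day".
    """
    pieces: List[Tuple[str, date]] = []
    cursor = start
    while cursor <= end:
        days_in_month = calendar.monthrange(cursor.year, cursor.month)[1]
        month_end = cursor.replace(day=days_in_month)
        week_end = cursor + timedelta(days=6)
        if cursor.day == 1 and month_end <= end:
            # The whole calendar month fits: one monthly aggregate.
            pieces.append(("month", cursor))
            cursor = month_end + timedelta(days=1)
        elif cursor.weekday() == 0 and week_end <= end:
            # A full Monday-to-Sunday week fits: one weekly aggregate.
            pieces.append(("week", cursor))
            cursor = week_end + timedelta(days=1)
        else:
            # Fall back to a single DayCount.
            pieces.append(("day", cursor))
            cursor += timedelta(days=1)
    return pieces
```

For July 1 to August 6, 2012 this gives one monthly piece plus six daily ones, so 7 fetches instead of 37.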

Remember to use unindexed properties for any part of the entities you're 
not going to use in a query; it saves on cost.

Also, as someone who used JDO, I suggest you give Objectify a look.

On Monday, August 6, 2012 3:35:18 PM UTC+2, Neo wrote:
>
> Hi Joakim ,
>
> If I understand it correctly, the string-day entity would be something 
> like this:
> Suppose there are 10000 different strings S1, S2, ..., S10000 on some 
> particular day, each asked many a times in that day d1, then we will have 
> 10000 of entities: d1S1, d1S2, ...,d1S10000. That means, suppose there are 
> 't' days in the given time period, and on all of the days in this interval, 
> there are only 10000 different strings (S1 to S10000), and assuming that 
> each of the string is inputted on half of the days in the time interval - 
> that means we will have to process 10000 * (t/2) records - for t = 200 
> days, it turns out to be 1 Million of records processing for a single 
> query. This too, when we have assumed that there are only 10000 different 
> queries in those 200 days - in a real scenario this can well be 100000. 
>
>
>
