I'd solve this by having one unique entity per combination of date and
string, storing the string's total for that day in said entity. You can
achieve this uniqueness either by formatting special entity names (string
ids) to include both the date and the string. Aggregate the day's totals in
an entity keyed on the date, holding a map/dictionary of string->count.
Your query would become something like this:
SELECT * FROM DayCount WHERE __key__ >= Key('DayCount', startDate) AND
__key__ <= Key('DayCount', endDate)
Or just generate all possible keys from start to end and do a batch fetch.
Do the counting and sorting yourself. It's easy to see that this could
become slow and expensive, so make sure to cache the generated results -
especially if endDate is in the past - in an entity, and cache that entity
in memcache. Keep the key to that entity predictable (formatted string with
both dates) and you're down to a single fetch. This is also why I said to
aggregate all of the day's totals in the DayCount-entity, so that there
will be fewer fetches at query-time.
You may be asking yourself why you should bother having one entity per
string per day at all, when you could do this with just the
DayCount-entity. The reason is that there are limits to how many times per
second you can write to a single entity (entity group to be correct). For
more on this, see the App Engine talks from Google I/O 2011 (relevant
link<http://www.google.com/events/io/2011/sessions/more-9s-please-under-the-covers-of-the-high-replication-datastore.html>
).
Depending on how important consistency is to your application, you could
update the DayCount in the same transaction as the per-string-and-day
entity - which would be a bit slower, or defer that job to ASAP-but-not-now
by scheduling a task on the TaskQueue, or just have a cron job checking
every 5 or ten minutes.
On Wednesday, August 1, 2012 8:29:49 AM UTC+2, Neo wrote:
>
> Suppose, In my website, I ask users to input some string. A user can input
> string multiple times. Whenever any user inputs a string, I log it in the
> database along with the time. Many strings can be same, even though
> inputted by different users. In the home page, I need to give the interface
> such that any user can query for top n (say 50) strings in any time period
> (say 10 Jan 2012 to 30 Jan 2012). If it was SQL, I could have written query
> like:
>
> select string, count(*)
> from userStrings where day >= d1 and day <= d2
> group by string
> order by count(*) desc
> limit n
>
> How do I solve it in GAE environment? For each such user query, I can't
> process the record at query time - there can be millions of records.
>
> I am using JDO. My obvious goal is to minimize the app engine cost : CPU +
> data.
>
> Thanks,
>
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/google-appengine/-/E13JhQDkaFYJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.