Hello Daniel,

First, let me point out the final tip from the article you linked: "don’t prematurely optimize for this case, since chances are, you won’t run into it." Since the Datastore lets you change your schema at any time, and since the optimizations you will need depend on future usage that you may not be able to predict, it is probably best not to spend too much time on optimizing these queries yet.

This being said, the final solution is going to be highly dependent on the kind of queries you want to be able to run against your Datastore. As pointed out in the best practices article <https://cloud.google.com/datastore/docs/best-practices#high_readwrite_rates_to_a_narrow_key_range>, the prefix you put in front of a timestamp can be tied to a specific query you need to make (the given example being a user ID), or it can be random, as you pointed out, but this doesn't force you to query the resulting "buckets" separately. The prefix just needs to vary enough to properly shard the index across many Bigtable tablets and allow for faster reads and writes of that index as a whole.

Whether it is worthwhile to perform the sorting in memory rather than have the Datastore index do it is something you will need to decide based on your experience with the performance of each of your queries.

Regarding expensive queries that are made often, such as the one for the content that appears on your main page, you can and certainly should store the results of those queries using Memcache <https://cloud.google.com/appengine/docs/standard/python/memcache/> so that popular listings do not need to be constantly re-computed.

I hope this helped. Please let me know if any aspect of my answer needs further detail.

On Sunday, January 21, 2018 at 1:32:35 PM UTC-5, Daniel Jozsef wrote:
>
> Hello dear people,
>
> There's this thing that has been bothering me for a while. I need to work
> on an application that we expect to scale, and I have trouble reconciling
> loudly stated best practices and baseline requirements.
>
> Almost all "web2" media relies on a chronological order. When I browse
> facebook, or google+, or youtube, it's not posts and videos from 10 years
> ago that I want to (or do) see. Even though Facebook and Google never seem
> to present a "fully chronological ordered" list, the worst that can happen
> is that I see a post from two hours ago after one from two days ago. Never
> after one from 2008.
>
> However, it would seem that distributed NoSQL databases *hate* timestamps.
> And not only timestamps, but chronological order *in general*. (see
> https://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> )
>
> So... I've looked into the alternatives... And the *only* solution I found
> was that I could prefix timestamps with random "bucket ids", their number
> potentially scaled based on the "write heat" of each entity, and run a
> separate query for each bucket... but that makes managing pagination beyond
> ridiculous, and I worry that it would make queries - like someone just
> randomly navigating to the front page, or hitting reload - expensive, and
> my gut feeling is that making the most frequent query type more expensive
> is a bad idea.
>
> The problem is some way in the future now, but I'm really interested how
> the big players do it. I mean, just thinking of all the writes facebook
> must handle...
>
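
For illustration only, here is a minimal sketch of the bucket-prefix idea discussed in this thread, combined with in-memory sorting of the per-bucket results. It does not use the Datastore API at all: a plain dict stands in for the index, and every name in it (NUM_BUCKETS, make_sort_key, write_post, query_bucket, front_page) is a hypothetical helper made up for this example. It only shows that taking the newest `limit` rows from each bucket and merging them yields the newest `limit` rows overall.

```python
import heapq
import random
from itertools import islice

NUM_BUCKETS = 4  # hypothetical shard count; an assumption for this sketch


def make_sort_key(timestamp, bucket=None):
    """Prefix a timestamp with a bucket id so consecutive writes spread out."""
    if bucket is None:
        bucket = random.randrange(NUM_BUCKETS)
    return (bucket, timestamp)


def write_post(store, post_id, timestamp, bucket=None):
    """Store a post under its bucket (the dict stands in for the real index)."""
    bucket, ts = make_sort_key(timestamp, bucket)
    store.setdefault(bucket, []).append({"id": post_id, "ts": ts})


def query_bucket(store, bucket, limit):
    """Stand-in for one per-bucket query ordered by descending timestamp."""
    rows = sorted(store.get(bucket, []), key=lambda p: p["ts"], reverse=True)
    return rows[:limit]


def front_page(store, limit):
    """Merge the newest rows of every bucket into one newest-first page."""
    per_bucket = [query_bucket(store, b, limit) for b in range(NUM_BUCKETS)]
    # Each input is already sorted newest first, so a k-way merge is enough.
    merged = heapq.merge(*per_bucket, key=lambda p: p["ts"], reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    store = {}
    for post_id, ts in enumerate([100, 105, 101, 110, 103, 108]):
        write_post(store, post_id, ts)
    print([p["ts"] for p in front_page(store, 3)])
```

The merge step costs one small query per bucket, which is the expense raised in the original question, so a page built this way is a natural candidate for caching. Pagination is deliberately left out of the sketch.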
