I have come upon an interesting problem with pagination that I was wondering if anyone else has solved elegantly. The problem is best described by Twitter's dev docs: https://dev.twitter.com/rest/public/timelines.
Essentially, using the from and size parameters (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html) makes it very hard to get the correct documents for results page two if documents have been added since page one was loaded and the index is sorted from newest to oldest. Twitter suggests summing the offset (or from) parameter with the number of documents added since the previous request; however, with that solution we rely on the client having an accurate count of the documents added since the first page was loaded.

For example, the following index contains documents sorted from newest to oldest:

E (newest)
D
C
B
A (oldest)

If each page holds a single document, the first page will contain document E, and the offset (or from) parameter for the next page will be 1, with the expectation of getting document D on the second page. However, since the first page was loaded, document G has been added to the index. Now the index looks like this:

G (newest)
E
D
C
B
A (oldest)

Using an offset (from) of 1 in this case returns document E... again. That is NOT the intended behaviour and leads to duplicate documents being returned.

The only solution I've come up with doesn't seem ideal. For the first page I'll perform the same actions as in the example above, except that in addition to returning document E, the total number of documents in the index is also returned. For the index E through A that would be 5 documents. Accessing any page thereafter requires providing the total number of documents obtained with the first request; let's call that startSize. We'll also still pass the offset of 1.

On the second request and every request thereafter, we'll invert the sorting of the documents to oldest to newest. The inverted index looks like this:

A (oldest)
B
C
D
E
G (newest)

The number of documents per page will be referred to as pageSize (the size param in ES). The from parameter is then calculated with the following formula:

from = startSize - offset - pageSize = 5 - 1 - 1 = 3, while size = pageSize = 1

Using the inverted index and the calculated parameters, that gives us document D, the expected result for page two as the index stood before document G was added. On page 3 we'll get document C, and so on. (A short sketch of the calculation is at the bottom of this post.) That formula gives the expected results when working with indices that are sorted from newest to oldest, are constantly growing, and are accessed with pagination.

I don't see this algorithm significantly increasing the cost of accessing the API, but with that said, I can't help thinking I've let the early hours of the morning get the best of me. Is there a better solution, or something built into Elasticsearch to handle this use case?

Thanks in advance!
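P.S. Here's the sketch I mentioned above: a rough Python simulation of the scheme, using a plain list to stand in for the index (no Elasticsearch client calls; start_size, offset and page_size are just my own names for startSize, offset and pageSize):

def from_param(start_size, offset, page_size):
    # from = startSize - offset - pageSize (used for pages 2+ on the inverted sort)
    return start_size - offset - page_size

# Index at the time of the first request, sorted newest to oldest.
index_newest_first = ["E", "D", "C", "B", "A"]
page_size = 1

# Page 1: a normal newest-to-oldest request; also record the total document count.
page_one = index_newest_first[0:page_size]       # ['E']
start_size = len(index_newest_first)             # 5

# Document G is indexed before page 2 is requested.
index_newest_first = ["G"] + index_newest_first

# Pages 2+: invert the sort to oldest-to-newest and apply the formula.
index_oldest_first = list(reversed(index_newest_first))   # ['A', 'B', 'C', 'D', 'E', 'G']

offset = page_size * 1                                     # documents already returned
frm = from_param(start_size, offset, page_size)            # 5 - 1 - 1 = 3
page_two = index_oldest_first[frm:frm + page_size]         # ['D'], not a duplicate 'E'

offset = page_size * 2
frm = from_param(start_size, offset, page_size)            # 5 - 2 - 1 = 2
page_three = index_oldest_first[frm:frm + page_size]       # ['C']

print(page_one, page_two, page_three)                      # ['E'] ['D'] ['C']

Against the real API, page one would run with the normal descending sort and from=0, and every later page would flip the sort direction and plug in the from value computed above with size=pageSize.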
