Furthermore, on using hits.length == 0: shard failure(s) can also give hits.length == 0, so an empty page is perhaps not the end of the scroll.
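To make that concrete, here is a minimal sketch of a drain loop that only treats an empty page as the end of the scroll when no shards reported a failure. Note that `Page`, `ScrollClient`, `Result` and `drain` are hypothetical stand-ins made up for illustration, not Elasticsearch client classes; a real loop would read the hits and the failed shard count from the actual search response. The sketch also assumes the safe habit of always passing on the scroll id from the most recent response, whether or not it changed.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollDrain {

    /** Hypothetical stand-in for one scroll response (not an Elasticsearch class). */
    static final class Page {
        final String scrollId;
        final List<String> hits;
        final int failedShards;

        Page(String scrollId, List<String> hits, int failedShards) {
            this.scrollId = scrollId;
            this.hits = hits;
            this.failedShards = failedShards;
        }
    }

    /** Hypothetical stand-in for the client call that fetches the next page. */
    interface ScrollClient {
        Page next(String scrollId);
    }

    /** Ids collected so far, plus whether the scroll really reached its end. */
    static final class Result {
        final List<String> ids;
        final boolean complete;

        Result(List<String> ids, boolean complete) {
            this.ids = ids;
            this.complete = complete;
        }
    }

    /** Starts from the scroll id of the initial search, which for SCAN carries no hits. */
    static Result drain(String initialScrollId, ScrollClient client) {
        List<String> ids = new ArrayList<>();
        String scrollId = initialScrollId;
        while (true) {
            Page page = client.next(scrollId);
            ids.addAll(page.hits);
            if (page.failedShards > 0) {
                // Hits may be missing or empty because shards failed,
                // so this is not proof that the scroll is exhausted.
                return new Result(ids, false);
            }
            if (page.hits.isEmpty()) {
                // No failures and no hits: the scroll is finished.
                return new Result(ids, true);
            }
            // Always carry forward the id from the latest response.
            scrollId = page.scrollId;
        }
    }
}
```

A caller would check `complete` before trusting the collected ids, and retry or report an error when it is false, rather than silently accepting a short result.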

On Tuesday, 17 June 2014 18:46:07 UTC+1, mooky wrote:
>
> Having hit a bunch of issues using scroll, I thought I'd better improve my
> understanding of how scroll is supposed to be used (and how it's not
> supposed to be used).
>
> 1. Does it make sense to execute a search request with scroll, but
>    SearchType != SCAN?
> 2. Does it make sense to execute a search request with scroll, and
>    also with facets/aggregations?
> 3. What is the difference between scrolling to the end of the results
>    (i.e. calling until hits.length == 0) and issuing a specific
>    ClearScrollRequest? It appears to me that the ClearScrollRequest
>    immediately clears the scroll, whereas there is some time delay before
>    a scroll is cleaned up after reaching the end of the results. (I can
>    see this in my tests because the ElasticsearchIntegrationTest fails on
>    teardown unless I perform an explicit ClearScrollRequest or I put in a
>    delay of some number of seconds.) From reading the docs, I am not sure
>    if this is a bug or expected behaviour.
> 4. Does the scrollId represent the cursor, or the cursor page/iteration
>    state? I have read documentation/mailing list explanations that have
>    words to the effect "you must pass the scrollId from the previous
>    response into the subsequent request", which suggests the id
>    represents some cursor state, i.e. performing a scroll request with a
>    given scrollId will always return the same results. My observation,
>    however, is that the scrollId does not change (i.e. I get back the
>    same scrollId I passed in), so each scroll request with the same
>    scrollId advances the 'cursor' until no results are returned. I have
>    also read stuff on the mailing list that implied multiple calls could
>    be made in parallel with the same scrollId to load all the results
>    faster (which would imply the scrollId is *not* expected to change).
>    So which is correct? :)
>
> To explain the background for my questions, I have two requirements:
>
> 1) I get an update event that leads me to go find items in the index that
> need re-indexing. I perform a search on the index, I get the ids and I
> load the original data from the source system(s) to reconstruct the
> document and index it. This seems to be exactly what SCAN and SCROLL is
> meant for. (However, the SCAN search type is different in that it always
> returns zero hits from the original search request - only the scroll
> requests seem to
>
> 2) The user normally performs a search, and naturally we limit how many
> results we serve to the client. However, occasionally, the user wants to
> return all the data for a given search/filter (say, to export to Excel or
> whatever), so it seems like a good idea to use the scroll rather than
> paging through the results using from & size, as we know we will get
> consistent results even if documents are being added/removed/updated on
> the server.
> From a functionality perspective, I want to make sure the scrolling search
> request is the same as the non-scrolling search request so the user gets
> the same results. So from a code perspective, ideally I really want to
> make the codepath the same (save for adding the scroll keepAlive param).
> However, perhaps there are things I perform with my normal search (e.g.
> aggregations, SearchType.DEFAULT, etc.) that just don't make sense when
> scrolling?
>
> Many thanks.
>
