One more question I forgot: rather than looking at hits.length to know whether the end of the scroll has been reached, would it not be better to return a null scrollId when the end of the cursor has been reached? On the surface, it seems that would a) be more intuitive, b) behave the same regardless of which SearchType you are using, and c) not be affected by the search itself returning zero results.
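To make the comparison concrete, here is a minimal sketch of the two loop-termination styles. It uses hypothetical stand-in types (`Page`, `FakeCursor`, `drainUntilEmpty`, `drainUntilNullId`) rather than the real Elasticsearch client. The null-scrollId behaviour it models is only the proposal above, not something the API is known to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical types for illustration only; this is not the Elasticsearch Java API.
final class ScrollTermination {

    // One response: a batch of hit ids plus the scroll id for the next request.
    record Page(List<String> hits, String scrollId) {}

    // In-memory stand-in for a scroll cursor over a fixed result set.
    // nullIdAtEnd = true models the proposal: the response that exhausts the
    // cursor carries a null scrollId instead of a reusable one.
    static final class FakeCursor {
        private final List<String> docs;
        private final int pageSize;
        private final boolean nullIdAtEnd;
        private int pos = 0;

        FakeCursor(List<String> docs, int pageSize, boolean nullIdAtEnd) {
            this.docs = docs;
            this.pageSize = pageSize;
            this.nullIdAtEnd = nullIdAtEnd;
        }

        // scan = true mimics SearchType.SCAN: the initial response has no hits.
        Page search(boolean scan) {
            if (scan) {
                return new Page(List.of(), nullIdAtEnd && docs.isEmpty() ? null : "scroll-1");
            }
            return nextPage();
        }

        Page scroll(String scrollId) {
            Objects.requireNonNull(scrollId, "cannot scroll without a scroll id");
            return nextPage();
        }

        private Page nextPage() {
            int end = Math.min(pos + pageSize, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(pos, end));
            pos = end;
            boolean exhausted = pos == docs.size();
            return new Page(hits, nullIdAtEnd && exhausted ? null : "scroll-1");
        }
    }

    // Current convention: stop on the first empty batch. With SCAN the initial
    // response is always empty, so the check has to skip that first response.
    static List<String> drainUntilEmpty(FakeCursor cursor, boolean scan) {
        List<String> out = new ArrayList<>();
        Page page = cursor.search(scan);
        if (!scan && page.hits().isEmpty()) {
            return out; // the search itself matched nothing
        }
        out.addAll(page.hits());
        while (true) {
            page = cursor.scroll(page.scrollId());
            if (page.hits().isEmpty()) {
                return out;
            }
            out.addAll(page.hits());
        }
    }

    // Proposed convention: stop when the scroll id is null. The same loop works
    // for SCAN and non-SCAN searches, and for an empty result set.
    static List<String> drainUntilNullId(FakeCursor cursor, boolean scan) {
        List<String> out = new ArrayList<>();
        Page page = cursor.search(scan);
        out.addAll(page.hits());
        while (page.scrollId() != null) {
            page = cursor.scroll(page.scrollId());
            out.addAll(page.hits());
        }
        return out;
    }
}
```

The point of the comparison is that `drainUntilNullId` needs no SCAN-specific branch and no special case for a search that matched nothing, whereas `drainUntilEmpty` has to know which search type produced the first response.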
Cheers.

On Tuesday, 17 June 2014 18:46:07 UTC+1, mooky wrote:
>
> Having hit a bunch of issues using scroll, I thought I had better improve my
> understanding of how scroll is supposed to be used (and how it's not
> supposed to be used).
>
> 1. Does it make sense to execute a search request with scroll, but
>    SearchType != SCAN?
> 2. Does it make sense to execute a search request with scroll, and
>    also with facets/aggregations?
> 3. What is the difference between scrolling to the end of the results
>    (i.e. calling until hits.length == 0) and issuing a specific
>    ClearScrollRequest? It appears to me that the ClearScrollRequest
>    immediately clears the scroll, whereas there is some time delay before a
>    scroll is cleaned up after reaching the end of the results. (I can see
>    this in my tests because the ElasticsearchIntegrationTest fails on teardown
>    unless I perform an explicit ClearScrollRequest or put in a delay of some
>    number of seconds.) From reading the docs, I am not sure whether this is a
>    bug or expected behaviour.
> 4. Does the scrollId represent the cursor, or the cursor's
>    page/iteration state? I have read documentation/mailing list explanations
>    to the effect of "you must pass the scrollId from the previous
>    response into the subsequent request", which suggests the id represents
>    some cursor state, i.e. performing a scroll request with a given scrollId
>    will always return the same results. My observation, however, is that the
>    scrollId does not change (i.e. I get back the same scrollId I passed in), so
>    each scroll request with the same scrollId advances the 'cursor' until no
>    results are returned. I have also read things on the mailing list that
>    implied multiple calls could be made in parallel with the same scrollId to
>    load all the results faster (which would imply the scrollId is *not*
>    expected to change). So which is correct?
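Whatever the answers to questions 3 and 4 turn out to be, a defensive loop can avoid depending on them: forward the scrollId from the most recent response on every request, and clear the scroll explicitly in a `finally` block rather than waiting for the keepAlive to expire. The sketch below assumes a hypothetical `ScrollClient` interface (not the Elasticsearch API) and an in-memory `FakeClient` that deliberately issues a new id per response and rejects stale ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical types for illustration only; this is not the Elasticsearch Java API.
final class ScrollLoop {

    // One response: a batch of hit ids plus the scroll id for the next request.
    record Page(List<String> hits, String scrollId) {}

    // Minimal client surface assumed by the sketch.
    interface ScrollClient {
        Page search();                     // initial request; opens a scroll context
        Page scroll(String scrollId);      // fetches the next batch
        void clearScroll(String scrollId); // releases the context immediately
    }

    // Feeds every hit to `handler`. Always forwards the id from the most recent
    // response, and clears the scroll in `finally`, so the context is released on
    // normal completion or when the handler throws, not on keepAlive expiry.
    static void forEachHit(ScrollClient client, Consumer<String> handler) {
        Page page = client.search();
        String scrollId = page.scrollId();
        try {
            page.hits().forEach(handler); // empty under SCAN
            while (true) {
                page = client.scroll(scrollId);
                scrollId = page.scrollId(); // never assume the id is unchanged
                if (page.hits().isEmpty()) {
                    break;
                }
                page.hits().forEach(handler);
            }
        } finally {
            client.clearScroll(scrollId);
        }
    }

    // In-memory fake that issues a fresh id on every response and rejects stale
    // ids, so a loop that kept reusing its first id would fail loudly.
    static final class FakeClient implements ScrollClient {
        private final List<String> docs;
        private final int pageSize;
        private int pos = 0;
        private int counter = 0;
        private String liveId = null;
        boolean cleared = false;

        FakeClient(List<String> docs, int pageSize) {
            this.docs = docs;
            this.pageSize = pageSize;
        }

        private Page next() {
            int end = Math.min(pos + pageSize, docs.size());
            List<String> hits = new ArrayList<>(docs.subList(pos, end));
            pos = end;
            liveId = "s" + counter++;
            return new Page(hits, liveId);
        }

        @Override
        public Page search() {
            return next();
        }

        @Override
        public Page scroll(String scrollId) {
            if (!Objects.equals(scrollId, liveId)) {
                throw new IllegalStateException("stale or unknown scroll id: " + scrollId);
            }
            return next();
        }

        @Override
        public void clearScroll(String scrollId) {
            if (Objects.equals(scrollId, liveId)) {
                liveId = null;
                cleared = true;
            }
        }
    }
}
```

With `new FakeClient(List.of("a", "b", "c", "d", "e"), 2)`, `forEachHit` delivers all five ids and leaves `cleared` set to true; the same cleanup happens if the handler throws partway through.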
> :)
>
> To explain the background for my questions, I have two requirements:
>
> 1) I get an update event that leads me to go and find items in the index that
> need re-indexing. I perform a search on the index, get the ids, and
> load the original data from the source system(s) to reconstruct the
> document and index it. This seems to be exactly what SCAN and scroll are
> meant for. (However, the SCAN search type is different in that it always
> returns zero hits from the original search request; only the scroll
> requests seem to
>
> 2) The user normally performs a search, and naturally we limit how many
> results we serve to the client. However, occasionally the user wants to
> retrieve all the data for a given search/filter (say, to export to Excel or
> whatever), so it seems like a good idea to use scroll rather than
> paging through the results using from & size, as we know we will get
> consistent results even if documents are being added/removed/updated on the
> server.
> From a functionality perspective, I want the scrolling search request to be
> the same as the non-scrolling search request so that the user gets the same
> results; from a code perspective, I would ideally like to keep the code path
> the same (save for adding the scroll keepAlive param).
> However, perhaps there are things I do in my normal search (e.g.
> aggregations, SearchType.DEFAULT, etc.) that just don't make sense when
> scrolling?
>
> Many thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b6697426-9d3b-43e4-8c9e-cd14bf3c7859%40googlegroups.com.
