Having hit a number of issues using scroll, I thought I had better improve my 
understanding of how scroll is supposed to be used (and how it's not 
supposed to be used).


   1. Does it make sense to execute a search request with scroll, but 
   SearchType != SCAN?
   2. Does it make sense to execute a search request with scroll, and also 
   with facets/aggregations?
   3. What is the difference between scrolling to the end of the results 
   (i.e. calling until hits.length == 0) and issuing an explicit 
   ClearScrollRequest? It appears to me that the ClearScrollRequest 
   clears the scroll immediately, whereas there is some delay before a 
   scroll is cleaned up after reaching the end of the results. (I can see 
   this in my tests because the ElasticsearchIntegrationTest fails on teardown 
   unless I perform an explicit ClearScrollRequest or add a delay of some 
   number of seconds.) From reading the docs, I am not sure if this is a bug or 
   expected behaviour.
   4. Does the scrollId represent the cursor, or the cursor page/iteration 
   state? I have read documentation/mailing list explanations that say, in 
   effect, "you must pass the scrollId from the previous response into 
   the subsequent request", which suggests the id represents some cursor 
   state, i.e. performing a scroll request with a given scrollId will always 
   return the same results. My observation, however, is that the scrollId does 
   not change (i.e. I get back the same scrollId I passed in), so each scroll 
   request with the same scrollId advances the 'cursor' until no results are 
   returned. I have also read posts on the mailing list that implied multiple 
   calls could be made in parallel with the same scrollId to load all the 
   results faster (which would imply the scrollId is *not* expected to 
   change). So which is correct? :)
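
To make questions 3 and 4 concrete, here is a minimal sketch of the loop I have in mind. It is not the Elasticsearch client API: ScrollSource, Page and InMemoryScroll are hypothetical stand-ins I made up so the example runs on its own, and the fake keeps the cursor position on the "server" side, keyed by scroll id, which is only my assumption about how it works. It models the non-SCAN case where the first response already contains hits. The loop always carries forward the scroll id from the latest response (whether or not it changes) and clears the scroll explicitly instead of waiting for the keep-alive to expire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScrollSketch {

    /** One page of results plus the scroll id to use for the next request. */
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /** Hypothetical stand-in for the search client; NOT the Elasticsearch API. */
    interface ScrollSource {
        Page start(int pageSize);     // initial search with scroll enabled
        Page next(String scrollId);   // fetch the next page for this scroll
        void clear(String scrollId);  // release the scroll context right away
    }

    /** In-memory fake: the cursor position lives on the "server", keyed by scroll id. */
    static final class InMemoryScroll implements ScrollSource {
        private final List<String> docs;
        private final Map<String, int[]> contexts = new HashMap<>(); // id -> {offset, pageSize}
        private int counter = 0;

        InMemoryScroll(List<String> docs) {
            this.docs = new ArrayList<>(docs);
        }

        @Override
        public Page start(int pageSize) {
            String id = "scroll-" + (counter++);
            contexts.put(id, new int[] {0, pageSize});
            return next(id);
        }

        @Override
        public Page next(String scrollId) {
            int[] ctx = contexts.get(scrollId);
            if (ctx == null) {
                throw new IllegalStateException("no such scroll: " + scrollId);
            }
            int from = Math.min(ctx[0], docs.size());
            int to = Math.min(from + ctx[1], docs.size());
            ctx[0] = to; // advance the cursor; the id itself stays the same
            return new Page(scrollId, new ArrayList<>(docs.subList(from, to)));
        }

        @Override
        public void clear(String scrollId) {
            contexts.remove(scrollId);
        }

        int openContexts() {
            return contexts.size();
        }
    }

    /** Reads every hit, then clears the scroll explicitly. */
    static List<String> drain(ScrollSource source, int pageSize) {
        List<String> all = new ArrayList<>();
        Page page = source.start(pageSize);
        String scrollId = page.scrollId;
        try {
            while (!page.hits.isEmpty()) {
                all.addAll(page.hits);
                page = source.next(scrollId);
                scrollId = page.scrollId; // always use the id from the latest response
            }
        } finally {
            source.clear(scrollId); // do not rely on the keep-alive timeout
        }
        return all;
    }

    public static void main(String[] args) {
        InMemoryScroll source = new InMemoryScroll(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(drain(source, 2));
        System.out.println(source.openContexts());
    }
}
```

If the real scrollId does encode per-page state, this loop still works because it never reuses an old id; if it does not, carrying the id forward costs nothing.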


To explain the background for my questions: I have two requirements:
1) I get an update event that leads me to go find items in the index that 
need re-indexing. I perform a search on the index, get the ids, and 
load the original data from the source system(s) to reconstruct the 
document and index it. This seems to be exactly what SCAN and SCROLL are 
meant for. (However, the SCAN search type is different in that it always 
returns zero hits from the original search request - only the scroll 
requests seem to return hits.)

2) The user normally performs a search, and naturally we limit how many 
results we serve to the client. However, occasionally, the user wants to 
retrieve all the data for a given search/filter (say, to export to Excel or 
whatever), so it seems like a good idea to use scroll rather than 
paging through the results using from and size, as we know we will get 
consistent results even if documents are being added/removed/updated on the 
server.
From a functionality perspective, I want to make sure the scrolling search 
request is the same as the non-scrolling search request so the user gets 
the same results - so from a code perspective, ideally I really want to 
make the codepath the same (save for adding the scroll keepAlive param). 
However, perhaps there are things I perform with my normal search (e.g. 
aggregations, SearchType.DEFAULT, etc) that just don't make sense when 
scrolling?

Many thanks.
