Re: Scroll Questions

mooky Wed, 18 Jun 2014 02:28:33 -0700

Many thanks Jörg.

Further questions/comments inline:


> 1. yes


Thanks,

> 2. facets/aggregations are not very useful while scrolling (I doubt they 
> even work at all) because scrolling works at the shard level and aggregations 
> work at the index level


If they are not expected to work, would it make sense to either:

   1. prevent aggregation/facet requests in conjunction with scroll 
   requests (i.e. return an error to the user), or
   2. simply not execute them?

If it doesn't make sense, would it be better to not return any 
aggregation/facet results at all?
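As a rough illustration of option 1, a client-side guard could look like the following. This is a minimal sketch that assumes a hypothetical SearchSpec type (not an Elasticsearch class) recording the scroll keepAlive (null when no scroll is requested) and the names of any requested aggregations:

```java
import java.util.List;

// Hypothetical, simplified description of a search request; not an
// Elasticsearch class. keepAlive is null when no scroll is requested.
final class SearchSpec {
    final String keepAlive;
    final List<String> aggregationNames;

    SearchSpec(String keepAlive, List<String> aggregationNames) {
        this.keepAlive = keepAlive;
        this.aggregationNames = List.copyOf(aggregationNames);
    }

    boolean isScroll() {
        return keepAlive != null;
    }
}

final class ScrollRequestValidator {
    // Option 1: fail fast instead of silently returning aggregation
    // results that may not be meaningful for a scroll.
    static void validate(SearchSpec spec) {
        if (spec.isScroll() && !spec.aggregationNames.isEmpty()) {
            throw new IllegalArgumentException(
                "aggregations are not supported together with scroll: "
                    + spec.aggregationNames);
        }
    }
}
```

Option 2 would be the same check, but dropping the aggregations from the request instead of throwing.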

> 3. a scroll request takes resources. The purpose of ClearScrollRequest is 
> to release those resources explicitly. This is indeed a rare situation when 
> you need explicit clearing. The time delay of releasing scrolls implicitly 
> can be controlled by the requests.


Do you mean the keepAlive time? So, does the scroll (and its resources) 
always remain for the duration of the keepAlive (since the last request on 
that scroll) regardless of whether the end of the scroll was reached or not?

I read the following (from the documentation) to imply that reading to the 
end of the scroll had the effect of "aborting" and therefore cleaning up 
resources.

"Besides consuming the scroll search until no hits has been returned a 
scroll search can also be aborted by deleting the scroll_id"

So, just to confirm: reading to the end of the results does nothing to 
bring about the cleanup of the scroll? It's either the TTL or the 
ClearScrollRequest that brings about the cleanup of resources.
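A toy model of the two release paths described in this thread: each scroll request refreshes the keepAlive window, a ClearScrollRequest releases a context immediately, and a reaper wakes up from time to time and frees contexts whose keepAlive has elapsed. ScrollContextRegistry and its methods are hypothetical, not Elasticsearch code, and time is passed in explicitly so the model is deterministic. It deliberately does not model what happens when a scroll is read to the end, since that is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of server-side scroll contexts; not Elasticsearch code.
// A context is freed either by an explicit clear or by a periodic
// reaper sweep once keepAlive has elapsed since its last access.
final class ScrollContextRegistry {
    private static final class Context {
        long lastAccessMillis;
        final long keepAliveMillis;

        Context(long now, long keepAliveMillis) {
            this.lastAccessMillis = now;
            this.keepAliveMillis = keepAliveMillis;
        }
    }

    private final Map<String, Context> contexts = new HashMap<>();
    private long nextId = 0;

    // Opening a scroll allocates a context (standing in for open segments).
    String open(long now, long keepAliveMillis) {
        String id = "ctx-" + (nextId++);
        contexts.put(id, new Context(now, keepAliveMillis));
        return id;
    }

    // Every scroll request refreshes the keepAlive window.
    boolean touch(String id, long now) {
        Context c = contexts.get(id);
        if (c == null) {
            return false; // already cleared or reaped
        }
        c.lastAccessMillis = now;
        return true;
    }

    // Explicit clear: the context is released immediately.
    boolean clear(String id) {
        return contexts.remove(id) != null;
    }

    // Reaper sweep: frees every context whose keepAlive has elapsed.
    int reap(long now) {
        int before = contexts.size();
        contexts.values().removeIf(c -> now - c.lastAccessMillis > c.keepAliveMillis);
        return before - contexts.size();
    }

    int openCount() {
        return contexts.size();
    }
}
```

In this model the gap between the last request and the next reaper sweep is the "time delay" seen in the tests; only the explicit clear removes it.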

Is there any downside to calling ClearScrollRequest explicitly?
(I am inclined to call it explicitly when the end of the scroll is reached, 
in order to clean up resources ASAP.)
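For concreteness, that pattern (always pass the latest scroll ID, stop on an empty page, clear explicitly in a finally block) can be sketched as follows. ScrollClient, Page and their method names are hypothetical stand-ins for the search, search-scroll and ClearScrollRequest calls, not the actual Elasticsearch Java API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal client abstraction. The method names are
// illustrative stand-ins, not the Elasticsearch Java API.
interface ScrollClient {
    Page startScroll(String query, String keepAlive);
    Page continueScroll(String scrollId, String keepAlive);
    void clearScroll(String scrollId);
}

// One response: the scroll ID to use next, plus this batch of hits.
final class Page {
    final String scrollId;
    final List<String> hits;

    Page(String scrollId, List<String> hits) {
        this.scrollId = scrollId;
        this.hits = List.copyOf(hits);
    }
}

final class ScrollAll {
    // Reads every hit, always passing the most recently returned scroll
    // ID, and clears the scroll explicitly, even if reading fails midway.
    static List<String> readAll(ScrollClient client, String query, String keepAlive) {
        List<String> all = new ArrayList<>();
        Page page = client.startScroll(query, keepAlive);
        String scrollId = page.scrollId;
        try {
            // With SCAN the first response has no hits, so it is not
            // treated as the end; at least one scroll call is always made.
            all.addAll(page.hits);
            while (true) {
                page = client.continueScroll(scrollId, keepAlive);
                if (page.scrollId != null) {
                    scrollId = page.scrollId; // never reuse a stale ID
                }
                if (page.hits.isEmpty()) {
                    break; // end of results
                }
                all.addAll(page.hits);
            }
        } finally {
            if (scrollId != null) {
                // Release server-side resources without waiting for the reaper.
                client.clearScroll(scrollId);
            }
        }
        return all;
    }
}
```

The finally block also covers the case where processing fails halfway through the results.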


> 4. yes, the scroll id is an encoding of the combined state of all the 
> shards that participate in the scroll. Even if the ID looks as if it has 
> not changed, you should always use the latest reference to the scroll ID in 
> the response, or you may clutter the nodes with unreleased scroll resources.


Thanks for the explanation.
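A toy illustration of how an ID can encode the state of all participating shards and still look unchanged between pages: if it only references per-shard contexts held on the server, advancing those contexts leaves the encoded string identical. ToyScrollId and its "shard:context" format are invented for illustration and are not the real Elasticsearch scroll ID format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only: NOT the real Elasticsearch scroll ID format.
// The ID is modeled as an encoding of per-shard context references.
final class ToyScrollId {
    static String encode(Map<Integer, Long> shardToContextId) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Long> e : shardToContextId.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().encodeToString(raw);
    }

    static Map<Integer, Long> decode(String scrollId) {
        byte[] bytes = Base64.getUrlDecoder().decode(scrollId);
        String raw = new String(bytes, StandardCharsets.UTF_8);
        Map<Integer, Long> out = new LinkedHashMap<>();
        if (raw.isEmpty()) {
            return out;
        }
        for (String part : raw.split(";")) {
            String[] kv = part.split(":");
            out.put(Integer.parseInt(kv[0]), Long.parseLong(kv[1]));
        }
        return out;
    }
}
```

In this toy, the per-shard positions live only on the server, so the same string comes back on every page; an unchanged-looking ID therefore says little, and taking the ID from the latest response remains the safe choice.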

> A null scroll ID is a matter of API design. By checking whether the hit 
> length is 0, you can use the same condition for other queries, so it is 
> convenient and not confusing. Null scroll IDs are always prone to NPEs.


Agreed. It's a matter of API style/design.
The only issue I have with checking hits.length is that, depending on the 
SearchType, hits.length==0 does not always mean the end of the results 
(e.g. the initial response of a SearchType.SCAN search). It's the lack of 
consistency that bothers me about it: it requires the code that handles 
results to be aware of a detail of the request.

My case for using scrollId is this:
the scrollId is already null if no scroll is requested.
For this reason, (IMO) scrollId==null would be a more consistent indicator 
of no scrolling required, or no further scrolling required. It would also 
reinforce the notion that the user should always use/observe the returned 
scrollId, since they would have to.
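To make the comparison concrete, the indicator proposed above can be approximated today with a small client-side adapter. This is a sketch only: ScrollCursor and nextScrollId are hypothetical names, and it returns Optional rather than null to sidestep the NPE concern. The caller still has to say whether a response is the initial response of a SCAN search, which is exactly the detail of the request that leaks into result handling.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical client-side adapter (not an Elasticsearch API) that turns
// the "empty hits means done" convention into a single indicator: an
// empty Optional means no further scrolling is required.
final class ScrollCursor {
    // isInitialScanResponse is true only for the first response of a
    // SCAN search, which carries no hits but is not the end of results.
    static Optional<String> nextScrollId(String scrollId, List<?> hits,
                                         boolean isInitialScanResponse) {
        if (scrollId == null) {
            return Optional.empty(); // no scroll was requested
        }
        if (hits.isEmpty() && !isInitialScanResponse) {
            return Optional.empty(); // scroll exhausted
        }
        return Optional.of(scrollId); // keep going with the latest ID
    }
}
```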

Cheers,
-Nick


On Wednesday, 18 June 2014 00:04:06 UTC+1, Jörg Prante wrote:
>
> 1. yes
>
> 2. facets/aggregations are not very useful while scrolling (I doubt they 
> even work at all) because scrolling works at the shard level and aggregations 
> work at the index level
>
> 3. a scroll request takes resources. The purpose of ClearScrollRequest is 
> to release those resources explicitly. This is indeed a rare situation when 
> you need explicit clearing. The time delay of releasing scrolls implicitly 
> can be controlled by the requests.
>
> 4. yes, the scroll id is an encoding of the combined state of all the 
> shards that participate in the scroll. Even if the ID looks as if it has 
> not changed, you should always use the latest reference to the scroll ID in 
> the response, or you may clutter the nodes with unreleased scroll resources.
>
> Scrolling is very different from search, because there is shard-level 
> machinery that iterates over the Lucene segments and keeps them open. This 
> tends to ramp up lots of server-side resources, which may be long-lived, a 
> challenge for resource management. There is a reaper thread that wakes up 
> from time to time to take care of stray scroll searches. You observed this 
> as a "time delay". Ordinary search actions never keep resources open at 
> shard level.
>
> Using scroll search for creating large CSV exports is adequate because 
> it iterates through the result set doc by doc. But if you replace a 
> full-fledged search that has facets/filters/aggregations/sorting with a 
> scroll search, you will only create large overheads (if it is even 
> possible).
>
> A null scroll ID is a matter of API design. By checking whether the hit 
> length is 0, you can use the same condition for other queries, so it is 
> convenient and not confusing. Null scroll IDs are always prone to NPEs.
>
> Jörg
>
>
>
> On Tue, Jun 17, 2014 at 7:46 PM, mooky <[email protected]> wrote:
>
>> Having hit a bunch of issues using scroll, I thought I had better improve my 
>> understanding of how scroll is supposed to be used (and how it's not 
>> supposed to be used).
>>
>>
>>    1. Does it make sense to execute a search request with scroll, but 
>>    SearchType != SCAN?
>>    2. Does it make sense to execute a search request with scroll, and 
>>    also with facets/aggregations?
>>    3. What is the difference between scrolling to the end of the results 
>>    (i.e. calling until hits.length == 0) and issuing a specific 
>>    ClearScrollRequest? It appears to me that the ClearScrollRequest 
>>    immediately clears the scroll, whereas there is some time delay before a 
>>    scroll is cleaned up after reaching the end of the results. (I can see 
>>    this in my tests because the ElasticsearchIntegrationTest fails on 
>>    teardown unless I perform an explicit ClearScrollRequest or I put a 
>>    delay of some number of seconds.) From reading the docs, I am not sure 
>>    if this is a bug or expected behaviour.
>>    4. Does the scrollId represent the cursor, or the cursor 
>>    page/iteration state? I have read documentation/mailing list 
>>    explanations that have words to the effect "you must pass the scrollId 
>>    from the previous response into the subsequent request", which suggests 
>>    the id represents some cursor state, i.e. performing a scroll request 
>>    with a given scrollId will always return the same results. My 
>>    observation, however, is that the scrollId does not change (i.e. I get 
>>    back the same scrollId I passed in), so each scroll request with the 
>>    same scrollId advances the 'cursor' until no results are returned. I 
>>    have also read stuff on the mailing list that implied multiple calls 
>>    could be made in parallel with the same scrollId to load all the results 
>>    faster (which would imply the scrollId is *not* expected to change). So 
>>    which is correct? :)
>>
>>
>> To explain the background for my questions: I have two requirements:
>> 1) I get an update event that leads me to go find items in the index that 
>> need re-indexing. I perform a search on the index, I get the IDs and I 
>> load the original data from the source system(s) to reconstruct the 
>> document and index it. This seems to be exactly what SCAN and SCROLL are 
>> meant for. (However, the SCAN search type is different in that it always 
>> returns zero hits from the original search request - only the scroll 
>> requests seem to 
>>
>> 2) The user normally performs a search, and naturally we limit how many 
>> results we serve to the client. However, occasionally, the user wants to 
>> return all the data for a given search/filter (say, to export to Excel or 
>> whatever), so it seems like a good idea to use scroll rather than 
>> paging through the results using from&size, as we know we will get 
>> consistent results even if documents are being added/removed/updated on the 
>> server.
>> From a functionality perspective, I want to make sure the scrolling 
>> search request is the same as the non-scrolling search request so the user 
>> gets the same results, so from a code perspective, ideally I want 
>> to keep the codepath the same (save for adding the scroll keepAlive param). 
>> However, perhaps there are things I perform with my normal search (e.g. 
>> aggregations, SearchType.DEFAULT, etc.) that just don't make sense when 
>> scrolling?
>>
>> Many thanks.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/80f173a7-07a0-4f72-a896-944223a3ac30%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
