[jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")

Hoss Man (JIRA) Fri, 22 Nov 2013 08:33:25 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated SOLR-5463:
---------------------------

    Attachment: SOLR-5463__straw_man.patch

bq. I disagree. The fieldDoc only contains the values that were sorted on. This 
is what is minimal and necessary to do paging

FieldDoc subclasses ScoreDoc which includes the internal docid -- and 
PagingFieldCollector does look at it.  But as you say: as long as we include 
uniqueKey in the fields (which i already mentioned) then the docid in the 
FieldDoc shouldn't matter since (i think?) it's only used as a tie breaker.

bq. If solr wants to avoid lucene docids for some reason (e.g. because it does 
not yet implement searcher leases) ...

I'm glad you brought up searcher leases, because i wanted to mention it before 
but i forgot...

* I have no idea how to even try to implement searcher leases in a sane way in 
a distributed solr setup, given that we want clients to be able to hit any 
replica on subsequent requests.
* For my use cases, I actively do *NOT* want a searcher lease when doing deep 
paging: if documents matching my search, but on high pages i have not loaded 
yet, get deleted from the index, i don't want them included in the results once 
i get to that page just because they were a match X minutes ago when my search 
started.

I think what makes the most sense is to ensure we can support deep paging w/o 
searcher leases, and then if/when searcher leases are supported people who want 
both can have both.
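
To make the two points above concrete (uniqueKey as the tie breaker, and paging without a searcher lease), here is a minimal sketch. It is plain Java over an in-memory list, not Solr or Lucene code: the names SearchAfterSketch, Doc, ORDER and page are all hypothetical, and the "price" sort field is just an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for an index; this is NOT Solr or Lucene code.
final class SearchAfterSketch {

    // A document with a single numeric sort value and a uniqueKey.
    static final class Doc {
        final String id;   // plays the role of uniqueKey
        final long price;  // assumed numeric sort field
        Doc(String id, long price) { this.id = id; this.price = price; }
    }

    // Sort by price ascending, then by uniqueKey ascending as the tie breaker.
    static final Comparator<Doc> ORDER = new Comparator<Doc>() {
        public int compare(Doc a, Doc b) {
            int cmp = Long.compare(a.price, b.price);
            return cmp != 0 ? cmp : a.id.compareTo(b.id);
        }
    };

    // Returns up to `rows` docs that sort strictly after `last` (null = first page).
    // The list is re-sorted on every call, so each page reflects the current
    // contents rather than a snapshot taken when paging started (no "lease").
    static List<Doc> page(List<Doc> index, Doc last, int rows) {
        List<Doc> sorted = new ArrayList<Doc>(index);
        sorted.sort(ORDER);
        List<Doc> out = new ArrayList<Doc>();
        for (Doc d : sorted) {
            if (last != null && ORDER.compare(d, last) <= 0) continue;
            if (out.size() == rows) break;
            out.add(d);
        }
        return out;
    }
}
```

The client passes the last Doc of one page as `last` for the next request. Because only the sort value and the uniqueKey are consulted, no internal docid is needed, and a document deleted between two requests simply never shows up on a later page.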

----

I'm attaching my current progress with a straw man impl + tests.  It includes 
the basic functionality & tests for doing deep paging on a single node solr 
setup using numeric sorts.

There are an absurd number of nocommits in this patch: most of them are in the 
impl and i'm not worried about them because i'm hoping the impl can ultimately 
be thrown out; some are in the test because of additional tests i want to 
write; some are in the test because of silly limitations in the impl.

Only one class of nocommits really concerns me at this point and that's the 
issue of dealing with String sorts -- the way Solr's distributed sorting code 
deals with fields that use SortField.Type.STRING (and presumably 
SortField.Type.STRING_VAL) results in the coordinator node having a String 
object even though the underlying FieldComparator expects/uses BytesRef as the 
comparison value.  

I could probably hack around this, and convert the Strings back to BytesRef 
myself in the DeepPaging code -- but this actually smells like a more 
fundamental problem we should address.  It seems to be the same root problem 
that sarowe has been looking into in SOLR-5354 in order to play nicer with 
custom FieldTypes: safely "serializing" the true sort object (regardless of 
what it is) between shards->coordinator, and then deserializing it & using the 
*real* FieldComparator for each field to do the aggregated sorting of the docs 
from each shard.
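
As a rough illustration of why a String on the coordinator is not a drop-in replacement for the bytes the comparator works with, here is a small sketch. It assumes the string sort values are UTF-8 encoded and compared as unsigned bytes; StringSortValueSketch and its methods are hypothetical names, and the code uses plain byte[] rather than Lucene's BytesRef.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Solr or Lucene code: byte[] stands in for BytesRef.
final class StringSortValueSketch {

    // Convert the String held by the coordinator back into UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }

    // Compare two Strings by their UTF-8 bytes instead of String.compareTo.
    static int compareAsBytes(String a, String b) {
        return compareUnsigned(toUtf8(a), toUtf8(b));
    }
}
```

For most text the two orderings agree, but they can differ: String.compareTo works on UTF-16 code units, so "\uFFFF" sorts after the supplementary character "\uD800\uDC00" (U+10000), while the UTF-8 byte comparison puts it before. Merging shard results with the wrong representation could therefore order documents differently than the shards did.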

----

In any case, my next step is to get some distributed tests set up and working 
against this straw man impl, and then dig into throwing away the straw man impl 
and trying to replace it with PagingFieldCollector -- possibly with a side 
diversion to help sarowe fix the underlying problems in SOLR-5354 first.


> Provide cursor/token based "searchAfter" support that works with arbitrary 
> sorting (ie: "deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, 
> leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
> at the lucene level: require the clients to provide back a token indicating 
> the sort values of the last document seen on the previous "page".  This is 
> similar to the "cursor" model I've seen in several other REST APIs that 
> support "pagination" over large sets of results (notably the twitter API and 
> its "since_id" param) except that we'll want something that works with 
> arbitrary multi-level sort criteria that can be either ascending or descending.
> SOLR-1726 laid some initial ground work here and was committed quite a while 
> ago, but the key bit of argument parsing to leverage it was commented out due 
> to some problems (see comments in that issue).  It's also somewhat out of 
> date at this point: at the time it was committed, IndexSearcher only supported 
> searchAfter for simple scores, not arbitrary field sorts; and the params 
> added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on 
> ensuring that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
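
The "token indicating the sort values of the last document seen" could take many forms. The following is only a sketch of one possibility, assuming a single long sort value plus a uniqueKey packed into a URL-safe Base64 string; CursorTokenSketch, encode and decode are hypothetical names and are not part of any Solr API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical token format for illustration only; not a Solr API.
final class CursorTokenSketch {

    // Pack the last doc's sort value and uniqueKey into an opaque, URL-safe token.
    static String encode(long sortValue, String uniqueKey) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeLong(sortValue);
            out.writeUTF(uniqueKey);
            out.flush();
            return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unpack a token into { Long sortValue, String uniqueKey }.
    static Object[] decode(String token) {
        try {
            byte[] raw = Base64.getUrlDecoder().decode(token);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
            long sortValue = in.readLong();
            String uniqueKey = in.readUTF();
            return new Object[] { sortValue, uniqueKey };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A client would send the token from one response back as a request parameter to fetch the next page. Supporting multi-level sorts would mean writing one value per sort clause, in sort order, rather than the single long assumed here.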



--
This message was sent by Atlassian JIRA
(v6.1#6144)
