[jira] [Commented] (SOLR-2656) realtime get

Michael McCandless (JIRA) Mon, 18 Jul 2011 15:45:25 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067363#comment-13067363 ]


Michael McCandless commented on SOLR-2656:
------------------------------------------

{quote}
bq. Maybe we should call this near-real-time get?

That sort of defeats the purpose of the issue: it's supposed to be a 100%
reliable get of the latest version of a document.
{quote}

Right, it will always return the last added doc under that ID; I'm not
disputing that part.

I am disputing that it's really "real-time" given that it's built on
top of "near-real-time".  I.e., calling this real-time is over-selling
it, I think; the performance will not be great?

Another thing to consider is NRTCachingDir; it's good for reducing
latency when you are frequently flushing tiny segments (it makes the
reopen IO-less, except for the ID lookups; if you also use MemCodec,
the NRT open is fully IO-free).
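A minimal, hypothetical Python sketch of the caching policy described above: small, freshly flushed files are kept in RAM so that an NRT reopen can read them without disk IO, while large files (such as merge output) go straight to the underlying store. `CachingDirSketch` and its two thresholds are illustrative assumptions, not Lucene's API, and the real NRTCachingDirectory does more than this.

```python
class CachingDirSketch:
    """Hypothetical model of an NRT caching directory's write policy."""

    def __init__(self, max_file_bytes: int, max_cached_bytes: int) -> None:
        self.max_file_bytes = max_file_bytes      # larger files are never cached
        self.max_cached_bytes = max_cached_bytes  # total RAM budget for cached files
        self.ram: dict[str, bytes] = {}           # cached (IO-free) files
        self.disk: dict[str, bytes] = {}          # stands in for the wrapped directory
        self.disk_reads = 0                       # counts reads that would hit disk

    def cached_bytes(self) -> int:
        return sum(len(data) for data in self.ram.values())

    def write(self, name: str, data: bytes) -> None:
        # Cache only small files, and only while the RAM budget allows it.
        if (len(data) <= self.max_file_bytes
                and self.cached_bytes() + len(data) <= self.max_cached_bytes):
            self.ram[name] = data
        else:
            self.disk[name] = data

    def read(self, name: str) -> bytes:
        if name in self.ram:
            return self.ram[name]  # no IO needed
        self.disk_reads += 1
        return self.disk[name]


# A tiny flushed segment is cached; a large merged one is written through.
d = CachingDirSketch(max_file_bytes=1024, max_cached_bytes=4096)
d.write("_1.cfs", b"x" * 100)
d.write("_2.cfs", b"y" * 10_000)
d.read("_1.cfs")
d.read("_2.cfs")
print(d.disk_reads)  # 1
```

The point of the design is that reopen IO scales with what actually lands on disk, so frequent tiny flushes stay cheap while large merges never compete for the RAM budget.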

{quote}
bq. The approach here is to always reopen the reader on-demand when a RT get
arrives, i.e., if any changes had been made to the index with IndexWriter?

I was thinking ahead to a more generic version where one could specify the 
clock (I think this will be needed for future distrib indexing support). I 
actually first added a version that took an explicit clock but then simplified 
it to always use the latest clock and marked it as experimental.
{quote}

What kind of "clocks" would one want to plug in here?  Do you mean you
could choose to accept some staleness if you wanted (plug in a clock
that only increments periodically if there had been updates)?
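One possible reading of the pluggable "clock" idea, sketched in Python under stated assumptions: every update bumps a logical clock, each reader remembers the clock it was opened at, and a get names the minimum clock it must see. Asking for the latest clock gives the always-fresh behavior; asking for an older clock accepts some staleness in exchange for fewer reopens. `ClockedIndex` and `min_clock` are hypothetical names, and dicts stand in for the index and the reader.

```python
from typing import Optional


class ClockedIndex:
    """Hypothetical model: a get can name the minimum clock it needs to see."""

    def __init__(self) -> None:
        self.clock = 0                         # bumped by every update
        self.docs: dict[str, str] = {}         # writer-side (latest) state
        self.reader_clock = 0                  # clock value when the reader was opened
        self.reader_docs: dict[str, str] = {}  # point-in-time view served to gets
        self.reopens = 0

    def add(self, doc_id: str, body: str) -> int:
        self.docs[doc_id] = body
        self.clock += 1
        return self.clock

    def _reopen(self) -> None:
        self.reader_docs = dict(self.docs)  # stands in for opening a new NRT reader
        self.reader_clock = self.clock
        self.reopens += 1

    def get(self, doc_id: str, min_clock: Optional[int] = None) -> Optional[str]:
        # Default: require the latest clock, i.e. always see the newest update.
        wanted = self.clock if min_clock is None else min_clock
        if self.reader_clock < wanted:
            self._reopen()
        return self.reader_docs.get(doc_id)


idx = ClockedIndex()
c1 = idx.add("a", "v1")
print(idx.get("a"))                # v1  (reopens: reader was at clock 0)
idx.add("a", "v2")
print(idx.get("a", min_clock=c1))  # v1  (reader at clock 1 is new enough; no reopen)
print(idx.get("a"))                # v2  (latest clock requested; reopens)
print(idx.reopens)                 # 2
```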

{quote}
bq. But, stepping back, this approach (open new NRT reader on demand) seems
dangerous? I.e., perf will be poor if a client has one thread constantly updating
and another constantly doing RT get?

It's better than what we have today, and it can be optimized in the future.
{quote}

I agree, progress not perfection.
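To illustrate the performance concern, here is a hypothetical Python model of "reopen on demand": any update marks the index dirty, and a realtime get reopens the reader only when the dirty flag is set. A dict copy stands in for the NRT reopen, and the loop interleaves one update with one get to mimic an updating thread racing a get thread; `OnDemandReopen` is an illustrative name, not Solr code.

```python
class OnDemandReopen:
    """Hypothetical model: reopen the reader on a realtime get only if dirty."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}      # writer-side state
        self.snapshot: dict[str, str] = {}  # what the current reader can see
        self.dirty = False                  # set by any update since the last reopen
        self.reopens = 0

    def add(self, doc_id: str, body: str) -> None:
        self.docs[doc_id] = body
        self.dirty = True

    def realtime_get(self, doc_id: str):
        if self.dirty:
            self.snapshot = dict(self.docs)  # stands in for an NRT reader reopen
            self.dirty = False
            self.reopens += 1
        return self.snapshot.get(doc_id)


# Interleave one update with one get: every get pays for a reopen.
idx = OnDemandReopen()
for i in range(1000):
    idx.add(f"doc{i % 10}", f"v{i}")
    idx.realtime_get("doc0")
print(idx.reopens)               # 1000
print(idx.realtime_get("doc0"))  # v990  (no pending changes, so no reopen)
print(idx.reopens)               # 1000
```

Gets that arrive with no pending changes are free, so the cost is driven entirely by how often updates and gets interleave.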

bq. One way would be with a bloom filter of updates that are not yet visible. 
Another way will again relate to recovery in distributed indexing, when we'll 
need to ask another node what all the latest updates after clock x were (and 
since we'll have those on hand, we can check any realtime-get against that 
first).
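A hypothetical sketch of the Bloom filter idea: record the IDs of updates not yet visible to the current reader in a Bloom filter, and reopen only when the requested ID might be among them. Because a Bloom filter has no false negatives, a get never returns a stale version; a false positive merely costs an unnecessary reopen. The filter size, hash scheme (SHA-256 with a seed prefix), and class names are assumptions, and the model is single-threaded.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over strings: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomGatedIndex:
    """Hypothetical model: reopen only if the requested ID may have a pending update."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}      # writer-side state
        self.snapshot: dict[str, str] = {}  # what the current reader can see
        self.pending = BloomFilter()        # IDs updated since the last reopen
        self.reopens = 0

    def add(self, doc_id: str, body: str) -> None:
        self.docs[doc_id] = body
        self.pending.add(doc_id)

    def realtime_get(self, doc_id: str):
        if self.pending.might_contain(doc_id):
            self.snapshot = dict(self.docs)  # stands in for an NRT reader reopen
            self.pending = BloomFilter()     # everything pending is now visible
            self.reopens += 1
        return self.snapshot.get(doc_id)


idx = BloomGatedIndex()
idx.add("a", "v1")
print(idx.realtime_get("a"))  # v1 (reopens: "a" is pending)
idx.add("b", "v2")
print(idx.realtime_get("a"))  # v1 (normally served from the current reader)
```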

Maybe Solr should use a transaction log (like ElasticSearch)?  I think
(not certain) that ES serves a RT get directly out of its transaction
log if the doc is in it (else falls back to the reader)?  Then
simultaneous updates + gets should really be real-time.  But I
realize that'd be a much bigger change...
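A hypothetical Python sketch of the transaction-log approach: each update is appended to a log, and an in-memory map points from a document ID to its newest logged version that the searcher cannot see yet. A realtime get checks that map first and falls back to the searcher otherwise, so it never has to reopen. This illustrates the general idea only, not how ElasticSearch or Solr actually implement it; deletes, log truncation, and durability are left out.

```python
class TlogIndex:
    """Hypothetical model: serve realtime gets from a log of not-yet-visible updates."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []  # append-only (doc_id, body) records
        self.unseen: dict[str, int] = {}      # doc_id -> log position, since last reopen
        self.docs: dict[str, str] = {}        # writer-side state
        self.searcher: dict[str, str] = {}    # point-in-time view

    def add(self, doc_id: str, body: str) -> None:
        self.log.append((doc_id, body))
        self.unseen[doc_id] = len(self.log) - 1
        self.docs[doc_id] = body

    def reopen(self) -> None:
        self.searcher = dict(self.docs)
        self.unseen.clear()  # every logged update is now visible to the searcher

    def realtime_get(self, doc_id: str):
        pos = self.unseen.get(doc_id)
        if pos is not None:
            return self.log[pos][1]       # newest version, straight from the log
        return self.searcher.get(doc_id)  # otherwise fall back to the searcher


idx = TlogIndex()
idx.add("a", "v1")
idx.reopen()
idx.add("a", "v2")
print(idx.searcher["a"])      # v1  (searcher not refreshed yet)
print(idx.realtime_get("a"))  # v2  (served from the log, no reopen)
```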


> realtime get
> ------------
>
>                 Key: SOLR-2656
>                 URL: https://issues.apache.org/jira/browse/SOLR-2656
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>         Attachments: SOLR-2656.patch
>
>
> Provide a non point-in-time interface to get a document.
> For example, if you add a new document, you will be able to get it,
> regardless of whether the searcher has been refreshed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
