[ https://issues.apache.org/jira/browse/SOLR-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865865#comment-15865865 ]

Mike Drob commented on SOLR-10006:
----------------------------------

bq. The take-away here is that the solr core must be restarted so there is 
never an open searcher on that core, perhaps your stress test isn't doing that?
Guilty.

bq. reloading the core from the admin UI silently fails with a .doc file 
removed. By that I mean the UI doesn't show any problems even though the log 
file has exceptions.
This might be best as a separate issue. I don't feel nearly comfortable enough 
with the UI to even begin to attempt a fix.

bq. The core admin API correctly reports an error for action=RELOAD though 
(curl or the like)
Good.
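For reference, a minimal sketch of that check (host, port, and core name here are assumptions, not from the report):

```shell
# Build the Core Admin RELOAD request; core name and host are placeholders.
CORE="gettingstarted"
URL="http://localhost:8983/solr/admin/cores?action=RELOAD&core=${CORE}"
echo "$URL"
# Against a running Solr instance, the API reports the failure in the
# response body when the core cannot be reloaded, e.g.:
#   curl "$URL"
```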

bq. the admin UI still thinks the replica is active.
bq. a search on the replica with distrib=false also succeeds, even when I set a 
very large start parameter, but I suspect this is a function of there still 
being an open file handle on the file I deleted so it's "kinda there" until 
restart.
I'm not sure this is wrong, based on your next points. If everything is in 
memory, and the core can serve requests, then from the system perspective it 
_is_ active. It's either the phantom file handle or everything is sitting in a 
cache.
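The phantom-file-handle behavior is easy to demonstrate on POSIX: unlinking a file that a process still has open leaves the data readable through the existing descriptor, which is roughly what the open searcher is doing with the deleted .doc file. A minimal sketch:

```shell
# Demonstrate that an unlinked file stays readable via an open descriptor.
tmp=$(mktemp)
echo "segment data" > "$tmp"
exec 3< "$tmp"        # open a read descriptor on the file
rm "$tmp"             # unlink it: the name is gone from the directory...
data=$(cat <&3)       # ...but the contents are still readable via fd 3
exec 3<&-
echo "$data"
```

The data is only truly gone once the last descriptor closes, which is why a restart (closing the searcher) makes the corruption visible.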

bq. At this point (the searcher is working even though the doc file is 
missing), a fetchindex doesn't think there's any work to do so "succeeds", i.e. 
it doesn't fetch from the masterUrl
Maybe we need a {{force=true}} option here? I'm not sure there is another way 
to do a robust check that wouldn't be incredibly slow. Maybe fetchindex is a 
rare enough command that it's OK to be slow?
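For illustration, the proposed option on the replication handler might look like this (the {{force}} parameter is hypothetical and does not exist yet; host and core name are also assumptions):

```shell
# Hypothetical force=true flag on the fetchindex command.
CORE="gettingstarted"
URL="http://localhost:8983/solr/${CORE}/replication?command=fetchindex&force=true"
echo "$URL"
# Against a running instance:
#   curl "$URL"   # would replace the index even when it looks up to date
```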

> Cannot do a full sync (fetchindex) if the replica can't open a searcher
> -----------------------------------------------------------------------
>
>                 Key: SOLR-10006
>                 URL: https://issues.apache.org/jira/browse/SOLR-10006
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.3.1, 6.4
>            Reporter: Erick Erickson
>         Attachments: SOLR-10006.patch, SOLR-10006.patch, solr.log, solr.log
>
>
> Doing a full sync or fetchindex requires an open searcher and if you can't 
> open the searcher those operations fail.
> For discussion. I've seen a situation in the field where a replica's index 
> became corrupt. When the node was restarted, the replica tried to do a full 
> sync but failed because the core couldn't open a searcher. The replica went into 
> an endless sync/fail/sync cycle.
> I couldn't reproduce that exact scenario, but it's easy enough to get into a 
> similar situation. Create a 2x2 collection and index some docs. Then stop one 
> of the instances and go in and remove a couple of segments files and restart.
> The replica stays in the "down" state, fine so far.
> Manually issue a fetchindex. That fails because the replica can't open a 
> searcher. Sure, issuing a fetchindex is abusive.... but I think it's the same 
> underlying issue: why should we care about the state of a replica's current 
> index when we're going to completely replace it anyway?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
