[ 
https://issues.apache.org/jira/browse/SOLR-9651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583091#comment-15583091
 ] 

Keith Laban commented on SOLR-9651:
-----------------------------------

I wrote an extension of ExternalFileField (EFF) called RemoteFileField 
(SOLR-9617). The idea is that you can drop your external file field in an S3 
bucket or some other remotely hosted location and then tell Solr to pull it 
down and update the EFF. We could probably use the same approach to have it 
apply atomic updates to the documents instead of writing an external file. 

Maybe the title/description of this ticket should be updated to make it a 
discussion about finding a better approach for EFF.

> Consider tracking modification time of external file fields for faster 
> reloading
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-9651
>                 URL: https://issues.apache.org/jira/browse/SOLR-9651
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 4.10.4
>         Environment: Linux
>            Reporter: Mike
>
> I have an index of about 4M legal documents that has pagerank boosting 
> configured as an external file field. The external file is about 100MB in 
> size and has one row per document in the index. Each row indicates the 
> pagerank score of a document. When we open new searchers, this file has to 
> be reloaded, which creates a noticeable delay for our users: it takes 
> several seconds to reload. 
> An idea to fix this came up in [a recent discussion in the Solr mailing 
> list|https://www.mail-archive.com/solr-user@lucene.apache.org/msg125521.html]:
>  Could the file only be reloaded if it has changed on disk? In other words, 
> when new searchers are opened, could they check the modtime of the file, and 
> avoid reloading it if the file hasn't changed? 
> In our configuration, this would be a big improvement. We only change the 
> pagerank file once a week because computing it is intensive and new documents 
> don't tend to have a big impact. At the same time, because we're regularly 
> adding new documents, we do hundreds of commits per day, all of which have a 
> delay as the (largish) external file field is reloaded. 
> Is this a reasonable improvement to request? 
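
The modification-time check proposed in the description could be sketched 
roughly as below. This is a minimal illustration, not Solr code: the class 
ModTimeCachedScores and its methods are hypothetical names, the file is 
assumed to contain one key=value line per document, and the whole file is 
assumed to fit in memory. It only shows the idea of skipping the reload when 
the file's last-modified time has not changed since the previous load.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Solr): reloads a score file only when its modtime changes. */
class ModTimeCachedScores {
    private final Path file;
    private FileTime loadedModTime;                    // modtime seen at the last load; null before the first load
    private Map<String, Float> scores = new HashMap<>();
    private int reloadCount = 0;                       // number of times the file was actually parsed

    ModTimeCachedScores(Path file) {
        this.file = file;
    }

    /** Would be called each time a new searcher opens; parses the file only if its modtime changed. */
    synchronized Map<String, Float> get() throws IOException {
        FileTime current = Files.getLastModifiedTime(file);
        if (!current.equals(loadedModTime)) {
            scores = parse(Files.readAllLines(file, StandardCharsets.UTF_8));
            loadedModTime = current;
            reloadCount++;
        }
        return scores;
    }

    synchronized int reloadCount() {
        return reloadCount;
    }

    /** Parses lines of the assumed form key=value; lines without a key before '=' are skipped. */
    private static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> out = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String key = line.substring(0, eq).trim();
            float value = Float.parseFloat(line.substring(eq + 1).trim());
            out.put(key, value);
        }
        return out;
    }
}
```

A check like this depends on the filesystem's timestamp resolution, so two 
writes within the same timestamp tick could go unnoticed, and a file that is 
rewritten while it is being read could be cached in a partial state. A real 
implementation would probably need to handle both cases.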



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

