Mike created LUCENE-7488:
----------------------------

             Summary: Consider tracking modification time of external file 
fields for faster reloading
                 Key: LUCENE-7488
                 URL: https://issues.apache.org/jira/browse/LUCENE-7488
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
    Affects Versions: 4.10.4
         Environment: Linux
            Reporter: Mike


I have an index of about 4M legal documents that has pagerank boosting
configured as an external file field. The external file is about 100MB in size
and has one row per document in the index. Each row gives the pagerank score
of a document. Whenever we open a new searcher, this file has to be reloaded,
which takes several seconds and creates a noticeable delay for our users.

An idea to fix this came up in [a recent
discussion|https://www.mail-archive.com/solr-user@lucene.apache.org/msg125521.html]:
could the file be reloaded only if it has changed on disk? In other words,
when a new searcher is opened, could it check the modification time of the
file and skip the reload if the file hasn't changed?
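
To make the idea concrete, here is a minimal sketch of a modification-time
check. It is not Lucene or Solr code: the class name ModTimeCachedScores, its
methods, and the simple key=value line format are assumptions made purely for
illustration, and a real change would have to fit into the existing external
file field loading and caching.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene or Solr): caches parsed scores and
// re-reads the file only when its last-modified time has changed.
final class ModTimeCachedScores {
    private final Path file;
    private FileTime loadedModTime;              // null until the first load
    private Map<String, Float> scores = new HashMap<>();
    private int loadCount = 0;                   // number of real reloads

    ModTimeCachedScores(Path file) {
        this.file = file;
    }

    // Returns the cached scores, reloading only if the modtime differs from
    // the one recorded at the previous load.
    synchronized Map<String, Float> get() {
        try {
            FileTime current = Files.getLastModifiedTime(file);
            if (!current.equals(loadedModTime)) {
                scores = parse(Files.readAllLines(file));
                loadedModTime = current;
                loadCount++;
            }
            return scores;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int loadCount() {
        return loadCount;
    }

    // Assumed format: one "key=value" pair per line; other lines are skipped.
    private static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> parsed = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                String key = line.substring(0, eq).trim();
                float value = Float.parseFloat(line.substring(eq + 1).trim());
                parsed.put(key, value);
            }
        }
        return parsed;
    }
}
```

With something like this, each new searcher would call get() and pay the
parsing cost only after the file has actually been replaced. One caveat with
this approach: file systems with coarse timestamp granularity could miss a
rewrite that happens within the same tick, so comparing file size or a
checksum as well might be worth considering.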

In our configuration, this would be a big improvement. We only change the
pagerank file once a week because computing it is expensive and new documents
don't tend to have a big impact. At the same time, because we're regularly
adding new documents, we do hundreds of commits per day, each of which incurs
a delay while the (largish) external file field is reloaded.

Is this a reasonable improvement to request? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
