Incremental Indexing when Source Data is not Incremental

William Nelis Fri, 19 May 2017 12:51:59 -0700

Hello.

I am new to Solr and have a question about incremental indexing. We have a 
source text file that contains millions of rows. Each row is saved as a 
document in Solr. There is one field in each row that is a unique identifier.


Unfortunately, this source text file can change. We need to check it every hour 
for changes. If rows are removed, we must remove them from Solr. If rows are 
added, we must add them to Solr.

We do not want to drop all records and re-load them. Instead, we would like to 
compute a diff and apply only the changes. What is the recommended way of doing 
this? Can we simply retrieve all the values Solr stores for the unique identifier 
field and compute the diff outside Solr? Does Solr provide functionality that 
lets us apply incremental changes even though the source file itself is not 
incremental?
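A minimal sketch of the external diff described above. It assumes the IDs from the source file and the IDs already indexed in Solr have each been loaded into a Python set; how the Solr IDs are fetched is left out (Solr's cursorMark deep paging with `fl` limited to the ID field may be one option worth checking). The function name `diff_ids` and the sample IDs are hypothetical.

```python
def diff_ids(file_ids, solr_ids):
    """Compare IDs from the source file with IDs already indexed in Solr.

    Returns (to_add, to_delete):
      to_add    - IDs present in the file but not yet in Solr
      to_delete - IDs indexed in Solr that are no longer in the file
    """
    file_ids = set(file_ids)
    solr_ids = set(solr_ids)
    to_add = file_ids - solr_ids      # new rows that need indexing
    to_delete = solr_ids - file_ids   # removed rows that need deleting
    return to_add, to_delete


# Hypothetical example data: ACDE was added to the file, ABGT was removed.
to_add, to_delete = diff_ids({"AAQX", "AAZT", "ACDE"}, {"AAQX", "AAZT", "ABGT"})
print(sorted(to_add))     # ['ACDE']
print(sorted(to_delete))  # ['ABGT']
```

IDs present in both sets are left untouched, so a row whose other columns changed but whose identifier stayed the same would not be detected by this ID-only comparison.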


An example of the file format (obviously this is not a real file):

AAQX     This is the first document             213.32
AAZT      This is the second document        243.23
ABGT     This is the third document            321.43
...

The first column is the unique identifier (the real file has many more columns; 
this example has been simplified).
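A rough sketch of pulling the identifiers out of rows in the simplified format above. It assumes columns are separated by whitespace and that the identifier itself never contains whitespace; the real file's delimiter is not stated, so this is an assumption. The function name `read_ids` is hypothetical.

```python
import io


def read_ids(lines):
    """Collect the unique identifier (first whitespace-separated column) of each row."""
    ids = set()
    for line in lines:
        parts = line.split(None, 1)  # split off the first column only
        if parts:                    # skip blank lines
            ids.add(parts[0])
    return ids


# The three example rows from above, wrapped in a file-like object.
sample = io.StringIO(
    "AAQX     This is the first document             213.32\n"
    "AAZT      This is the second document        243.23\n"
    "ABGT     This is the third document            321.43\n"
)
print(sorted(read_ids(sample)))  # ['AAQX', 'AAZT', 'ABGT']
```

With the real file, the same function could be called on an open file handle, e.g. `with open(path) as f: ids = read_ids(f)`, which reads one line at a time rather than loading millions of rows into memory at once.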


Thank you for your help.
