See SkipExistingDocumentsProcessorFactory. I'm using it; works great!

On Wed, May 17, 2017 at 12:35 AM Scott Blum <[email protected]> wrote:
> Hi folks,
>
> Recently ran into a data merge use case where I want to backfill a ton of
> documents off of storage into solr, but only if they don't already exist in
> Solr. (If they exist, they're newer.)
>
> I couldn't find an efficient way to do this in bulk; if any document in my
> batch ran into a conflict, the whole batch would fail. And
> single-doc-per-request is super slow.
>
> So I changed DistributedUpdateProcessor to look for a request parameter,
> and if present, any conflict documents are silently dropped, but the
> request as a whole goes through.
>
> Any interest in upstreaming this?
>
> Scott
>
>

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
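
For anyone skimming the thread, the batch behaviour being discussed (add the documents that are missing, silently skip the ones that already exist, and let the request as a whole succeed) can be sketched in a few lines of Python. This is only an illustration of the semantics, not Solr code: the in-memory dict standing in for the index, the function name add_batch_skip_existing, and the "id" field name are all assumptions made up for the example.

```python
from typing import Dict, Iterable, List, Tuple

Doc = Dict[str, object]


def add_batch_skip_existing(
    index: Dict[str, Doc],
    batch: Iterable[Doc],
    id_field: str = "id",  # hypothetical unique-key field name
) -> Tuple[int, List[str]]:
    """Add each document whose ID is not yet in the index.

    Documents whose ID already exists are skipped rather than treated
    as an error, so one conflict never fails the whole batch.
    Returns (number added, list of skipped IDs).
    """
    added = 0
    skipped: List[str] = []
    for doc in batch:
        doc_id = str(doc[id_field])
        if doc_id in index:
            # The existing copy is assumed to be newer, so keep it.
            skipped.append(doc_id)
            continue
        index[doc_id] = doc
        added += 1
    return added, skipped


# Toy in-memory "index" that already holds a newer copy of document "a".
index = {"a": {"id": "a", "v": "new"}}
backfill = [{"id": "a", "v": "old"}, {"id": "b", "v": "old"}]
added, skipped = add_batch_skip_existing(index, backfill)
```

In this toy run the stale copy of "a" is dropped, "b" is added, and nothing raises, which is the outcome the original question was after.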
