See SkipExistingDocumentsProcessorFactory. I'm using it; works great!
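
For anyone reading along, here is a rough sketch in plain Python of the
behaviour being asked about: documents whose key already exists are skipped
and the rest of the batch still goes through. This is not Solr code. The
function name, the dict standing in for the index, and the "id" key are all
made up for illustration.

```python
def add_batch_skip_existing(index, batch, key="id"):
    """Insert docs whose key is not yet in `index`; skip the rest.

    `index` is a plain dict standing in for the search index (hypothetical).
    Returns the keys that were skipped so the caller can log them.
    """
    skipped = []
    for doc in batch:
        doc_id = doc[key]
        if doc_id in index:
            # An existing document is assumed to be newer, so it is kept.
            skipped.append(doc_id)
            continue
        index[doc_id] = doc
    return skipped


# "a" is already indexed, so only "b" is inserted from the backfill batch.
index = {"a": {"id": "a", "v": "new"}}
batch = [{"id": "a", "v": "old"}, {"id": "b", "v": "old"}]
skipped = add_batch_skip_existing(index, batch)
```

The point is that a conflict only affects the one document, not the whole
request, so the backfill can still be sent in large batches.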

On Wed, May 17, 2017 at 12:35 AM Scott Blum <[email protected]> wrote:

> Hi folks,
>
> Recently ran into a data merge use case where I want to backfill a ton of
> documents off of storage into Solr, but only if they don't already exist in
> Solr.  (If they exist, they're newer.)
>
> I couldn't find an efficient way to do this in bulk; if any document in my
> batch ran into a conflict, the whole batch would fail.  And
> single-doc-per-request is super slow.
>
> So I changed DistributedUpdateProcessor to look for a request parameter,
> and if present, any conflict documents are silently dropped, but the
> request as a whole goes through.
>
> Any interest in upstreaming this?
>
> Scott
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
