ES Hadoop--Index only new documents without killing job from exceptions?

James Campbell Thu, 03 Jul 2014 10:50:33 -0700

Hi ES-Hadoop users--

I have a large list of simple documents that I would like to index for an
autocomplete feature. At batch processing time, I do not know which values
are new (never seen before) and which are not (some other part of the
update process changed, but the autocomplete-relevant portion of the
document did not).

I believe I could simply write all of the documents to the index on every
batch run with the default es.write.operation=index, but that makes ES
reindex every document each time, even when it hasn't changed.

On the other hand, if I choose es.write.operation=create, any document
that already exists causes the whole job to fail.
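
To make the trade-off concrete, here is a toy in-memory model of the two
write modes. It is not the ES-Hadoop or Elasticsearch API; ToyIndex,
write_index, write_create and run_batch are hypothetical names. "index"
rewrites every document and bumps its version even when nothing changed,
while "create" refuses an existing _id, and in this model that refusal is
an exception that aborts the rest of the batch.

```python
class VersionConflict(Exception):
    """Stands in for the 409 Conflict returned when `create` hits an existing _id."""


class ToyIndex:
    """Hypothetical in-memory stand-in for an index: _id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def write_index(self, doc_id, source):
        # es.write.operation=index semantics: always (re)write, bumping the version.
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, source)

    def write_create(self, doc_id, source):
        # es.write.operation=create semantics: refuse an _id that already exists.
        if doc_id in self.docs:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (1, source)


def run_batch(index, batch, op):
    """Write every (doc_id, source) pair; any exception aborts the remaining batch."""
    for doc_id, source in batch:
        op(index, doc_id, source)


idx = ToyIndex()
batch = [("apple", {"suggest": "apple"}), ("banana", {"suggest": "banana"})]
run_batch(idx, batch, ToyIndex.write_index)
run_batch(idx, batch, ToyIndex.write_index)  # unchanged docs are rewritten anyway
print(idx.docs["apple"][0])  # 2

try:
    run_batch(idx, batch + [("cherry", {"suggest": "cherry"})], ToyIndex.write_create)
except VersionConflict as e:
    print("batch aborted on", e)  # "cherry" is never written
```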

Is there a way to combine those behaviors, so that Elasticsearch simply
skips requests to reindex existing documents (matched by _id) without
throwing an exception that kills the entire job?
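
A minimal sketch of the policy being asked for, applied to per-item results
of a bulk "create" request: a conflict means "already indexed" and is
skipped, while every other failure still stops the job. This is an
illustration under assumptions, not an existing ES-Hadoop option; the item
shape ({"_id": ..., "status": ...}) is a simplified, hypothetical version of
a bulk response, and apply_create_or_skip and BulkWriteError are
hypothetical names.

```python
from typing import Iterable, List, Tuple

CREATED = 201
CONFLICT = 409  # status reported when `create` targets an _id that already exists


class BulkWriteError(Exception):
    """Raised for failures that should still stop the job."""


def apply_create_or_skip(items: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Classify per-item bulk `create` results into (created, skipped).

    Conflicts are skipped instead of failing the job; any other
    non-2xx status is still fatal.
    """
    created, skipped = [], []
    for item in items:
        status = item["status"]
        if 200 <= status < 300:
            created.append(item["_id"])
        elif status == CONFLICT:
            skipped.append(item["_id"])
        else:
            raise BulkWriteError(f"{item['_id']}: HTTP {status}")
    return created, skipped


created, skipped = apply_create_or_skip([
    {"_id": "apple", "status": CONFLICT},
    {"_id": "cherry", "status": CREATED},
])
print(created, skipped)  # ['cherry'] ['apple']
```

The design choice is to narrow what gets swallowed: only the conflict
status is treated as success, so mapping errors or rejected requests still
fail the job as they do today.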

James Campbell

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2e5b93ef-0c42-4068-bc2c-33e4efbe429b%40googlegroups.com.