I would recommend indexing the documents, since it's a 'cheap' operation per document and it covers any potential differences between the old and new versions of a doc. From a performance POV you are not going to lose much either: even with create you are sending the doc to ES anyway, which does the hashing and returns the error to the user. So the only thing you would save, and _might_ potentially notice, is the actual indexing, which should become a problem only when dealing with large amounts of docs.
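For illustration, here is a minimal sketch of what the job settings for that recommendation might look like. It only builds a plain dictionary of string settings and does not talk to a cluster. es.write.operation and its index/create values come from this thread; es.resource and es.mapping.id are the ES-Hadoop options I would expect to pair with it so that _id is taken from the document. The helper name es_write_settings, the "autocomplete/suggestion" resource and the "suggestion_id" field are made up for the example.

```python
# Only the two write operations discussed in this thread are accepted here.
ALLOWED_OPERATIONS = ("index", "create")


def es_write_settings(resource, id_field, operation="index"):
    """Build string-valued ES-Hadoop settings for a batch write.

    resource  -- target index/type (hypothetical value in the example below)
    id_field  -- document field whose value is used as _id
    operation -- "index" (add or replace) or "create" (fail if _id exists)
    """
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError("unsupported write operation: %r" % (operation,))
    return {
        "es.resource": resource,
        "es.mapping.id": id_field,
        "es.write.operation": operation,
    }


# Default to "index", so documents that already exist are simply rewritten
# instead of making the job fail.
settings = es_write_settings("autocomplete/suggestion", "suggestion_id")
for key in sorted(settings):
    print("%s=%s" % (key, settings[key]))
```

Because the _id is derived from the document itself, re-running a batch with these settings rewrites existing docs in place rather than duplicating them.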
That being said, there's already an issue open [1] for trapping/handling errors during a job (to prevent it from being cancelled), which could potentially be used for this purpose as well. Feel free to add your comments to it.

[1] https://github.com/elasticsearch/elasticsearch-hadoop/issues/160

On Thu, Jul 3, 2014 at 8:49 PM, James Campbell <[email protected]> wrote:

> Hi ES-Hadoop users--
>
> I have a large list of simple documents that I would like to index for an
> auto complete feature. At batch processing time, I do not know which values
> are new (never seen before) and which are not (some other part of the
> update process changed, but the autocomplete-relevant portion of the
> document did not).
>
> I believe I could simply write all of the documents to the index whenever
> I run a new batch with the default es.write.operation=index, but that will
> cause ES to reindex the document each time even if it wasn't updated.
>
> On the other hand, if I choose to use es.write.operation=create, then any
> existing documents will cause the job to fail.
>
> Is there a way to combine those behaviors, so that I can allow
> elasticsearch to simply ignore requests to reindex existing documents
> (based on _id) but not to throw an exception that kills the entire job?
>
> James Campbell
