Hello, I'm not very familiar with how ElasticSearch and Lucene work internally, but I know that there isn't really such a thing as updating a document in place: an "update" request is carried out as a delete and an insert. So I assume it is not as efficient as, say, an SQL update. That made me wonder whether it's a bad idea to update every single document that gets indexed, exactly once. Now, how did I even come up with that idea, and why would I do that?
Well, we use the ELK stack for processing, storing and presenting logs. Nothing special here, it's the most typical use case. But now we'd like to do some post-processing/analysis on the logs: we'd like to "highlight" the logs that are actually important. Meaning: fetch the not-yet-analyzed documents, do a lot of regexp matching, add an "alert" tag where necessary, and then push the documents back to ES.

I think I can get Logstash to do this (with some modifications to it). But I don't know how hard it would be on the "cluster" (only one beefy node so far, but a lot of logs). How would it affect performance, is it error prone, and how would it scale? Let's assume the use of the most efficient methods (scan+scroll API for reading and bulk API for the inserts, or are there better APIs for this?).

I'm well aware that technically I could do this before the documents are indexed for the first time, but in our case I think that's the poorer architectural design: mixing log processing (~splitting) and analysis is not great practice. The current processing logic is rarely modified; it works and it's solid. The analyzer patterns, on the other hand, would be changed and updated a lot, meaning a lot of Logstash restarts. If something gets messed up, logs may not get processed properly, or may get lost, etc. Some of these issues could be eliminated by chaining two Logstash instances in front of ES, but not all of them.

So, long story short: is it a bad idea to update every document once, and should I stick with pre-processing, or is this feasible? I'm also open to completely different approaches.

Thanks in advance for your input!

P
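
P.S. To make the post-processing idea concrete, here is a rough sketch of the fetch / match / tag / push-back loop. It is written in Python rather than as a Logstash modification, and it assumes the official `elasticsearch` Python client with its `helpers.scan` and `helpers.bulk` functions. The field names (`message`, `tags`, `analyzed`), the patterns and the function names are made up for illustration, and the query may need adjusting to the ES version and mapping in use (older versions with mapping types would also need `_type` in each action). Treat it as an outline, not a tested implementation.

```python
import re

# Hypothetical alert patterns; the real list would change often.
ALERT_PATTERNS = [
    re.compile(p)
    for p in (r"OutOfMemoryError", r"\bsegfault\b", r"disk .* full")
]


def needs_alert(message):
    """Return True if any alert pattern matches the log message."""
    return any(p.search(message) for p in ALERT_PATTERNS)


def to_update_action(hit):
    """Turn one search hit into a partial-update action for the bulk helper."""
    source = hit.get("_source", {})
    tags = list(source.get("tags", []))
    if needs_alert(source.get("message", "")) and "alert" not in tags:
        tags.append("alert")
    return {
        "_op_type": "update",          # partial update instead of a full reindex
        "_index": hit["_index"],
        "_id": hit["_id"],
        # "analyzed" is an assumed marker field so each document is handled once
        "doc": {"tags": tags, "analyzed": True},
    }


def tag_unanalyzed(es, index):
    """Scroll over not-yet-analyzed documents and send the updates in bulk."""
    # Imported here so the pure functions above work without the client installed.
    from elasticsearch import helpers

    query = {"query": {"bool": {"must_not": {"term": {"analyzed": True}}}}}
    hits = helpers.scan(es, index=index, query=query)
    return helpers.bulk(es, (to_update_action(h) for h in hits))
```

Keeping the pattern matching in plain functions means the patterns can be changed and re-run without touching the ingest pipeline, which is the separation I'm after. Every document is still rewritten once, though, so the cost question above stands.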
