Hello,

I'm not very familiar with how Elasticsearch and Lucene work, but I 
know that there isn't really such a thing as updating a document: an 
"update" request is carried out as a delete followed by an insert. So I 
assume it is "not as efficient" as, say, an SQL update. So I've been 
wondering whether it's a bad idea to update every single document that 
gets indexed, exactly once. Now, how did I even come up with that idea, 
and why would I do that? 

Well, we use the ELK stack for processing, storing and presenting logs: 
nothing special here, the most typical use case. But now we'd like to do 
some post-processing/analysis on the logs. We'd like to "highlight" the 
logs that are actually important. Meaning: fetch the not-yet-analyzed 
documents, do a lot of regexp matching, add an "alert" tag if 
necessary, and then push them back to ES. I think I can get Logstash to 
do this (with some modifications to it). But I don't know how hard it 
would be on the "cluster" (only one beefy node so far, but a lot of 
logs), how it would affect performance, whether it is error-prone, and 
how it would scale. Let's assume the use of the most efficient methods 
(scan+scroll API and bulk API for inserts [or are there better APIs for 
this?]).
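
As a rough sketch of the post-processing step described above (not a 
tested implementation), the following Python builds a bulk request body 
from a batch of fetched hits. The patterns, the "analyzed" marker field 
and the "message" field name are assumptions made for illustration; the 
"tags" list follows the usual Logstash event layout. It uses partial 
"update" actions in the bulk format rather than re-sending whole 
documents; depending on the ES version, the action metadata may also 
need a "_type".

```python
import json
import re

# Hypothetical alert patterns; real ones would come from the analyzer config.
ALERT_PATTERNS = [
    re.compile(r"\bERROR\b"),
    re.compile(r"out of memory", re.IGNORECASE),
]


def needs_alert(message):
    """Return True if any alert pattern matches the log message."""
    return any(p.search(message) for p in ALERT_PATTERNS)


def build_bulk_body(hits):
    """Turn a batch of search hits into an NDJSON body for the bulk API.

    Every hit gets a partial update that sets the (assumed) "analyzed"
    marker so it is not fetched again; hits matching a pattern also get
    the "alert" tag appended to their existing tags.
    """
    lines = []
    for hit in hits:
        source = hit["_source"]
        doc = {"analyzed": True}  # assumed marker field, not an ES built-in
        if needs_alert(source.get("message", "")):
            tags = list(source.get("tags", []))
            if "alert" not in tags:
                tags.append("alert")
            doc["tags"] = tags
        # Bulk format: one action line, then one line with the partial doc.
        action = {"update": {"_index": hit["_index"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": doc}))
    # Each line of a bulk body must end with a newline, including the last.
    return "".join(line + "\n" for line in lines)
```

The returned string would be sent to the bulk endpoint once per scroll 
batch. Marking every document as analyzed, even when no pattern 
matches, is what makes this "update every document exactly once"; the 
alternative of updating only the matching documents would need some 
other way to remember how far the analysis has progressed.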

I'm well aware that technically I could do this before the documents are 
first indexed, but in our case I think that's the weaker architectural 
design: mixing log processing (~splitting) and analysis is not great 
practice. The current processing logic is rarely modified; it works, 
it's solid. The analyzer patterns, on the other hand, would be changed 
and updated a lot, meaning a lot of Logstash restarts. If something gets 
messed up, logs may not get processed properly, or may get lost, etc. 
Some of these issues could be eliminated by chaining two Logstash 
instances in front of ES, but not all.

So, long story short: is it a bad idea to update every document once, 
meaning I should stick with pre-processing, or is it feasible? 

I'm also open to completely different approaches. 

Thanks for your input in advance!

P
