I don't crawl the web. I just collect rather verbose logs from multiple 
private cloud services and try to keep the ES cluster just large enough 
to search those logs comfortably. The monitored services are under 
development, and occasionally (because of bugs or specifics of the 
source data) they start sending a torrent of logs orders of magnitude 
larger than usual. When this happens, the ES cluster very soon becomes 
unresponsive and drops logs from all services, misbehaving or not. 

We cannot afford to keep a cluster large enough to handle those peak 
loads (and have it idle most of the time). Instead, we need some kind of 
Denial of Service prevention logic: when a client goes over its quota of 
logs, it should be blocked rather than allowed to melt the cluster down. 
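
A minimal sketch of such a per-client quota, assuming it runs in a proxy 
or log shipper in front of ES rather than inside ES itself. It uses a 
token bucket per client; the class name, the parameters, and the idea of 
keying on a client ID are all hypothetical and not part of any ES API.

```python
import time


class ClientQuota:
    """Per-client token bucket (hypothetical helper, not an ES feature).

    Each client may send `rate` documents per second on average,
    with bursts of up to `burst` documents.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock
        # client_id -> (tokens_available, time_of_last_update)
        self.buckets = {}

    def allow(self, client_id, n_docs=1):
        """Return True if `client_id` may send `n_docs` more documents now."""
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(client_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if n_docs <= tokens:
            self.buckets[client_id] = (tokens - n_docs, now)
            return True
        # Over quota: reject the batch without consuming any tokens.
        self.buckets[client_id] = (tokens, now)
        return False
```

The caller would check `allow(client_id, len(batch))` before forwarding a 
batch to ES and drop or delay the batch when it returns False, so one 
noisy service exhausts only its own bucket and the others keep flowing.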

A river plugin looks like overkill to me, especially considering the 
deprecation of rivers. 

On Saturday, December 13, 2014 7:33:05 PM UTC-8, BillyEm wrote:
>
> Why are you putting business logic of this type in ES? It belongs in your 
> workflow. At the ES indexer level you will have no idea of the source of 
> truth of the questionable content. Unless you're web crawling, which means 
> you're using the wrong search platform altogether, imo.
>
> On Friday, December 12, 2014 5:11:05 PM UTC-5, Konstantin Erman wrote:
>>
>> I have noticed that I occasionally need to shield my ES cluster from 
>> some documents that are too numerous, too big, or otherwise poison ES. 
>> Usually I can formulate a fairly simple query or criterion to detect those 
>> documents, and I'm looking for a way to block them from entering the index. 
>>
>> Is there such a pre-indexing filtering mechanism? Maybe Transforms can be 
>> used for that purpose?
>>
>> Thank you!
>> Konstantin
>>
>>
>>
