It would be useful to know whether switching to replica level 0 still
yields the same "dropped documents" effect.

Otherwise, would refreshing the index change the situation?
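
If you want to try both quickly, a rough sketch (index name, host, and
port below are placeholders for your setup):

```
# Drop replicas to 0 on the index in question (placeholder index name).
curl -XPUT 'http://localhost:9200/logstash-2014.07.02/_settings' -d '
{ "index": { "number_of_replicas": 0 } }'

# Force a refresh so everything indexed so far becomes searchable.
curl -XPOST 'http://localhost:9200/logstash-2014.07.02/_refresh'
```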

Jörg


On Wed, Jul 2, 2014 at 10:14 PM, Joseph Johnson <[email protected]>
wrote:

> Hello,
>
> I am attempting to set up a large scale ELK setup at work. Here is a
> basic setup of what we have so far:
>
> ```
> Nodes (approx 150)
> [logstash]
>
>   |
>   |
>   +-----------+
>   |           |
> Indexer1     Indexer2
> [Redis]      [Redis]
> [Logstash]   [Logstash]
>   |           |
>   |           |
>   +----+------+
>        |
>        |
>      ES Master ---------- Kibana3
>      [Master: yes]
>      [Data: no]
>        |
>        |
>      ES Data (4 data nodes)
>      [Master: no]
>      [Data: yes]
> ```
>
> In case the formatting does not hold with the above, I've created a
> paste here: https://baneofswitches.privatepaste.com/c8dfc2c30b
>
>
> The Setup
> =========
>
> * We have approximately 150 nodes configured to send to a "shuffled"
> Redis instance on either Indexer1 or Indexer2. A sanitized version of
> the node Logstash config is here:
> https://baneofswitches.privatepaste.com/345b94064d
>
> * Each indexer is identical. They both run their own independent Redis
> service. They then each have a Logstash service that pulls events from
> Redis and pushes them to the ES Master. They are using the http
> protocol. A sanitized version of their config is here:
> https://baneofswitches.privatepaste.com/e19eae690f
>
> * The ES Master is configured to only be a Master, and is not set to be
> a data node. It has 32 GB of RAM.
>
> * There are 4 ES data nodes, configured to be data-only nodes and
> ineligible for master election. They have 62 GB of RAM, and the ES
> storage is on SSDs.
>
> * We have Kibana3 configured to search from the ES Master.
>
> * The average rate of logs generated across all nodes is approximately
> 7k/s, with peaks up to about 16k/s.
>
> * Indexer throughput seems to be good enough that one indexer can work
> just fine during normal usage.
>
> * We are using the default 5 shards with 1 replica (a quick way to
> check the live settings is sketched just after this list).
>
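> As a quick sanity check on that layout, something like the following
> against the master (host and port are placeholders) shows the live
> shard/replica allocation:
>
> ```
> # List indices with their shard/replica counts and doc counts.
> curl 'http://esmaster:9200/_cat/indices?v'
>
> # Show where each primary and replica shard is allocated.
> curl 'http://esmaster:9200/_cat/shards?v'
> ```
>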
>
> The Problem
> ===========
>
> When this setup is under the load described above, we are noticing that
> some logs are being dropped. We were able to test this by running
> something like:
>
> seq 1 5000 | xargs -I{} -n 1 -P 40 logger "Testing unqString {} of 5000"
>
> Sometimes we would see all 5000 show up in Kibana, other times a subset
> of them (for example 4800 events).
>
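> A direct count against ES (rather than relying on the Kibana view) is a
> useful cross-check; roughly, with placeholder host, index pattern, and
> field name:
>
> ```
> # Count how many of the test events are actually searchable.
> curl 'http://esmaster:9200/logstash-*/_count?q=message:unqString'
> ```
>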
>
> Troubleshooting
> ===============
>
> We have taken a number of steps to eliminate possibilities. We have
> confirmed that logs are being reliably transferred from nodes to Redis
> and from Redis through Logstash. We confirmed this by monitoring counts
> over many trials. The Redis -> Logstash leg was tested by outputting to a
> file and comparing counts.
>
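> The kind of check involved on the Redis side is just watching the list
> length during a run (the key name below is a placeholder for whatever
> the redis input/output is configured with):
>
> ```
> # Watch the backlog of events queued in Redis.
> watch -n 1 'redis-cli llen logstash'
> ```
>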
> That left the Logstash -> ES leg. We tested this by writing a script
> that pushed fake events via the bulk API. We were unable to reproduce
> the problem with a single request. However, when the cluster is under
> load (we let 'real' logs flow) and we push via the bulk API with our
> script, we occasionally see partial loss of data.
>
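> A stripped-down sketch of that kind of bulk test (host, index, and type
> names are placeholders, not our actual script):
>
> ```
> # Build 5000 fake events in bulk-API format (action line + source line)
> # and push them in a single request.
> for i in $(seq 1 5000); do
>   echo '{ "index": { "_index": "logstash-bulktest", "_type": "logs" } }'
>   echo "{ \"message\": \"fake event $i\" }"
> done > bulk.json
>
> curl -s -XPOST 'http://esmaster:9200/_bulk' --data-binary @bulk.json
> ```
>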
> It's important to note that partial loss here means that the request
> succeeds (200 return code) and much of the data in the bulk request is
> then searchable; however, not all of it is. For example, if we put the
> cluster under load and push a bulk request of 5000 events, we will see
> only 4968 of the 5000 in a subsequent search.
>
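> Worth noting: even when the request as a whole returns 200, the bulk
> response carries a per-item status. A quick way to surface failed items
> (assuming jq is available; host and file are the placeholders from the
> sketch above):
>
> ```
> # Did any individual items fail despite the 200 on the request?
> curl -s -XPOST 'http://esmaster:9200/_bulk' --data-binary @bulk.json \
>   | jq '{errors: .errors, failed: [.items[].index | select(.status >= 300)]}'
> ```
>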
> We have tried increasing the bulk API thread pool as well as giving a
> greater percentage (50%) to the indexing buffer. Neither has fixed the
> issue.
>
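> For reference, the settings meant here are of this shape in
> elasticsearch.yml on the data nodes (values are illustrative, not the
> exact ones we used):
>
> ```
> # Larger bulk queue and a bigger indexing buffer -- illustrative values.
> threadpool.bulk.queue_size: 500
> indices.memory.index_buffer_size: 50%
> ```
>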
>
> Conclusion
> ==========
>
> I am looking for feedback on how to troubleshoot this further and find
> the cause. I am also interested to hear whether anyone else out there
> is handling this sort of incoming volume, and what they had to do to
> get their setup working. I appreciate all feedback.
>
