It would be useful to know whether switching to replica level 0 still yields the same "dropped documents" effect.
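For reference, the replica count can be changed on live indices through the index settings API and changed back afterwards; the host name and the `logstash-*` index pattern below are placeholders for your setup:

```shell
# Drop replicas to 0 on the live indices (reversible: set the value
# back to 1 afterwards). Host and index pattern are placeholders.
curl -XPUT 'http://es-master:9200/logstash-*/_settings' -d '
{
  "index": { "number_of_replicas": 0 }
}'
```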
Otherwise, will refreshing the index change the situation?

Jörg

On Wed, Jul 2, 2014 at 10:14 PM, Joseph Johnson <[email protected]> wrote:

> Hello,
>
> I am attempting to set up a large-scale ELK setup at work. Here is a
> basic diagram of what we have so far:
>
> ```
>         Nodes (approx 150)
>            [logstash]
>                |
>                |
>          +-----------+
>          |           |
>      Indexer1     Indexer2
>      [Redis]      [Redis]
>      [Logstash]   [Logstash]
>          |           |
>          |           |
>          +----+------+
>               |
>               |
>           ES Master ---------- Kibana3
>         [Master: yes]
>         [Data: no]
>               |
>               |
>       ES Data (4 data nodes)
>         [Master: no]
>         [Data: yes]
> ```
>
> In case the formatting does not hold with the above, I've created a
> paste here: https://baneofswitches.privatepaste.com/c8dfc2c30b
>
>
> The Setup
> =========
>
> * We have approximately 150 nodes configured to send to a "shuffled"
>   Redis instance on either Indexer1 or Indexer2. A sanitized version of
>   the node Logstash config is here:
>   https://baneofswitches.privatepaste.com/345b94064d
>
> * Each indexer is identical. They both run their own independent Redis
>   service. Each also runs a Logstash service that pulls events from
>   Redis and pushes them to the ES Master, using the http protocol. A
>   sanitized version of their config is here:
>   https://baneofswitches.privatepaste.com/e19eae690f
>
> * The ES Master is configured to be a master only, and is not a data
>   node. It has 32 GB of RAM.
>
> * There are 4 ES data nodes, configured to be data nodes only; they are
>   ineligible to be elected as master. They have 62 GB of RAM, and the
>   storage for ES is on SSDs.
>
> * We have Kibana3 configured to search from the ES Master.
>
> * The average rate of logs generated by all nodes combined seems to be
>   approximately 7k/sec, with peaks up to about 16k/sec.
>
> * Indexer throughput seems to be good enough that one indexer alone can
>   keep up during normal usage.
> * We are using the default 5 shards with 1 replica.
>
>
> The Problem
> ===========
>
> When this setup is loaded as mentioned above, we are noticing that some
> logs are being dropped. We were able to test this by running something
> like:
>
>   seq 1 5000 | xargs -I{} -n 1 -P 40 logger "Testing unqString {} of 5000"
>
> Sometimes we would see all 5000 show up in Kibana, other times only a
> subset of them (for example, 4800 events).
>
>
> Troubleshooting
> ===============
>
> We have taken a number of steps to eliminate possibilities. We have
> confirmed that logs are being reliably transferred from the nodes to
> Redis and from Redis through Logstash, by monitoring counts over many
> trials. The Redis -> Logstash leg was tested by outputting to a file
> and comparing counts.
>
> That left the Logstash -> ES leg. We tested this by writing a script
> that pushes fake events via the bulk API. We were unable to reproduce
> the problem with a single request. However, when the cluster is under
> load (we let 'real' logs flow) and we push via the bulk API with our
> script, we occasionally see partial loss of data.
>
> It's important to note that partial loss here means that the request
> succeeds (200 return code) and much of the data in the bulk request is
> then searchable, but not all of it. For example, if we put the cluster
> under load and push a bulk request of 5000 events, we will see 4968 of
> the 5000 in our subsequent search.
>
> We have tried increasing the bulk API threadpool as well as giving a
> greater percentage (50%) to the indexing buffer. Neither has fixed the
> issue.
>
>
> Conclusion
> ==========
>
> I am looking for feedback on how to troubleshoot this further and find
> the cause. I am also looking for information on whether anyone else out
> there is handling these sorts of incoming volumes, and what they had to
> do to get their setup working. I appreciate all feedback.
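One more thing worth checking on the Logstash -> ES leg: the bulk API returns HTTP 200 as long as the request itself was processed, even when individual actions inside it were rejected (for example when the bulk threadpool queue is full). Those failures only appear in the per-item results of the response. A minimal sketch of that check in Python, against a synthetic response (the field layout follows the bulk API; the sample data is made up):

```python
def failed_items(bulk_response):
    """Collect per-item failures hidden inside an HTTP-200 bulk response.

    Each item is keyed by its action name ("index", "create", ...);
    a non-2xx status on the item means that action was not applied.
    """
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if not 200 <= result.get("status", 0) < 300:
                failures.append((action, result))
    return failures


# Synthetic response: one of three actions was rejected under load.
response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": "EsRejectedExecutionException[rejected execution ...]"}},
        {"index": {"_id": "3", "status": 201}},
    ],
}
print(len(failed_items(response)))  # prints 1
```

If the test script only checks the HTTP status code, rejections like this would look exactly like the "silent" partial loss described above.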
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/53B46818.7020005%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
