Re: Elasticsearch configuration for uninterrupted indexing

Ivan Brusic Fri, 21 Mar 2014 10:07:37 -0700

One of the main uses of a data-less node is to act as a coordinator
between the other nodes. It gathers the responses from the other
nodes/shards and reduces them into one.


In your case, the data-less node is gathering all the data from just one
node. In other words, it is not doing much, since the reduce phase is
basically a pass-through operation. With a two-node cluster, I would say
you are better off having both machines act as full nodes.
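
To make the reduce step concrete, here is a rough sketch in Python. It is
not Elasticsearch code: the function name reduce_hits and the hit
dictionaries are made up for illustration, and it only assumes that every
shard returns its own hits already sorted by descending score.

```python
import heapq
from itertools import islice


def reduce_hits(shard_responses, size):
    """Merge per-shard hit lists into one global top-`size` list.

    Hypothetical helper: each element of `shard_responses` is assumed to be
    a list of hits already sorted by descending score, as built by the data
    node that holds the shard.
    """
    merged = heapq.merge(
        *shard_responses, key=lambda hit: hit["score"], reverse=True
    )
    return list(islice(merged, size))


# Two shards: the coordinator interleaves their hits by score.
shard_a = [{"id": "a1", "score": 3.0}, {"id": "a2", "score": 1.0}]
shard_b = [{"id": "b1", "score": 2.0}]
top_two = reduce_hits([shard_a, shard_b], size=2)

# One shard: the "reduce" hands back what the data node already built.
single = reduce_hits([shard_a], size=2)
```

Either way, the per-shard lists are built on the node that holds the data,
so a large 'size' costs memory there before the coordinator sees anything.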

Cheers,

Ivan



On Fri, Mar 21, 2014 at 5:04 AM, Rujuta Deshpande <[email protected]> wrote:

> Hi,
>
> I am setting up a system consisting of elasticsearch-logstash-kibana for
> log analysis. I am using one machine (2 GB RAM, 2 CPUs) running logstash,
> kibana and two instances of elasticsearch. Two other machines, each
> running logstash-forwarder, are pumping logs into the ELK system.
>
> The reasoning behind using two ES instances was this - I needed one
> uninterrupted instance to index the incoming logs and I also needed to
> query the currently existing indices. However, I didn't want any complex
> querying to result in loss of events owing to Out of Memory Errors because
> of excessive querying.
>
> So, one elasticsearch node had master = true and data = true and did
> the indexing (called the writer node), and the other node had master =
> false and data = false (this was the workhorse or reader node).
>
> I assumed that, in cases of excessive querying, although the data is
> stored on the writer node, the reader node will query the data and all the
> processing will take place on the reader as a result of which issues like
> out of memory error etc will be avoided and uninterrupted indexing will
> take place.
>
> However, while testing this, I realized that the reader hardly uses the
> heap memory (checked this in Marvel), and when I fire a complex search
> query (a search request using the python API where the 'size'
> parameter was set to 10000), the writer node throws an out of memory error,
> indicating that the processing also takes place on the writer node only. My
> min and max heap size was set to 256m for this test. I also ensured that I
> was firing the search query to the port on which the reader node was
> listening (Port 9200). The writer node was running on Port 9201.
>
> Was my previous understanding of the problem incorrect, i.e. does having
> one reader and one writer node not help with uninterrupted indexing of
> documents? If this is so, what is the use of having a separate workhorse or
> reader node?
>
> My eventual aim is to be able to query elasticsearch and fetch large
> amounts of data at a time without interrupting/slowing down the indexing of
> documents.
>
> Thank you.
>
> Rujuta

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQD25ipp5UFihLDqcqxqr1_4nMvngsNmedA73gLfjG_rcQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
