Yes, 2 servers are not enough from a fault tolerance perspective. It is hard to tell why
your ES cluster runs slowly without more information. Maybe a few settings changes are all
you need, I do not know. Maybe the logs can tell you what to do.
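A quick way to gather that information is to look at cluster health, node stats and hot
threads. A minimal sketch with the official Python client (elasticsearch-py), assuming a
node reachable on localhost:9200; the output handling is only illustrative:

    # Sketch: collect basic diagnostics from a cluster (assumes the official
    # elasticsearch-py client and a node listening on localhost:9200).
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # defaults to localhost:9200

    # Overall cluster state: status, number of nodes, unassigned shards, etc.
    print(es.cluster.health())

    # Per-node JVM heap usage from the nodes stats API.
    stats = es.nodes.stats()
    for node_id, node in stats["nodes"].items():
        print(node["name"], "heap used:", node["jvm"]["mem"]["heap_used_percent"], "%")

    # Hot threads show what the nodes are actually busy with (plain text).
    print(es.nodes.hot_threads())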
For sizing an ELK stack there are many hints on the net; the best ones are available from
Elasticsearch itself. I hesitate to recommend anything for AWS or any other service.
Personally I am in the situation that I use bare metal servers in my own data center with
the specifications I want. In the end it is up to you to decide whether you go the path of
"many servers with less power" or "few servers with much power". This may not only be a
technical issue but also a strategic question.

As Mark said, Elasticsearch was designed to scale out. This means you can add servers very
easily, and this improves the capacity and power of the overall system. For many it is
enough to add nodes and see the problems go away, without thinking hard about the reasons.

Jörg

On Fri, Sep 12, 2014 at 1:51 PM, Pavel P <[email protected]> wrote:

> 2Jörg
>
> 1. How I decided that 3 is enough:
> I started with 2 nodes in the cluster, and it was not able to handle the indexing load.
> Then, in this conversation,
> https://groups.google.com/forum/#!topic/elasticsearch/7XHQjAoKPfw, you explained to me
> that a 2-node cluster is not really a cluster, so I went to 3 nodes. The indexing
> process now runs smoothly and I am satisfied with it.
>
> 2. By "the maximum capacity", do you mean the size of the data? The data is spread
> equally; each node holds ~500 GB. The shards, of course, are not distributed equally,
> because it's hard to split 5 shards evenly between 3 servers.
>
> 3.
>
>> if your requirements allow that according to the data patterns and the
>> search load, but not with the ES OOTB settings
>
> What is the ES OOTB?
>
> The main purpose of our cluster is to store all the logs from our internal
> applications, let us search through them, and do some analytics with Kibana. The search
> load is currently close to zero, because as soon as I try to search it is quite slow,
> and when I try to aggregate values the cluster even goes down.
>
> What is your view on this, Jörg: should we go to 10 small servers rather than 3 big
> ones?
>
> Regards,
>
> On Friday, September 12, 2014 2:43:15 PM UTC+3, Jörg Prante wrote:
>>
>> Regarding the shards: if you have 3 nodes and 1 index with 5 shards, you have a sort
>> of "impedance mismatch", because 5 (or 10 with replicas) shards do not distribute
>> equally over 3 nodes.
>>
>> Rule: use a shard count that is always a multiple of the node count, e.g. 3, 6, 9,
>> 12 ... for 3 nodes.
>>
>> Can you tell what the maximum capacity of a single node is for your installation?
>> Somehow you must have concluded that 3 nodes are sufficient; how did you do that? It
>> does not only depend on observing index size. You can even run a 1.5 TB index on a
>> single node, if your requirements allow that according to the data patterns and the
>> search load, but not with the ES OOTB settings, which are meant for development
>> installations.
>>
>> Also note that Kibana is great, but I have the impression (I do not use it) that many
>> queries from its UI are not optimized with regard to filter caches and tend to waste
>> resources. There is much room for improvement.
>>
>> Jörg
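To illustrate the rule quoted above: for a 3-node cluster you would create the index with
6 shards and 1 replica, so the 12 shards in total spread evenly over the nodes. A minimal
sketch with the official Python client (the index name and host are only placeholders,
and the API shape assumed is the 1.x-era one):

    # Sketch: create an index whose shard count is a multiple of the node count.
    # The index name "logstash-2014.09.12" is made up for illustration.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # defaults to localhost:9200

    es.indices.create(
        index="logstash-2014.09.12",
        body={
            "settings": {
                "number_of_shards": 6,    # 6 primaries spread evenly over 3 nodes
                "number_of_replicas": 1,  # 12 shards in total, 4 per node
            }
        },
    )

With daily Logstash indices these settings would normally be applied through an index
template rather than an explicit create call.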
>>
>> On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom <[email protected]> wrote:
>>
>>> As I initially mentioned, it all depends on your use case, but generally ES scales
>>> better horizontally than vertically. If you can, spin up another cluster alongside
>>> the one you have, replicate the data set and query load, and compare the performance.
>>>
>>> Ideally you should aim for one primary shard per node, but you can over-allocate if
>>> you expect to grow, i.e. create 6 shards if you expect to grow to 6 servers. This
>>> applies on larger clusters as well, up to a point.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: [email protected]
>>> web: www.campaignmonitor.com
>>>
>>> On 12 September 2014 19:24, Pavel P <[email protected]> wrote:
>>>
>>>> Are you saying that a cluster of 10 servers with 2 CPUs and 7.5 GB RAM each (20 CPUs
>>>> and 75 GB RAM in total) would be more powerful than the 3 servers with 8 CPUs and
>>>> 30 GB RAM each (24 CPUs and 90 GB RAM in total)?
>>>> Assuming the data is spread equally across them.
>>>>
>>>> By the way, what about shard allocation? Currently I use the default of 5 shards and
>>>> 1 replica. Could this be a potential thing to optimise? How should the shard scheme
>>>> look on a cluster with a bigger number of nodes?
>>>>
>>>> Regards,
>>>>
>>>> On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:
>>>>>
>>>>> The answer is: it depends on what sort of use case you have. But if you are
>>>>> experiencing problems like yours, it is usually due to the cluster being at
>>>>> capacity and needing more resources.
>>>>>
>>>>> You may find it cheaper to move to more numerous, smaller nodes that you can
>>>>> distribute the load across, as that is where ES excels and also how many other big
>>>>> data platforms operate.
>>>>>
>>>>> Regards,
>>>>> Mark Walkom
>>>>>
>>>>> On 12 September 2014 19:01, Pavel P <[email protected]> wrote:
>>>>>
>>>>>> Java version is "1.7.0_55".
>>>>>> Elasticsearch is 1.3.1.
>>>>>>
>>>>>> Well, the cost of the whole setup is the question. Currently it is about $1000 per
>>>>>> month on AWS. Do we really need to pay a lot more than $1000/month to support the
>>>>>> 1.5 TB of data?
>>>>>>
>>>>>> Could you briefly describe how many nodes you would expect to handle that much
>>>>>> data?
>>>>>>
>>>>>> The side question is: how do the really Big Data solutions work when they search
>>>>>> or aggregate over data that is far larger than 1.5 TB? Or is that, again, a matter
>>>>>> of the size of the architecture?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>>>>>>>
>>>>>>> That's a lot of data for 3 nodes!
>>>>>>> You really need to adjust your infrastructure: add more nodes, more RAM, or
>>>>>>> alternatively remove some old indexes (delete or close them).
>>>>>>>
>>>>>>> What ES and java versions are you running?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mark Walkom
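For the "delete or close old indexes" option mentioned above: with time-based Logstash
indices this is just a matter of addressing them by date pattern. A minimal sketch with
the official Python client (the index patterns are only examples; in practice a tool such
as Curator is commonly used for this):

    # Sketch: free resources by closing or deleting old time-based indices.
    # Assumes daily logstash-YYYY.MM.DD indices; the patterns below are examples.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # defaults to localhost:9200

    # Close indices that are rarely searched; closed indices use no heap,
    # but stay on disk and can be reopened later.
    es.indices.close(index="logstash-2014.07.*")

    # Delete indices that are no longer needed at all.
    es.indices.delete(index="logstash-2014.06.*")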
>>>>>>>
>>>>>>> On 12 September 2014 18:48, Pavel P <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Again I have an issue with the power of the cluster.
>>>>>>>>
>>>>>>>> I have a cluster of 3 servers, each with 30 GB RAM, 8 CPUs and a 1 TB disk
>>>>>>>> attached.
>>>>>>>>
>>>>>>>> <https://lh4.googleusercontent.com/-W1AVatn9Cq0/VBKzYgR3QKI/AAAAAAAAAJc/S3TWMBqqqX0/s1600/ES_cluster.png>
>>>>>>>>
>>>>>>>> There are 1,323,957,069 docs (1.64 TB) in there; the document distribution is
>>>>>>>> as follows:
>>>>>>>>
>>>>>>>> <https://lh5.googleusercontent.com/-kjlQG7xBfIw/VBKwCt8sKQI/AAAAAAAAAJQ/s8kuqouFUkQ/s1600/Screen%2BShot%2B2014-09-12%2Bat%2B11.33.49%2BAM.png>
>>>>>>>>
>>>>>>>> All 3 nodes are data nodes.
>>>>>>>>
>>>>>>>> The index throughput is about 10-20k documents per minute (it is a
>>>>>>>> logstash -> elasticsearch setup; we store various logs in the cluster).
>>>>>>>>
>>>>>>>> My concerns are the following:
>>>>>>>>
>>>>>>>> 1. When I load the index page of Kibana, loading the document types panel takes
>>>>>>>> about a minute. Is that ok?
>>>>>>>> 2. For the document type user_account, when I try to build a terms panel for the
>>>>>>>> field "message.raw" (a string of 20-30 characters), my cluster gets stuck.
>>>>>>>> In the logs I find the following:
>>>>>>>>
>>>>>>>>> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker]
>>>>>>>>> [morbius] New used memory 6499531395 [6gb] from field [message.raw] would
>>>>>>>>> be larger than configured breaker: 6414558822 [5.9gb], breaking
>>>>>>>>
>>>>>>>> But despite the breaker, when it tries to calculate that terms pie, it stops
>>>>>>>> indexing the incoming documents. The queue grows. Then I see heap exceptions,
>>>>>>>> and the only thing I can do to recover is to reboot the cluster.
>>>>>>>>
>>>>>>>> *My question is the following:*
>>>>>>>>
>>>>>>>> It looks like I have quite powerful servers and a correct configuration (my
>>>>>>>> ES_HEAP_SIZE is set to 15g), yet they are still not able to process the 1.5 TB
>>>>>>>> of information, or do it quite slowly. Do you have any advice on how to overcome
>>>>>>>> that and make my cluster respond faster? How should I adjust the infrastructure?
>>>>>>>>
>>>>>>>> Which hardware would I need to handle the 1.5 TB in a reasonable amount of time?
>>>>>>>>
>>>>>>>> Any thoughts are welcome.
>>>>>>>>
>>>>>>>> Regards,
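One note on the breaker error quoted above: in ES 1.x a terms aggregation on a string
field loads fielddata onto the heap unless the field is not_analyzed and backed by
doc_values (disk-based fielddata). A minimal sketch of an index template that maps such a
raw sub-field for future daily indices, using the official Python client; the template
name, index pattern and field names are only examples, and existing indices would have to
be reindexed for this to take effect:

    # Sketch: map "message.raw" as a not_analyzed, doc_values-backed sub-field
    # so terms aggregations on it use disk-based fielddata instead of heap.
    # Template name, index pattern and field name are illustrative; a template
    # only affects newly created indices.
    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # defaults to localhost:9200

    es.indices.put_template(
        name="logstash-doc-values",
        body={
            "template": "logstash-*",  # applies to indices created from now on
            "mappings": {
                "_default_": {
                    "properties": {
                        "message": {
                            "type": "string",
                            "fields": {
                                "raw": {
                                    "type": "string",
                                    "index": "not_analyzed",
                                    "doc_values": True,
                                }
                            },
                        }
                    }
                }
            },
        },
    )

Raising the fielddata breaker limit is also possible, but it only postpones the heap
pressure rather than removing it.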
