Re: Indexing is being throttled

Mark Walkom Thu, 18 Sep 2014 14:29:28 -0700

You'd get much greater benefit from RAID than from using all the disks
individually.


You can, however, use multiple mount points to store ES data; it's just an
array in path.data.
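
As a rough sketch of what that array could look like, the snippet below
builds a path.data line for elasticsearch.yml. The /mnt/diskN names are only
an assumption borrowed from the example earlier in the thread, and the helper
name path_data_setting is made up for illustration; substitute your real
mount points.

```python
def path_data_setting(mount_points):
    """Render an elasticsearch.yml line listing several data directories."""
    if not mount_points:
        raise ValueError("at least one mount point is required")
    # Quote each path so the result is a valid YAML flow sequence.
    quoted = ", ".join('"{}"'.format(p) for p in mount_points)
    return "path.data: [{}]".format(quoted)


# Hypothetical mount points: /mnt/disk1 through /mnt/disk12.
mounts = ["/mnt/disk{}".format(i) for i in range(1, 13)]
line = path_data_setting(mounts)
print(line)
```

The printed line goes into elasticsearch.yml on each node; path.data is read
at startup, so the node needs a restart to pick up the change.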

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 19 September 2014 00:11, <[email protected]> wrote:

> Unfortunately that is too hard/complicated.
>
> I have now enabled all 12 disks per machine, so going forward I will get
> some "sharing" across all disks. Not sure how it will allocate new data
> across the disks?
>
> If I move a shard from one node to another with the new 12-disk paths,
> will the receiving node "share" the data across the disks? That way I could
> move all shards and get a redistribution of existing data?
>
>
>
> On Thursday, September 18, 2014 10:35:24 AM UTC+1, Mark Walkom wrote:
>>
>> Does your server have hardware RAID capabilities?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: [email protected]
>> web: www.campaignmonitor.com
>>
>> On 18 September 2014 19:30, <[email protected]> wrote:
>>
>>> Good point on heap, so I will bring that back down to 30GB
>>>
>>> Versions:
>>> ES 1.3.2-1
>>> java 1.7.0_67
>>>
>>> I definitely want to start using all 12 disks, rather than the 1 at the
>>> moment! If I add paths for the other 11 disks and restart, will ES do any
>>> 'rebalancing'? If it won't then is there any way to move the data around
>>> all 12 disks? I really don't want to re-index everything!!
>>>
>>> Thanks
>>>
>>>
>>> On Thursday, September 18, 2014 10:03:18 AM UTC+1, Mark Walkom wrote:
>>>>
>>>> Also, given that your heap is over 32GB, your Java pointers aren't going
>>>> to be compressed, which means GC will suffer.
>>>>
>>>> You haven't mentioned what ES and java versions you are using, which
>>>> would be useful.
>>>>
>>>> Regards,
>>>> Mark Walkom
>>>>
>>>> Infrastructure Engineer
>>>> Campaign Monitor
>>>> email: [email protected]
>>>> web: www.campaignmonitor.com
>>>>
>>>> On 18 September 2014 18:57, Michael McCandless <[email protected]> wrote:
>>>>
>>>>> Try disabling merge IO throttling, especially if your index is on
>>>>> SSD/s.  (It's on by default at a paltry 20 MB/sec).  Merge IO throttling
>>>>> causes merges to run slowly which eventually causes them to back up enough
>>>>> to the point where indexing must be throttled...
>>>>>
>>>>> Also see the recent post about tuning to favor indexing throughput:
>>>>> http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>>
>>>>> On Thu, Sep 18, 2014 at 4:54 AM, <[email protected]> wrote:
>>>>>
>>>>>> Setup:
>>>>>> 4 nodes
>>>>>> Replication       = 0
>>>>>> ES_HEAP_SIZE      = 75GB
>>>>>> Number of indices = 59 (using logstash, one index per month)
>>>>>> Total shards      = 234 (each index has 4 shards, one per node)
>>>>>> Total docs        = 7.4 billion
>>>>>> Total size        = 4.7TB
>>>>>>
>>>>>> When I add a new file, which I do using logstash on all four nodes,
>>>>>> the indexing immediately throttles. For instance:
>>>>>>
>>>>>> [2014-09-18 09:41:42,326][INFO ][index.engine.internal    ] [hdp13] [
>>>>>> logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4,
>>>>>> maxNumMerges=5
>>>>>> [2014-09-18 09:41:45,267][INFO ][index.engine.internal    ] [hdp13]
>>>>>> [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6,
>>>>>> maxNumMerges=5
>>>>>> [2014-09-18 09:41:45,303][INFO ][index.engine.internal    ] [hdp13]
>>>>>> [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4,
>>>>>> maxNumMerges=5
>>>>>> [2014-09-18 09:41:51,273][INFO ][index.engine.internal    ] [hdp13]
>>>>>> [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6,
>>>>>> maxNumMerges=5
>>>>>> [2014-09-18 09:41:51,379][INFO ][index.engine.internal    ] [hdp13]
>>>>>> [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4,
>>>>>> maxNumMerges=5
>>>>>> [2014-09-18 09:42:06,429][INFO ][index.engine.internal    ] [hdp13]
>>>>>> [logstash-2014.09][2] now t
>>>>>>
>>>>>> Where should I be looking to tune indexing performance? The query
>>>>>> load on the cluster is very low as it is a research cluster, so I
>>>>>> would sacrifice query performance for indexing.
>>>>>>
>>>>>> The 4 nodes all run logstash, listening on various ports. I use
>>>>>> netcat to 'feed' the data to the 4 nodes from a hadoop cluster.
>>>>>>
>>>>>> hadoop1 netcat -------->
>>>>>> hadoop2 netcat -------->   ES1
>>>>>> hadoop3 netcat -------->
>>>>>>
>>>>>> And so on.
>>>>>>
>>>>>> Each ES node has 24 disks but I am only using one at the moment. This
>>>>>> is an obvious IO bottleneck, but I am unclear how to use all disks. If I
>>>>>> add more disks, will ES share the data between them all? E.g. /mnt/disk1,
>>>>>> /mnt/disk2, etc.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>  --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elasticsearch/3e85d65c-8001-4f90-bfa0-f7e63679feba%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>
>>

