Re: Indexing is being throttled

bob . webman Thu, 18 Sep 2014 07:12:18 -0700

Unfortunately that is too hard/complicated.

I have now enabled all 12 disks per machine, so going forward I will get 
some "sharing" across all disks. How will ES allocate new data 
across the disks?


If I move a shard from one node to another with the new 12-disk paths, will 
the receiving node "share" the data across the disks? That way I could move 
all shards and get a redistribution of existing data?
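
For reference, a manual shard move like the one asked about above is normally requested through the cluster reroute API (POST /_cluster/reroute with a "move" command). The sketch below only builds the request body and prints it. The index name and the node name hdp13 come from the logs quoted in this thread; the target node name hdp14 and the helper function are assumptions for illustration. It does not settle whether the receiving node spreads the relocated files across its data paths.

```python
import json


def build_move_command(index, shard, from_node, to_node):
    """Build the JSON body for POST /_cluster/reroute that relocates one shard.

    build_move_command is a hypothetical helper, not part of any client library.
    """
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


if __name__ == "__main__":
    # "hdp14" is an assumed target node name; "hdp13" and the index
    # name appear in the log excerpt quoted in this thread.
    body = build_move_command("logstash-2014.09", 2, "hdp13", "hdp14")
    print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the request body to /_cluster/reroute on any node in the cluster; moving every shard means one such command per shard.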



On Thursday, September 18, 2014 10:35:24 AM UTC+1, Mark Walkom wrote:
>
> Does your server have hardware RAID capabilities?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: [email protected]
> web: www.campaignmonitor.com
>
> On 18 September 2014 19:30, <[email protected]> wrote:
>
>> Good point on heap, so I will bring that back down to 30GB
>>
>> Versions:
>> ES 1.3.2-1
>> java 1.7.0_67
>>
>> I definitely want to start using all 12 disks, rather than just the one I use 
>> at the moment! If I add paths for the other 11 disks and restart, will ES do 
>> any 'rebalancing'? If it won't, is there any way to move the data around 
>> all 12 disks? I really don't want to re-index everything!
>>
>> Thanks
>>
>>
>> On Thursday, September 18, 2014 10:03:18 AM UTC+1, Mark Walkom wrote:
>>>
>>> Also given you're over 32GB heap your java pointers aren't going to be 
>>> compressed, which means GC will suffer.
>>>
>>> You haven't mentioned what ES and java versions you are using, which 
>>> would be useful.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: [email protected]
>>> web: www.campaignmonitor.com
>>>
>>> On 18 September 2014 18:57, Michael McCandless <[email protected]> 
>>> wrote:
>>>
>>>> Try disabling merge IO throttling, especially if your index is on 
>>>> SSD/s.  (It's on by default at a paltry 20 MB/sec).  Merge IO throttling 
>>>> causes merges to run slowly which eventually causes them to back up enough 
>>>> to the point where indexing must be throttled...
>>>>
>>>> Also see the recent post about tuning to favor indexing throughput: 
>>>> http://www.elasticsearch.org/blog/performance-considerations-elasticsearch-indexing/
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Thu, Sep 18, 2014 at 4:54 AM, <[email protected]> wrote:
>>>>
>>>>> Setup:
>>>>> 4 nodes
>>>>> Replication            = 0
>>>>> ES_HEAP_SIZE   = 75GB
>>>>> Number of Indices = 59  (using logstash one index per month)
>>>>> Total shards          = 234 (each index is 4 shards, one per node)
>>>>> Total docs             = 7.4 billion
>>>>> Total size               = 4.7TB
>>>>>
>>>>> When I add a new file, which I do using logstash on all four nodes, 
>>>>> the indexing immediately throttles. For instance:
>>>>>
>>>>> [2014-09-18 09:41:42,326][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>>>> [2014-09-18 09:41:45,267][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
>>>>> [2014-09-18 09:41:45,303][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>>>> [2014-09-18 09:41:51,273][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
>>>>> [2014-09-18 09:41:51,379][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
>>>>> [2014-09-18 09:42:06,429][INFO ][index.engine.internal    ] [hdp13] [logstash-2014.09][2] now t
>>>>>
>>>>> Where should I be looking to tune indexing performance? The 
>>>>> query load on the cluster is very low, as it is a research cluster, so I 
>>>>> would sacrifice query performance for indexing throughput.
>>>>>
>>>>> The 4 nodes all run logstash, listening on various ports. I use 
>>>>> netcat to 'feed' the data to the 4 nodes from a hadoop cluster.
>>>>>
>>>>> hadoop1 netcat -------->
>>>>> hadoop2 netcat -------->   ES1     
>>>>> hadoop3 netcat -------->
>>>>>
>>>>> And so on.
>>>>>
>>>>> Each ES node has 24 disks but I am only using one at the moment. This 
>>>>> is an obvious IO bottleneck, but I am unclear how to use all the disks. If I 
>>>>> add more disks, will ES share the data between them all? e.g. /mnt/disk1, 
>>>>> /mnt/disk2, etc.
>>>>>
>>>>> Thanks
>>>>>
>>>>>  -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elasticsearch/3e85d65c-8001-4f90-bfa0-f7e63679feba%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>
>

