I am using logstash 1.4.0; if I understand correctly, it automatically uses the Bulk API. Am I missing something? Is there a limit on the size of an index (on a single-node machine)?
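The Bulk API in question takes newline-delimited JSON: one action line followed by one source line per document, and the request body must end with a newline. A minimal hand-rolled sketch for testing indexing speed outside logstash (the index name, type, and fields are placeholders, and the node is assumed to listen on localhost:9200):

    $ cat bulk.json
    { "index" : { "_index" : "logstash-2014.04.09", "_type" : "logs" } }
    { "message" : "first event",  "@timestamp" : "2014-04-09T12:00:00Z" }
    { "index" : { "_index" : "logstash-2014.04.09", "_type" : "logs" } }
    { "message" : "second event", "@timestamp" : "2014-04-09T12:00:01Z" }

    # send the whole file as a single request
    $ curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json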
(BTW Itamar, thanks for the help!)

On Wednesday, April 9, 2014 10:39:19 PM UTC+3, Itamar Syn-Hershko wrote:

> Can you try batching writes to Elasticsearch? See
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
>
> --
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Wed, Apr 9, 2014 at 10:33 PM, Yitzhak Kesselman <[email protected]> wrote:
>
>> Attached is the index rate (using bigdesk):
>> <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
>>
>> The indexing requests per second are around 2K and the indexing time per second is around 3K.
>>
>> On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
>>
>>> Answers inline.
>>>
>>> Regarding the slow I/O: when I analyzed the creation of the Lucene index files, I saw that they are created without any special flags (such as no buffering or write-through). This means we pay the cost twice: when we write a file the data goes through Windows' Cache Manager, which takes a lot of memory (memory that is then not available to the application itself), but when we read the file we don't actually read it through the cache, which makes the operation slow. *Any ideas?*
>>>
>>> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>>>
>>>> Shooting in the dark here, but here it goes:
>>>>
>>>> 1. Do you have anything else running on the system? For example, AVs are known to cause slowdowns for such services, and other I/O- or memory-heavy services could cause thrashing or just a general slowdown.
>>>
>>> No, nothing else is running on that machine. Initially it was working fast; it got slower with the amount of data in the index. Also, is there a way to increase the buffer size for the Lucene index files (.tim, .doc, and .pos) from 8K to something much bigger?
>>>
>>>> 2. What JVM version are you running this with?
>>>
>>> java version "1.7.0_51"
>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>> OS_NAME="Windows"
>>> OS_VERSION="5.2"
>>> OS_ARCH="amd64"
>>>
>>>> 3. If you changed any of the default settings for merge factors etc. - can you revert that and try again?
>>>
>>> Tried that before; same behavior.
>>>
>>>> 4. Can you try with embedded=false and see if it makes a difference?
>>>
>>> Tried that before; same behavior.
>>>> --
>>>> Itamar Syn-Hershko
>>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>>> Freelance Developer & Consultant
>>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>>
>>>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have configured a single-node ES with logstash 1.4.0 (8GB memory) with the following configuration:
>>>>>
>>>>> - index.number_of_shards: 7
>>>>> - number_of_replicas: 0
>>>>> - refresh_interval: -1
>>>>> - translog.flush_threshold_ops: 100000
>>>>> - merge.policy.merge_factor: 30
>>>>> - codec.bloom.load: false
>>>>> - min_shard_index_buffer_size: 12m
>>>>> - compound_format: true
>>>>> - indices.fielddata.cache.size: 15%
>>>>> - indices.fielddata.cache.expire: 5m
>>>>> - indices.cache.filter.size: 15%
>>>>> - indices.cache.filter.expire: 5m
>>>>>
>>>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>>>> OS: 64-bit Windows Server 2012 R2
>>>>>
>>>>> My raw data is a CSV file and I use grok as a filter to parse it, with the output configuration (elasticsearch { embedded => true flush_size => 100000 idle_flush_time => 30 }).
>>>>> Raw data size is about 100GB of events per day, which ES tries to write into one index (with 7 shards).
>>>>>
>>>>> At the beginning the insert was fast; however, after a while it got extremely slow, 1.5K docs in 8K seconds :(
>>>>> Currently the index has around 140 million docs with a size of 55GB.
>>>>>
>>>>> When I analyzed the writes to disk with ProcMon, I saw that the process writes in an interleaved manner to three kinds of files (.tim, .doc, and .pos) in 4K and 8K chunks, instead of batching the writes into some reasonably large size.
>>>>>
>>>>> Appreciate the help.
>>>>>
>>>>> All the best,
>>>>> Yitzhak
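One way to narrow down where the indexing time is going in the setup described above is to use two stock diagnostics that exist in ES 1.x (assuming the node listens on the default port 9200):

    # sample the hottest threads on the node while indexing is slow
    curl -s 'http://localhost:9200/_nodes/hot_threads'

    # list the Lucene segments per shard, to see whether small segments are piling up
    curl -s 'http://localhost:9200/_segments?pretty'

If merge threads dominate the hot_threads output, or the segment count keeps growing, the interleaved small writes observed in ProcMon may simply be ongoing segment merges rather than the initial indexing writes.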
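Along the same lines, the per-index stats API can be polled while loading to watch how the indexing, merge, refresh, and flush counters grow over time; the index name below is a hypothetical logstash daily index:

    # counters for one index; repeat the call to see how the merge totals change
    curl -s 'http://localhost:9200/logstash-2014.04.09/_stats?pretty'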
