Can you try batching writes to Elasticsearch? See the bulk API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
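
For reference, a minimal sketch of what a bulk request looks like over HTTP.
The index name, type, and field names below are made up, so adjust them to
whatever your Logstash output actually produces:

    # Minimal bulk-indexing sketch using the third-party "requests" library.
    # Assumes a node listening on localhost:9200 and a hypothetical index name.
    import json
    import requests

    docs = [{"message": "event 1"}, {"message": "event 2"}]

    # The bulk API expects newline-delimited JSON: an action line, then the doc.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": "logstash-2014.04.09",
                                           "_type": "logs"}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"   # the body must end with a newline

    resp = requests.post("http://localhost:9200/_bulk", data=body)
    print(resp.json().get("took"), resp.json().get("errors"))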

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Author of RavenDB in Action <http://manning.com/synhershko/>


On Wed, Apr 9, 2014 at 10:33 PM, Yitzhak Kesselman <[email protected]> wrote:

> Attached is the indexing rate (using bigdesk):
>
> <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
>
> The indexing requests per second are around 2K, and the indexing time per
> second is around 3K.
>
>
> On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
>
>> Answers inline.
>>
>> Regarding the slow I/O: when I analyzed the creation of the Lucene index
>> files, I saw that they are created without any special flags (such as no
>> buffering or write through). This means we're paying the cost twice: when we
>> write a file, the data is cached in Windows' Cache Manager, which takes a lot
>> of memory (memory that is then not available to the application itself), yet
>> when we read the file we don't actually read it from the cache, which makes
>> the operation slow. *Any ideas?*
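>>
>> (For illustration, a minimal pywin32 sketch of the write-through flag I am
>> referring to; this is not how Lucene/Elasticsearch actually open index files,
>> just a demonstration of the Win32-level flag:)
>>
>>     # Illustration only: open a file with FILE_FLAG_WRITE_THROUGH so writes
>>     # bypass lazy write-back in the Cache Manager. Requires pywin32.
>>     import win32con
>>     import win32file
>>
>>     handle = win32file.CreateFile(
>>         r"C:\temp\writethrough-demo.bin",   # hypothetical path
>>         win32con.GENERIC_WRITE,
>>         0,                                  # no sharing
>>         None,                               # default security attributes
>>         win32con.CREATE_ALWAYS,
>>         win32con.FILE_FLAG_WRITE_THROUGH,   # flush each write to disk
>>         None,
>>     )
>>     win32file.WriteFile(handle, b"some index data")
>>     handle.Close()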
>>
>>
>> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>>
>>> Shooting in the dark here, but here goes:
>>>
>>> 1. Do you have anything else running on the system? For example, antivirus
>>> software is known to cause slowdowns for such services, and other I/O- or
>>> memory-heavy services could cause thrashing or just a general slowdown.
>>>
>> No, nothing else is running on that machine. Initially it was working fast,
>> but it got slower as the amount of data in the index grew. Also, is there a
>> way to increase the buffer size for the Lucene index files (.tim, .doc, and
>> .pos) from 8K to something much bigger?
>>
>>>
>>> 2. What JVM version are you running this with?
>>>
>>  java version "1.7.0_51"
>>
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>
>>  OS_NAME="Windows"
>>
>> OS_VERSION="5.2"
>>
>> OS_ARCH="amd64"
>>
>>> 3. If you changed any of the default settings for merge factors etc., can
>>> you revert that and try again?
>>>
>> Tried that before; same behavior.
>>
>>>
>>> 4. Can you try with embedded=false and see if it makes a difference?
>>>
>> Tried that before; same behavior.
>>
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>
>>>
>>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <[email protected]> wrote:
>>>
>>>>  Hi,
>>>>
>>>>
>>>>
>>>> I have configured a single-node Elasticsearch with Logstash 1.4.0
>>>> (8GB memory) with the following configuration:
>>>>
>>>>    - index.number_of_shards: 7
>>>>    - number_of_replicas: 0
>>>>    - refresh_interval: -1
>>>>    - translog.flush_threshold_ops: 100000
>>>>    - merge.policy.merge_factor: 30
>>>>    - codec.bloom.load: false
>>>>    - min_shard_index_buffer_size: 12m
>>>>    - compound_format: true
>>>>    - indices.fielddata.cache.size: 15%
>>>>    - indices.fielddata.cache.expire: 5m
>>>>    - indices.cache.filter.size: 15%
>>>>    - indices.cache.filter.expire: 5m
>>>>
>>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>>> OS: 64-bit Windows Server 2012 R2
>>>>
>>>> My raw data is a CSV file, and I use grok as a filter to parse it, with the
>>>> following output configuration:
>>>>
>>>>     elasticsearch {
>>>>       embedded => true
>>>>       flush_size => 100000
>>>>       idle_flush_time => 30
>>>>     }
>>>>
>>>> The raw data is about 100GB of events per day, which ES tries to index
>>>> into one index (with 7 shards).
>>>>
>>>> At the beginning the inserts were fast; however, after a while they got
>>>> extremely slow: 1.5K docs in 8K seconds :(
>>>>
>>>> Currently the index has around 140 million docs with a size of 55GB.
>>>>
>>>>
>>>>
>>>> When I analyzed the writes to disk with ProcMon, I saw that the process is
>>>> writing in an interleaved manner to three kinds of files (.tim, .doc, and
>>>> .pos) in 4K and 8K chunks, instead of batching the writes into something
>>>> of a reasonable size.
>>>>
>>>>
>>>>
>>>> Appreciate the help.
>>>>
>>>>
>>>>
>>>> All the best,
>>>>
>>>> Yitzhak
>>>>
