OK, thanks for bringing closure.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jul 17, 2014 at 9:02 AM, Marek Dabrowski <[email protected]>
wrote:

> Hello
>
> I found the cause of my problem.
> When indexing from Perl, the index refresh depends on the "max_count" and
> "max_size" parameters of $e->bulk_helper: their values determine when the
> refresh is done on the index.
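The two thresholds can be pictured with a small Python sketch. It assumes, as the Search::Elasticsearch bulk helper documentation describes, that buffered actions are sent as one bulk request as soon as either the action count reaches max_count or the buffered body reaches max_size bytes. The BulkBuffer class and its send callback are hypothetical; this is not the Perl module's implementation.

```python
import json
from typing import Callable, List


class BulkBuffer:
    """Hypothetical buffer: sends queued bulk lines once either
    max_count actions or max_size bytes have accumulated."""

    def __init__(self, send: Callable[[List[str]], None],
                 max_count: int = 1000, max_size: int = 1_000_000):
        self.send = send
        self.max_count = max_count
        self.max_size = max_size
        self.lines: List[str] = []
        self.count = 0
        self.size = 0

    def index(self, source: dict) -> None:
        # Each index action is an action line followed by a source line.
        action = json.dumps({"index": {}})
        doc = json.dumps(source)
        self.lines += [action, doc]
        self.count += 1
        # Size in UTF-8 bytes, +2 for the two newlines of the bulk body.
        self.size += len(action.encode("utf-8")) + len(doc.encode("utf-8")) + 2
        if self.count >= self.max_count or self.size >= self.max_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered (if anything) and reset the counters.
        if self.lines:
            self.send(self.lines)
        self.lines, self.count, self.size = [], 0, 0


if __name__ == "__main__":
    batches = []
    buf = BulkBuffer(batches.append, max_count=3)
    for i in range(7):
        buf.index({"p0": str(i)})
    buf.flush()  # send the remainder, like $bulk->flush in the Perl script
    print([len(b) // 2 for b in batches])  # -> [3, 3, 1]
```

With max_count=3, seven documents go out as three requests; a lower max_size would split them by body size instead.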
>
> Thanks for your help.
>
> Regards
>
>
> On Thursday, July 17, 2014 09:59:55 UTC+2, Marek Dabrowski
> wrote:
>
>> Hello Mike
>>
>> My ES version is 1.2.1
>> I checked the utilization of the nodes in my cluster. The values common to
>> all nodes are:
>> Java process CPU utilization: < 6%
>> OS load: < 1
>> I/O (iostat): < 15 kB/s write
>>
>> I tested the indexing process with 2 methods:
>> a) indexing native JSON data (13 GB split into 100 MB chunks)
>> time for i in /tmp/SMT*; do
>>     echo "$i"
>>     curl -s -XPOST h3:9200/smt_20140501_bulk_json_refresh_600/num/_bulk --data-binary @"$i"
>>     rm -f "$i"
>> done
>>
>> b) indexing CSV data with a Perl script:
>>
>> use Search::Elasticsearch;
>>
>> my $e = Search::Elasticsearch->new(
>>     nodes => [ 'h3:9200' ],
>> );
>>
>> my $bulk = $e->bulk_helper(
>>     index     => $idx_name,
>>     type      => $idx_type,
>>     max_count => 10000,
>> );
>>
>> open(my $DATA, '<', $data_file) or die $!;
>> while (<$DATA>) {
>>     chomp;
>>
>>     # Each CSV line holds 12 columns, indexed as fields p0 .. p11
>>     my @data = split(',', $_);
>>     $bulk->index({ source => {
>>         p0  => $data[0],
>>         p1  => $data[1],
>>         p2  => $data[2],
>>         p3  => $data[3],
>>         p4  => $data[4],
>>         p5  => $data[5],
>>         p6  => $data[6],
>>         p7  => $data[7],
>>         p8  => $data[8],
>>         p9  => $data[9],
>>         p10 => $data[10],
>>         p11 => $data[11],
>>     }});
>> }
>> close($DATA);
>> $bulk->flush;    # send any actions still buffered
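Method a) above splits one large bulk file into chunks. The bulk format pairs each index action line with the source line that follows it, so a chunk boundary should never fall between the two. A minimal sketch of size-based splitting that respects this, assuming every action in the file is an index action followed by exactly one source line (delete actions, which have no source line, are not handled); chunk_bulk_lines is a hypothetical helper, not part of any Elasticsearch tool.

```python
from typing import Iterable, Iterator, List


def chunk_bulk_lines(lines: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Group bulk-API lines into chunks of at most max_bytes (UTF-8),
    never separating an action line from the source line after it.

    Assumes every action is followed by exactly one source line.
    A single pair larger than max_bytes gets a chunk of its own.
    """
    chunk: List[str] = []
    size = 0
    it = iter(lines)
    for action in it:
        source = next(it, None)  # the source line belonging to this action
        if source is None:
            raise ValueError("action line without a following source line")
        pair_size = len(action.encode("utf-8")) + len(source.encode("utf-8"))
        # Close the current chunk if adding this pair would exceed the limit.
        if chunk and size + pair_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk += [action, source]
        size += pair_size
    if chunk:
        yield chunk


if __name__ == "__main__":
    # In-memory demo; with a real file, pass the open file object instead
    # (lines keep their trailing newlines, which the _bulk body needs).
    demo = ['{"index":{}}\n', '{"p0":"a"}\n'] * 3
    for n, chunk in enumerate(chunk_bulk_lines(demo, 50)):
        print(n, len(chunk) // 2, "docs")
```

Each yielded chunk can then be written to its own file and posted with the curl loop shown in method a).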
>>
>> In both cases, setting refresh_interval to 600s has no effect: the data are
>> available immediately. Based on the ES documentation, I expected new data to
>> become available only after 10 minutes and, as a consequence, the indexing
>> process to be quicker, but it isn't.
>>
>> Regards
>>
>> On Wednesday, July 16, 2014 16:52:31 UTC+2, Michael McCandless
>> wrote:
>>>
>>> Which ES version are you using?  You should use the latest (soon to be
>>> 1.3): there have been a number of bulk-indexing improvements recently.
>>>
>>> Are you using the bulk API with multiple/async client threads?  Are you
>>> saturating either CPU or IO in your cluster (so that the test is really a
>>> full cluster capacity test)?
>>>
>>> Also, the relationship between refresh_interval and indexing performance
>>> is tricky: it turns out that -1 is often a poor choice, because it means
>>> your bulk indexing threads are sometimes tied up flushing segments, whereas
>>> with refreshing enabled a separate thread does that.  So a refresh interval
>>> of 5s may be a good choice.
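refresh_interval is a dynamic index setting, so it can be changed on a live index through the update index settings API (PUT /<index>/_settings), for example to the 5s suggested above. A minimal sketch using only the Python standard library that builds, but does not send, such a request; the host h3:9200 comes from this thread, while the index name my_index and the refresh_settings_request helper are hypothetical.

```python
import json
import urllib.request


def refresh_settings_request(host: str, index: str, interval: str) -> urllib.request.Request:
    """Build a PUT /<index>/_settings request that sets index.refresh_interval.

    interval is a time value such as "5s" or "600s", or "-1" to disable
    periodic refresh.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = refresh_settings_request("h3:9200", "my_index", "5s")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually apply it against a running cluster:
    # urllib.request.urlopen(req)
```

The same helper can restore a normal interval after a bulk load that temporarily used a larger one.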
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Wed, Jul 16, 2014 at 6:51 AM, Marek Dabrowski <[email protected]>
>>> wrote:
>>>
>>>> Hello
>>>>
>>>> My configuration is:
>>>> 6-node Elasticsearch cluster
>>>> OS: CentOS 6.5
>>>> JVM: 1.7.0_25
>>>>
>>>> The cluster is working fine: I can index data, run queries, etc. Now I'm
>>>> testing with a package of ~50 million docs (~13 GB). I would like to get
>>>> better indexing performance, and to that end I changed the
>>>> refresh_interval parameter. I tested 1s, -1 and 600s, and the indexing
>>>> time is the same in each case. I checked the index configuration
>>>> (_settings), and refresh_interval has the expected value, e.g.:
>>>>
>>>> {
>>>>   "smt_20140501_100000_20g_norefresh" : {
>>>>     "settings" : {
>>>>       "index" : {
>>>>         "uuid" : "q3imiZGQTDasQUuMWS8oiw",
>>>>         "number_of_replicas" : "1",
>>>>         "number_of_shards" : "6",
>>>>         "refresh_interval" : "600s",
>>>>         "version" : {
>>>>           "created" : "1020199"
>>>>         }
>>>>       }
>>>>     }
>>>>   }
>>>> }
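The value can also be checked programmatically rather than by eye. A minimal sketch that reads refresh_interval out of a _settings response shaped like the one above (SETTINGS_RESPONSE is a trimmed copy of it); refresh_interval_of is a hypothetical helper that returns None when the key is absent from the response.

```python
import json
from typing import Optional

# Trimmed copy of the _settings response shown above.
SETTINGS_RESPONSE = """
{
  "smt_20140501_100000_20g_norefresh" : {
    "settings" : {
      "index" : {
        "number_of_replicas" : "1",
        "number_of_shards" : "6",
        "refresh_interval" : "600s"
      }
    }
  }
}
"""


def refresh_interval_of(response_text: str, index: str) -> Optional[str]:
    """Return settings.index.refresh_interval for one index, or None if absent."""
    settings = json.loads(response_text)
    return (settings.get(index, {})
                    .get("settings", {})
                    .get("index", {})
                    .get("refresh_interval"))


if __name__ == "__main__":
    print(refresh_interval_of(SETTINGS_RESPONSE, "smt_20140501_100000_20g_norefresh"))  # -> 600s
```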
>>>>
>>>>
>>>>
>>>> Creating the index, setting refresh_interval, and loading the data are
>>>> all done on the same cluster node. Before each test the index is deleted
>>>> and created again with the new refresh_interval value. All cluster nodes
>>>> log that the parameter has been changed, e.g.:
>>>> [2014-07-16 11:24:09,813][INFO ][index.shard.service      ] [h6]
>>>> [smt_20140501_100000_20g_norefresh][1] updating refresh_interval from
>>>> [1s] to [-1]
>>>> or
>>>> [2014-07-16 11:32:32,928][INFO ][index.shard.service      ] [h6]
>>>> [smt_20140501_100000_20g_norefresh][1] updating refresh_interval from
>>>> [1s] to [10m]
>>>>
>>>> After the test starts, new data are available immediately, and the
>>>> indexing time is the same in all 3 cases. I don't know where the problem
>>>> is. Does anybody know what is going on?
>>>>
>>>> Regards
>>>> Marek
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/f7565c36-98c7-4e3e-8132-796f9edfb3fa%40googlegroups.com
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
