OK, thanks for bringing closure.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jul 17, 2014 at 9:02 AM, Marek Dabrowski <[email protected]> wrote:

Hello

I found the reason for my problems. When indexing from Perl, the refreshing of the index during the load depends on the "max_count" and "max_size" parameters of $e->bulk_helper: the values of these parameters determine when the data actually reaches the index.

Thanks for the help.

Regards

On Thursday, July 17, 2014 at 09:59:55 UTC+2, Marek Dabrowski wrote:

Hello Mike

My ES version is 1.2.1.
I checked the utilization of the nodes in my cluster. Typical values across all nodes are:
Java process CPU utilization: < 6%
OS load: < 1
I/O: < 15 kB/s write

I tested the indexing process with two methods:

a) indexing native JSON data (13 GB split into 100 MB chunks):

time for i in /tmp/SMT* ; do echo $i; curl -s -XPOST h3:9200/smt_20140501_bulk_json_refresh_600/num/_bulk --data-binary @$i ; rm -f $i; done

b) indexing CSV data with a Perl script:

use Search::Elasticsearch;

# $idx_name, $idx_type and $data_file are defined elsewhere in the script
my $e = Search::Elasticsearch->new(
    nodes => [
        'h3:9200',
    ]
);

my $bulk = $e->bulk_helper(
    index     => $idx_name,
    type      => $idx_type,
    max_count => 10000
);

open(my $DATA, '<', $data_file) or die $!;
while (<$DATA>) {
    chomp;

    my @data = split(',', $_);
    $bulk->index({ source => {
        p0  => $data[0],
        p1  => $data[1],
        p2  => $data[2],
        p3  => $data[3],
        p4  => $data[4],
        p5  => $data[5],
        p6  => $data[6],
        p7  => $data[7],
        p8  => $data[8],
        p9  => $data[9],
        p10 => $data[10],
        p11 => $data[11]
    }});
}
close($DATA);
$bulk->flush;

Setting refresh_interval to 600s has no effect in either case: the data is available immediately. Based on the ES documentation, I expected that new data would only become visible after 10 minutes and that indexing would consequently be faster, but that is not what happens.

Regards

On Wednesday, July 16, 2014 at 16:52:31 UTC+2, Michael McCandless wrote:

Which ES version are you using? You should use the latest (soon to be 1.3): there have been a number of bulk-indexing improvements recently.

Are you using the bulk API with multiple/async client threads? Are you saturating either CPU or IO in your cluster (so that the test is really a full cluster-capacity test)?

Also, the relationship between refresh_interval and indexing performance is tricky: it turns out -1 is often a poor choice, because it means your bulk-indexing threads are sometimes tied up flushing segments, whereas with refreshing enabled a separate thread does that. So a refresh interval of 5s may be a good choice.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 16, 2014 at 6:51 AM, Marek Dabrowski <[email protected]> wrote:

Hello

My configuration is:
6-node Elasticsearch cluster
OS: CentOS 6.5
JVM: 1.7.0_25

The cluster is working fine: I can index data, run queries, etc. Now I'm running a test on a data set of about ~50 million documents (~13 GB). I would like to get better indexing performance, so I changed the refresh_interval parameter. I tested values of 1s, -1 and 600s. The indexing time is the same in every case.
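The refresh_interval changes discussed in this thread go through the index settings API. As a minimal sketch, assuming the same Search::Elasticsearch client used in the script above and the index name that appears in the settings output below (the calls themselves are illustrative and do not appear in the thread), the update and a read-back check could look like:

use strict;
use warnings;
use Search::Elasticsearch;

my $e = Search::Elasticsearch->new( nodes => ['h3:9200'] );

# Lengthen the refresh interval for the duration of a bulk load ...
$e->indices->put_settings(
    index => 'smt_20140501_100000_20g_norefresh',
    body  => { index => { refresh_interval => '600s' } },
);

# ... and read the setting back to confirm it was applied.
my $settings = $e->indices->get_settings(
    index => 'smt_20140501_100000_20g_norefresh',
);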
I checked the index configuration (_settings) and the value of refresh_interval is OK (it has the proper value), e.g.:

{
  "smt_20140501_100000_20g_norefresh" : {
    "settings" : {
      "index" : {
        "uuid" : "q3imiZGQTDasQUuMWS8oiw",
        "number_of_replicas" : "1",
        "number_of_shards" : "6",
        "refresh_interval" : "600s",
        "version" : {
          "created" : "1020199"
        }
      }
    }
  }
}

Creating the index, setting refresh_interval and loading the data are all done on the same cluster node. Before each test the index is deleted and created again with the new refresh_interval value. All cluster nodes log that the parameter has been changed, e.g.:

[2014-07-16 11:24:09,813][INFO ][index.shard.service] [h6] [smt_20140501_100000_20g_norefresh][1] updating refresh_interval from [1s] to [-1]

or

[2014-07-16 11:32:32,928][INFO ][index.shard.service] [h6] [smt_20140501_100000_20g_norefresh][1] updating refresh_interval from [1s] to [10m]

After the test starts, new data is available immediately, and the indexing time is the same in all 3 cases. I don't know where the failure is. Does anybody know what is going on?

Regards
Marek
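Regarding the fix Marek describes at the top of the thread: the bulk helper buffers index actions on the client side and only sends a _bulk request once one of its flush thresholds is reached (or flush is called), so nothing is searchable before that point. Below is a minimal sketch with both thresholds set explicitly; max_count and max_size are documented bulk_helper parameters, the index and type names are taken from the curl command earlier in the thread, and the threshold values and sample document are only examples:

use strict;
use warnings;
use Search::Elasticsearch;

my $e = Search::Elasticsearch->new( nodes => ['h3:9200'] );

# Send the buffered actions either every 10,000 documents or once the
# buffered request body reaches ~5 MB, whichever comes first.
my $bulk = $e->bulk_helper(
    index     => 'smt_20140501_bulk_json_refresh_600',
    type      => 'num',
    max_count => 10_000,
    max_size  => 5_000_000,    # bytes
);

$bulk->index({ source => { p0 => 'example' } });   # buffered, not yet sent
$bulk->flush;                                      # send whatever is still buffered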
