Hi Jörg,

Can you elaborate on what you mean by needing "some more fine tuning"?
I've upped the heap size to 4g (in both places I mentioned before, because it's not clear to me which one ES actually uses). I haven't tried to index again yet. Other than throttling my indexing, what else do I need to be thinking about?

On Tuesday, September 9, 2014 12:53:35 PM UTC-4, Jörg Prante wrote:
>
> Set ES_HEAP_SIZE to at least 1 GB. For smaller heaps like 512m and
> indexing around 1 million docs, you need some more fine tuning, which is
> complicated. Your machine is fine for a 4 GB heap, which is 50% of 8
> GB RAM.
>
> Jörg
>
> On Tue, Sep 9, 2014 at 5:39 PM, Joshua P <[email protected]> wrote:
>
>> Here is /etc/default/elasticsearch
>>
>> # Run Elasticsearch as this user ID and group ID
>> #ES_USER=elasticsearch
>> #ES_GROUP=elasticsearch
>>
>> # Heap Size (defaults to 256m min, 1g max)
>> ES_HEAP_SIZE=512m
>>
>> # Heap new generation
>> #ES_HEAP_NEWSIZE=
>>
>> # max direct memory
>> #ES_DIRECT_SIZE=
>>
>> # Maximum number of open files, defaults to 65535.
>> MAX_OPEN_FILES=65535
>>
>> # Maximum locked memory size. Set to "unlimited" if you use the
>> # bootstrap.mlockall option in elasticsearch.yml. You must also set
>> # ES_HEAP_SIZE.
>> MAX_LOCKED_MEMORY=unlimited
>>
>> # Maximum number of VMA (Virtual Memory Areas) a process can own
>> #MAX_MAP_COUNT=262144
>>
>> # Elasticsearch log directory
>> #LOG_DIR=/var/log/elasticsearch
>>
>> # Elasticsearch data directory
>> #DATA_DIR=/var/lib/elasticsearch
>>
>> # Elasticsearch work directory
>> #WORK_DIR=/tmp/elasticsearch
>>
>> # Elasticsearch configuration directory
>> #CONF_DIR=/etc/elasticsearch
>>
>> # Elasticsearch configuration file (elasticsearch.yml)
>> #CONF_FILE=/etc/elasticsearch/elasticsearch.yml
>>
>> # Additional Java OPTS
>> #ES_JAVA_OPTS=
>>
>> # Configure restart on package upgrade (true, every other setting will lead to not restarting)
>> #RESTART_ON_UPGRADE=true
>>
>> I also see the same setting in /etc/init.d/elasticsearch.
>> Do you know which file takes priority? And what would a good size be?
>>
>> On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>>>
>>> Hello Joshua,
>>>
>>> I am not sure which variable you are referring to in the memory settings
>>> of the config file; please paste the comment and config.
>>> I usually change the config in the init.d script.
>>>
>>> The best approach would be to bulk index, say, 10,000 feeds in sync mode,
>>> wait until everything is indexed, and then proceed to the next batch.
>>> I am not sure about the Java API, but a long time ago I used to curl the
>>> stats API and see how many requests were rejected.
>>>
>>> Thanks
>>> Vineeth
>>>
>>> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <[email protected]> wrote:
>>>
>>>> You also said you wouldn't recommend indexing that much information at
>>>> once. How would you suggest breaking it up, and what status should I look
>>>> for before doing another batch? I have to come up with some process that
>>>> is repeatable and mostly automated.
>>>>
>>>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>>>
>>>>> Thanks for the reply, Vineeth!
>>>>>
>>>>> What's a practical heap size? I've seen some people saying they set it
>>>>> to 30gb, but this confuses me because in the /etc/default/elasticsearch
>>>>> file, the comment suggests the max is only 1gb?
>>>>>
>>>>> I'll look into the threadpool issue. Is there a Java API for
>>>>> monitoring cluster node health? Can you point me at an example or give me
>>>>> a link to that?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>>>
>>>>>> Hello Joshua,
>>>>>>
>>>>>> I have a feeling this has something to do with the threadpool.
>>>>>> There is a limit on the number of feeds that can be queued for indexing.
>>>>>>
>>>>>> Try increasing the size of the threadpool queue for index and bulk to a
>>>>>> large number.
>>>>>> Also, through the cluster node API on threadpool, you can see if any
>>>>>> request has failed.
>>>>>> Monitor this API for any requests that fail due to large volume.
>>>>>>
>>>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>>>
>>>>>> Having said that, I wouldn't recommend bulk indexing that much
>>>>>> information at a time, and 512 MB is not going to help much.
>>>>>>
>>>>>> Thanks
>>>>>> Vineeth
>>>>>>
>>>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <[email protected]> wrote:
>>>>>>
>>>>>>> Hi there!
>>>>>>>
>>>>>>> I'm trying to do a one-time index of about 800,000 records into an
>>>>>>> instance of Elasticsearch, but I'm having a bit of trouble. It
>>>>>>> continually fails around 200,000 records. Looking at it in the
>>>>>>> Elasticsearch Head plugin, my index goes offline and becomes
>>>>>>> unrecoverable.
>>>>>>>
>>>>>>> For now, I have it running on a VM on my personal machine.
>>>>>>>
>>>>>>> VM Config:
>>>>>>> Ubuntu Server 14.04 64-Bit
>>>>>>> 8 GB RAM
>>>>>>> 2 Processors
>>>>>>> 32 GB SSD
>>>>>>>
>>>>>>> Java
>>>>>>> java version "1.7.0_65"
>>>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>>>
>>>>>>> Elasticsearch is using mostly the defaults.
>>>>>>> This is the output of:
>>>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>>>> {
>>>>>>>   "cluster_name" : "property_transaction_data",
>>>>>>>   "nodes" : {
>>>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>>>       "name" : "Marvin Flumm",
>>>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>>>       "host" : "ubuntu-es",
>>>>>>>       "ip" : "127.0.1.1",
>>>>>>>       "version" : "1.3.2",
>>>>>>>       "build" : "dee175d",
>>>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>>>       "process" : {
>>>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>>>         "id" : 1092,
>>>>>>>         "max_file_descriptors" : 65535,
>>>>>>>         "mlockall" : true
>>>>>>>       }
>>>>>>>     }
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>>>>>
>>>>>>> I'm using the following code to pull data from SQL Server and index
>>>>>>> it.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com
>>>>>>> For more options, visit https://groups.google.com/d/optout.
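
For what it's worth, the batching approach Vineeth describes in this thread (bulk index about 10,000 documents, wait for the batch to finish, check the thread pool stats for rejections, then continue) might look roughly like the sketch below. It is in Python rather than Java purely for brevity, and several things are assumptions that do not come from the thread: the index and type names passed in, and the two callables send_bulk and fetch_stats, which are hypothetical helpers you would write yourself (for example, wrapping a POST to /_bulk and a GET of /_nodes/stats/thread_pool). Treat it as an illustration of the idea, not a tested loader.

```python
import json
from itertools import islice


def batches(records, size=10000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(docs, index, doc_type):
    """Build a newline-delimited _bulk body: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body has to end with a newline


def rejected(node_stats, pool="bulk"):
    """Sum the `rejected` counter of one thread pool over all nodes in a parsed stats response."""
    return sum(
        node.get("thread_pool", {}).get(pool, {}).get("rejected", 0)
        for node in node_stats.get("nodes", {}).values()
    )


def index_in_batches(records, send_bulk, fetch_stats, index, doc_type, size=10000):
    """Send one batch at a time and stop as soon as the bulk pool reports new rejections.

    send_bulk(body) and fetch_stats() are hypothetical callables supplied by the caller.
    """
    baseline = rejected(fetch_stats())  # the counter is cumulative, so compare to the start value
    sent = 0
    for chunk in batches(records, size):
        send_bulk(bulk_body(chunk, index, doc_type))  # assumed to block until ES has answered
        sent += len(chunk)
        if rejected(fetch_stats()) > baseline:
            raise RuntimeError("bulk requests rejected after %d docs" % sent)
    return sent
```

The sketch only looks at the thread pool counters between batches. A bulk response can also report failures for individual items, which it does not inspect, so a real loader would want to check that as well.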
