You also said you wouldn't recommend indexing that much information at once. How would you suggest breaking it up and what status should I look for before doing another batch? I have to come up with some process that is repeatable and mostly automated.
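To make the question concrete, here is a rough sketch of the kind of loop I have in mind. It is not tested against my cluster. It splits the records into fixed-size batches, builds a Bulk API body for each one, and after every batch checks the bulk response's `errors` flag and the `rejected` counters from `/_nodes/stats/thread_pool` before sending the next. The batch size of 5000, the index name `transactions`, the type name `record`, and the `send_bulk` / `fetch_stats` callables (which would wrap the HTTP calls) are all placeholders I made up for illustration.

```python
import json
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk, index, doc_type):
    """Build a newline-delimited Bulk API body: one action line per document."""
    lines = []
    for rec in chunk:
        action = {"index": {"_index": index, "_type": doc_type, "_id": rec["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(rec))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def rejected_counts(nodes_stats, pools=("bulk", "index")):
    """Sum the `rejected` counters per thread pool across all nodes."""
    totals = {p: 0 for p in pools}
    for node in nodes_stats.get("nodes", {}).values():
        for p in pools:
            totals[p] += node.get("thread_pool", {}).get(p, {}).get("rejected", 0)
    return totals


def index_in_batches(records, send_bulk, fetch_stats, size=5000,
                     index="transactions", doc_type="record"):
    """Send one batch at a time and stop as soon as anything looks wrong.

    send_bulk(body) -> parsed JSON response of POST /_bulk   (hypothetical helper)
    fetch_stats()   -> parsed JSON of GET /_nodes/stats/thread_pool (hypothetical helper)
    """
    baseline = rejected_counts(fetch_stats())
    sent = 0
    for chunk in batches(records, size):
        result = send_bulk(bulk_body(chunk, index, doc_type))
        current = rejected_counts(fetch_stats())
        # Stop if the bulk response reports item failures or rejections went up.
        if result.get("errors") or current != baseline:
            raise RuntimeError("stopping after %d records: %r" % (sent, current))
        sent += len(chunk)
    return sent
```

Since a bulk request only returns once the whole batch has been processed, my guess is that the response plus the rejection counters is the status to check between batches, but please correct me if there is something better to wait on.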

On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>
> Thanks for the reply, Vineeth!
>
> What's a practical heap size? I've seen some people saying they set it to
> 30gb but this confuses me because in the /etc/default/elasticsearch file,
> the comment suggests the max is only 1gb?
>
> I'll look into the threadpool issue. Is there a Java API for monitoring
> Cluster Node health? Can you point me at an example or give me a link to
> that?
>
> Thanks!
>
> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>
>> Hello Joshua,
>>
>> I have a feeling this has something to do with the threadpool.
>> There is a limit on the number of feeds that can be queued for indexing.
>>
>> Try increasing the size of the threadpool queue for index and bulk to a
>> large number.
>> Also, through the cluster node API on threadpool, you can see if any
>> request has failed.
>> Monitor this API for any requests that failed due to large volume.
>>
>> Threadpool -
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>> Threadpool stats -
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>
>> Having said that, I won't recommend bulk indexing that much information
>> at a time, and 512 MB is not going to help much.
>>
>> Thanks
>> Vineeth
>>
>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <[email protected]> wrote:
>>
>>> Hi there!
>>>
>>> I'm trying to do a one-time index of about 800,000 records into an
>>> instance of elasticsearch. But I'm having a bit of trouble. It continually
>>> fails around 200,000 records. Looking at it in the Elasticsearch Head plugin,
>>> my index goes offline and becomes unrecoverable.
>>>
>>> For now, I have it running on a VM on my personal machine.
>>>
>>> VM Config:
>>> Ubuntu Server 14.04 64-Bit
>>> 8 GB RAM
>>> 2 Processors
>>> 32 GB SSD
>>>
>>> Java
>>> java version "1.7.0_65"
>>> OpenJDK Runtime Environment (IcedTea 2.5.1)
>>> (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>
>>> Elasticsearch is using mostly the defaults. This is the output of:
>>> curl http://localhost:9200/_nodes/process?pretty
>>> {
>>>   "cluster_name" : "property_transaction_data",
>>>   "nodes" : {
>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>       "name" : "Marvin Flumm",
>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>       "host" : "ubuntu-es",
>>>       "ip" : "127.0.1.1",
>>>       "version" : "1.3.2",
>>>       "build" : "dee175d",
>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>       "process" : {
>>>         "refresh_interval_in_millis" : 1000,
>>>         "id" : 1092,
>>>         "max_file_descriptors" : 65535,
>>>         "mlockall" : true
>>>       }
>>>     }
>>>   }
>>> }
>>>
>>> I adjusted ES_HEAP_SIZE to 512mb.
>>>
>>> I'm using the following code to pull data from SQL Server and index it.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
