Re: Bulk Indexing Problems

Joshua P Tue, 09 Sep 2014 08:40:25 -0700

Here is /etc/default/elasticsearch

# Run Elasticsearch as this user ID and group ID
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch


# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=512m

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Maximum number of open files, defaults to 65535.
MAX_OPEN_FILES=65535

# Maximum locked memory size. Set to "unlimited" if you use the
# bootstrap.mlockall option in elasticsearch.yml. You must also set
# ES_HEAP_SIZE.
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
#MAX_MAP_COUNT=262144

# Elasticsearch log directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch work directory
#WORK_DIR=/tmp/elasticsearch

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch configuration file (elasticsearch.yml)
#CONF_FILE=/etc/elasticsearch/elasticsearch.yml

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead
# to not restarting)
#RESTART_ON_UPGRADE=true

I also see the same setting in /etc/init.d/elasticsearch. Do you know which 
file takes priority? And what a good size would be? 
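
On the sizing question, a rule of thumb commonly cited for Elasticsearch is to give the heap about half of the machine's RAM while staying below roughly 31 GB, so the rest is left for the filesystem cache. That rule is an assumption brought in here, not something established in this thread, and the helper names suggest_heap_mb and as_es_heap_setting are hypothetical. A minimal Python sketch of the arithmetic:

```python
def suggest_heap_mb(total_ram_mb, fraction=0.5, cap_mb=31 * 1024):
    """Return a candidate heap size in MB (rule of thumb, not an official formula)."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    # Take a fraction of RAM, but never exceed the cap.
    return min(int(total_ram_mb * fraction), cap_mb)


def as_es_heap_setting(heap_mb):
    """Format a heap size in MB as an ES_HEAP_SIZE line, using 'g' when it divides evenly."""
    if heap_mb % 1024 == 0:
        return "ES_HEAP_SIZE=%dg" % (heap_mb // 1024)
    return "ES_HEAP_SIZE=%dm" % heap_mb


# The VM described in this thread has 8 GB of RAM.
print(as_es_heap_setting(suggest_heap_mb(8 * 1024)))
```

For the 8 GB VM this suggests a value well above the 512m currently set, though the right number still depends on what else runs on the machine.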

On Tuesday, September 9, 2014 11:32:19 AM UTC-4, vineeth mohan wrote:
>
> Hello Joshua,
>
> I am not sure which memory setting in the config file you are referring to;
> please paste the comment and the config.
> I usually change the setting in the init.d script.
>
> The best approach would be to bulk index, say, 10,000 feeds in sync mode, wait
> until everything is indexed, and then proceed to the next batch.
> I am not sure about the Java API, but a while back I used to curl the stats
> API and check how many requests were rejected.
>
> Thanks
>           Vineeth
>
> On Tue, Sep 9, 2014 at 8:58 PM, Joshua P <[email protected]> wrote:
>
>> You also said you wouldn't recommend indexing that much information at 
>> once. How would you suggest breaking it up and what status should I look 
>> for before doing another batch? I have to come up with some process that is 
>> repeatable and mostly automated. 
>>
>> On Tuesday, September 9, 2014 11:12:59 AM UTC-4, Joshua P wrote:
>>>
>>> Thanks for the reply, Vineeth! 
>>>
>>> What's a practical heap size? I've seen some people saying they set it 
>>> to 30gb but this confuses me because in the /etc/default/elasticsearch 
>>> file, the comment suggests the max is only 1gb? 
>>>
>>> I'll look into the threadpool issue. Is there a Java API for monitoring 
>>> Cluster Node health? Can you point me at an example or give me a link to 
>>> that? 
>>>
>>> Thanks! 
>>>
>>> On Tuesday, September 9, 2014 10:52:35 AM UTC-4, vineeth mohan wrote:
>>>>
>>>> Hello Joshua,
>>>>
>>>> I have a feeling this has something to do with the thread pool.
>>>> There is a limit on the number of requests that can be queued for indexing.
>>>>
>>>> Try increasing the queue size of the index and bulk thread pools to a
>>>> large number.
>>>> Also, through the cluster node stats API for the thread pool, you can see
>>>> whether any requests have been rejected.
>>>> Monitor this API for requests rejected because of the large volume.
>>>>
>>>> Threadpool - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
>>>> Threadpool stats - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
>>>>
>>>> Having said that, I wouldn't recommend bulk indexing that much information
>>>> at one time, and a 512 MB heap is not going to help much.
>>>>
>>>> Thanks
>>>>           Vineeth
>>>>
>>>> On Tue, Sep 9, 2014 at 7:48 PM, Joshua P <[email protected]> wrote:
>>>>
>>>>> Hi there! 
>>>>>
>>>>> I'm trying to do a one-time index of about 800,000 records into an
>>>>> instance of Elasticsearch, but I'm having a bit of trouble. It continually
>>>>> fails around 200,000 records. Looking at it in the Elasticsearch Head plugin,
>>>>> my index goes offline and becomes unrecoverable.
>>>>>
>>>>> For now, I have it running on a VM on my personal machine. 
>>>>>
>>>>> VM Config: 
>>>>> Ubuntu Server 14.04 64-Bit
>>>>> 8 GB RAM
>>>>> 2 Processors
>>>>> 32 GB SSD
>>>>>
>>>>> Java
>>>>> java version "1.7.0_65"
>>>>> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
>>>>> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
>>>>>
>>>>> Elasticsearch is using mostly the defaults. This is the output of: 
>>>>> curl http://localhost:9200/_nodes/process?pretty
>>>>> {
>>>>>   "cluster_name" : "property_transaction_data",
>>>>>   "nodes" : {
>>>>>     "KlFkO_qgSOKmV_jjj5xeVw" : {
>>>>>       "name" : "Marvin Flumm",
>>>>>       "transport_address" : "inet[/192.168.133.131:9300]",
>>>>>       "host" : "ubuntu-es",
>>>>>       "ip" : "127.0.1.1",
>>>>>       "version" : "1.3.2",
>>>>>       "build" : "dee175d",
>>>>>       "http_address" : "inet[/192.168.133.131:9200]",
>>>>>       "process" : {
>>>>>         "refresh_interval_in_millis" : 1000,
>>>>>         "id" : 1092,
>>>>>         "max_file_descriptors" : 65535,
>>>>>         "mlockall" : true
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> I set ES_HEAP_SIZE to 512m.
>>>>>
>>>>> I'm using the following code to pull data from SQL Server and index 
>>>>> it. 
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elasticsearch/f94f96d4-8c3f-462f-bdcf-df717cbc6269%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>
>
>
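
The batch-and-wait approach suggested in the quoted replies could be sketched as below. This is a minimal Python sketch under stated assumptions, not the code used in this thread: send_batch and fetch_stats are hypothetical callables supplied by the caller (for example, wrappers around a synchronous bulk request and a GET on /_nodes/stats/thread_pool), and the stats layout assumed is nodes -> node id -> thread_pool -> pool name -> rejected.

```python
def chunked(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def rejected_count(stats, pools=("bulk", "index")):
    """Sum the 'rejected' counters of the given thread pools across all nodes."""
    total = 0
    for node in stats.get("nodes", {}).values():
        thread_pool = node.get("thread_pool", {})
        for pool in pools:
            total += thread_pool.get(pool, {}).get("rejected", 0)
    return total


def index_in_batches(records, send_batch, fetch_stats, batch_size=10000):
    """Send records batch by batch, stopping after the first batch that causes rejections.

    send_batch(batch) is assumed to block until the batch has been processed.
    Returns the number of records sent before stopping.
    """
    sent = 0
    baseline = rejected_count(fetch_stats())
    for batch in chunked(records, batch_size):
        send_batch(batch)
        sent += len(batch)
        if rejected_count(fetch_stats()) > baseline:
            break  # the node rejected work; back off before continuing
    return sent


# Dry run with stand-ins instead of a real cluster (hypothetical data).
fake_stats = {"nodes": {"n1": {"thread_pool": {"bulk": {"rejected": 0},
                                               "index": {"rejected": 0}}}}}
batches = []
sent = index_in_batches(list(range(25)), batches.append, lambda: fake_stats,
                        batch_size=10)
print(sent, [len(b) for b in batches])
```

Stopping at the first rejection is only one possible policy; sleeping and then retrying the same batch would be another.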

