Good afternoon. 

The subject might be a little misleading as to the true nature of my 
problem - which I'll try to explain here in as much detail as possible.

First of all, I am rather new to Elasticsearch.

Secondly, this problem has happened more than once (after dumping all 
indexes and starting over).

OK, here goes:

I have a single Elasticsearch node running on a dedicated Dell PowerEdge 
R720 with dual 6-core CPUs and 96GB RAM (of which 32GB is assigned to the 
heap).
The machine is connected back to back over 10Gb fiber to another Dell 
R620.

My processing happens on the R620, which then uses the bulk API to index 
(currently testing at 8000 documents per second) into the ES node on the R720.
The documents are all of the same type and are indexed into daily "partitions": 
events-2014-10-18, events-2014-10-19, events-2014-10-20, etc.
I then have a Kibana dashboard on top of that.
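
To make the setup concrete, the bulk requests are shaped roughly like the output of the sketch below. This is a minimal Python illustration, not my actual indexing code: the function names daily_index and build_bulk_body are made up for this post, and the sample document carries only a timestamp. Only the index naming scheme and the "event" type are as described above.

```python
import json
from datetime import datetime


def daily_index(timestamp, prefix="events"):
    """Map a 'YYYY-MM-DD HH:MM:SS' timestamp to its daily index name."""
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
    return "{}-{}".format(prefix, day.isoformat())


def build_bulk_body(docs, doc_type="event"):
    """Build the newline-delimited body for POST /_bulk.

    Each document contributes an action line followed by its source line.
    No _id is supplied, so Elasticsearch generates one.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": daily_index(doc["timestamp"]),
                            "_type": doc_type}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body([{"timestamp": "2014-10-20 10:12:35"}]), end="")
```

The index name is derived from the document's own timestamp, so a document stamped 10:12 on the 20th goes to events-2014-10-20.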

All of this works perfectly for several days and then seemingly stops. I 
noticed that my Kibana dashboard (defaulting to the last hour's data) 
stopped plotting. Further investigation showed that my application is still 
processing and indexing documents, but a search on today's index, sorted by 
timestamp in descending order, shows the last document at 03:59 this morning.

POST /events-2014-10-20/event/_search
{
    "size": 20,
    "from": 0,
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            }
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "desc"
            }
        }
    ]
}

.... result....
{
   "took": 22836,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 385153635,
      "max_score": null,
      "hits": [
         {
            "_index": "events-2014-10-20",
            "_type": "event",
            "_id": "HYOy53Q3TM-TwBz6SLRB5w",
            "_score": null,
            "_source": {
               "timestamp": "2014-10-20 03:59:14", 
.. etc

Now, when I do a bulk index, the API tells me that the documents were 
created and returns the autogenerated IDs.
So, taking one of those IDs and querying it directly, I DO get the 
document back, in the correct index and with the correct timestamp of 10:12 
(from earlier, when I was investigating).

GET /events-2014-10-20/event/kY6MaTgCThizVGgb3iSGrw

{
   "_index": "events-2014-10-20",
   "_type": "event",
   "_id": "kY6MaTgCThizVGgb3iSGrw",
   "_version": 1,
   "found": true,
   "_source": {
      "timestamp": "2014-10-20 10:12:35",
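
A check along the following lines is one way to confirm what the bulk response reports. It is only a sketch: it assumes the usual bulk response shape (an "items" list with one entry per action, keyed by the action name, each carrying "_id" and "status"), and the function name and the sample response are hypothetical.

```python
def summarize_bulk_response(response):
    """Split bulk response items into accepted ids and failed items."""
    ok_ids, failures = [], []
    for item in response.get("items", []):
        # Each item has a single key naming the action ("index" or "create").
        for action, result in item.items():
            if 200 <= result.get("status", 0) < 300:
                ok_ids.append(result["_id"])
            else:
                failures.append((action, result))
    return ok_ids, failures


if __name__ == "__main__":
    # Hypothetical response, reusing the id from the GET above.
    sample = {
        "took": 30,
        "items": [
            {"create": {"_index": "events-2014-10-20", "_type": "event",
                        "_id": "kY6MaTgCThizVGgb3iSGrw", "_version": 1,
                        "status": 201}},
        ],
    }
    ids, failed = summarize_bulk_response(sample)
    print(ids, failed)
```

In my case every item comes back as created, and fetching any returned id directly succeeds, as shown above.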

------------------------
Some system stats :
------------------------
The machine's CPU spikes every now and then (as I issue bulk index requests), but 
drops back to idle afterwards. 
There's plenty of heap left and the disks are only 33% used.


curl -XGET 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
1413812685 15:44:45  elasticsearch yellow          1         1     85  85    0    0       15

curl -XGET 'localhost:9200/_cat/count?v'
epoch      timestamp count      
1413812740 15:45:40  4171812520

curl -XGET '10.15.3.19:9200/_cat/indices/events*?v'
health index             pri rep docs.count docs.deleted store.size pri.store.size
green  events-2014-10-19   5   0  702422696            0    230.5gb        230.5gb
green  events-2014-10-17   5   0  702341395            0    230.4gb        230.4gb
green  events-2014-10-18   5   0  702352975            0    230.3gb        230.3gb
green  events-2014-10-15   5   0  703229636            0    230.8gb        230.8gb
green  events-2014-10-16   5   0  701547992            0    230.2gb        230.2gb
green  events-2014-10-14   5   0  252394923            0     83.1gb         83.1gb
yellow events-2014-10-20   5   1  407331075            0    127.9gb        127.9gb
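
As a rough sanity check on the counts above, the arithmetic below compares them with the nominal 8000 documents per second. It is a back-of-the-envelope sketch that assumes a perfectly steady rate since midnight, which is only approximately true; the function names are made up for this post.

```python
NOMINAL_RATE = 8000              # documents per second (nominal test rate)
SECONDS_PER_DAY = 24 * 60 * 60


def expected_docs_per_day(rate=NOMINAL_RATE):
    """Documents indexed in a full day at a steady rate."""
    return rate * SECONDS_PER_DAY


def docs_by(time_of_day, rate=NOMINAL_RATE):
    """Documents indexed between midnight and HH:MM:SS at a steady rate."""
    h, m, s = (int(part) for part in time_of_day.split(":"))
    return (h * 3600 + m * 60 + s) * rate


def implied_ingest_time(doc_count, rate=NOMINAL_RATE):
    """Time of day reached if doc_count documents arrived steadily since midnight."""
    seconds = doc_count // rate
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "{:02d}:{:02d}:{:02d}".format(h, m, s)


if __name__ == "__main__":
    print(expected_docs_per_day())         # 691200000
    print(docs_by("03:59:14"))             # 114832000
    print(implied_ingest_time(407331075))  # 14:08:36
```

The full-day indices hold about 702 million documents each, slightly above the nominal 691.2 million, so the real rate is a little over 8000 per second. Either way, today's index already holds far more documents than could have arrived by 03:59, which matches my observation that indexing itself has not stopped.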

curl -XGET 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source

curl -XGET '10.15.3.19:9200/_cat/shards/events-2014-10-20?v'
index             shard prirep state          docs  store ip         node  
events-2014-10-20 4     p      STARTED    81862132 27.8gb 10.15.3.19 X-Ray 
events-2014-10-20 4     r      UNASSIGNED                                  
events-2014-10-20 0     p      STARTED    81869999 25.4gb 10.15.3.19 X-Ray 
events-2014-10-20 0     r      UNASSIGNED                                  
events-2014-10-20 3     p      STARTED    81885822 25.5gb 10.15.3.19 X-Ray 
events-2014-10-20 3     r      UNASSIGNED                                  
events-2014-10-20 1     p      STARTED    81868103 25.5gb 10.15.3.19 X-Ray 
events-2014-10-20 1     r      UNASSIGNED                                  
events-2014-10-20 2     p      STARTED    81871297 26.5gb 10.15.3.19 X-Ray 
events-2014-10-20 2     r      UNASSIGNED                                 

Obviously I cannot put this system into production, and my deadline is fast 
approaching, so any help will be greatly appreciated.

Regards.
Pieter
