Hi,

OK, my only recommendation is to fix your Elasticsearch cluster so that it can handle the load, because it seems the shard synchronisation is too slow [0]. How many ES nodes, indices and shards do you have?

IMHO, implementing a per-document retry strategy in the Heka ES plugin would be quite expensive and surely inefficient.

[0] https://discuss.elastic.co/t/elasticsearch-2-2-0-i-am-occasionally-getting-process-cluster-event-timeout-exception-failed-to-process-cluster-event-put-mapping-as-within-30s-while-bulk-indexing-documents/42305/3

2016-04-28 14:54 GMT+02:00 Ramin Ali Dousti <[email protected]>:

> Hi,
>
> The ES version is "2.2.0".
>
> This is the HTTP response. Look for the status 503 in the payload:
>
> T 127.0.0.1:9200 -> 127.0.0.1:34497 [AP]
> HTTP/1.1 200 OK.
> Content-Type: application/json; charset=UTF-8.
> Content-Length: 3770.
>
> {
>   "took": 39911,
>   "errors": true,
>   "items": [
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qa",
>         "status": 503,
>         "error": {
>           "type": "process_cluster_event_timeout_exception",
>           "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"
>         }
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qb",
>         "status": 503,
>         "error": {
>           "type": "process_cluster_event_timeout_exception",
>           "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"
>         }
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qc",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qd",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qe",
>         "status": 503,
>         "error": {
>           "type": "process_cluster_event_timeout_exception",
>           "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"
>         }
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qf",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qg",
>         "status": 503,
>         "error": {
>           "type": "process_cluster_event_timeout_exception",
>           "reason": "failed to process cluster event (put-mapping [WAF]) within 30s"
>         }
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qh",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qi",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qj",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qk",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4ql",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qm",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qn",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.03.28",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qo",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qp",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qq",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qr",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qs",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qt",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     },
>     {
>       "create": {
>         "_index": "vdps-log-wf-2016.04.26",
>         "_type": "WAF",
>         "_id": "AVRTguzBSCHabxnyv4qu",
>         "_version": 1,
>         "_shards": { "total": 3, "successful": 3, "failed": 0 },
>         "status": 201
>       }
>     }
>   ]
> }
>
> On Wed, Apr 27, 2016 at 3:41 AM, Swann Croiset <[email protected]> wrote:
>
>> Hi,
>>
>> IIRC, according to the code you're right, Heka doesn't handle such a case.
>>
>> That said, I've never seen such an ES response ... I'm interested in it.
>>
>> Could you share this information: the ES response, the ES version and
>> the ES logs (from when this happens)?
>> Also, what is your configuration on the ES side (index template, field
>> mapping)?
>>
>> --
>> Swann
>>
>> 2016-04-26 22:28 GMT+02:00 Ramin Ali Dousti <[email protected]>:
>>
>>> Hi,
>>>
>>> I have an ES output that bulk uploads to a cluster. The HTTP status code
>>> is 200 OK but the reply payload says that it failed the upload for a few
>>> of the items. But heka doesn't seem to care about the failed items. I
>>> looked at the code and it says:
>>>
>>> https://github.com/mozilla-services/heka/blob/dev/plugins/elasticsearch/elasticsearch.go#L429
>>>
>>> if response != nil {
>>>     defer response.Body.Close()
>>>     if response_body, err = ioutil.ReadAll(response.Body); err != nil {
>>>         return fmt.Errorf("Can't read HTTP response body. Status: %s. Error: %s",
>>>             response.Status, err.Error()), true
>>>     }
>>>     err = json.Unmarshal(response_body, &response_body_json)
>>>     if err != nil {
>>>         return fmt.Errorf("HTTP response didn't contain valid JSON. Status: %s. Body: %s",
>>>             response.Status, string(response_body)), true
>>>     }
>>>     json_errors, ok := response_body_json["errors"].(bool)
>>>     if ok && json_errors && response.StatusCode != 200 {
>>>         return fmt.Errorf(
>>>             "ElasticSearch server reported error within JSON. Status: %s. Body: %s",
>>>             response.Status, string(response_body)), false
>>>     }
>>>     if response.StatusCode > 304 {
>>>         return fmt.Errorf("HTTP response error. Status: %s. Body: %s", response.Status,
>>>             string(response_body)), false
>>>     }
>>> }
>>>
>>> 1- In my case I see a 200 OK with "errors = true" which does not seem to
>>> be caught, according to the code.
>>> 2- I don't see any logic for recovery based on individual items. Am I
>>> missing anything here?
>>>
>>> --
>>> Ramin
>>>
>>> _______________________________________________
>>> Heka mailing list
>>> [email protected]
>>> https://mail.mozilla.org/listinfo/heka
>
> --
> Ramin
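
For reference, the case discussed in this thread (HTTP 200 OK with "errors": true in the payload) can be detected by looking at each item's own "status" instead of the HTTP status code. The following is only a minimal sketch, not Heka's actual code: failedBulkItems, bulkResponse and bulkItemResult are hypothetical names, the field names are taken from the response quoted above, and treating any item status >= 400 as a failure is an assumption. It only reports the failed items; it does not retry them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkItemResult holds the per-document fields used below. The field names
// come from the bulk reply quoted in this thread.
type bulkItemResult struct {
	Index  string          `json:"_index"`
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error"` // kept raw: only printed, never interpreted
}

// bulkResponse mirrors the top-level shape of the quoted reply. Each entry of
// "items" is an object with a single key naming the action ("create" above).
type bulkResponse struct {
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// failedBulkItems (hypothetical helper) returns the items whose own status is
// an error, whatever the HTTP status code of the bulk reply was.
func failedBulkItems(body []byte) ([]bulkItemResult, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("bulk response is not valid JSON: %s", err)
	}
	if !resp.Errors {
		return nil, nil // nothing failed, no need to walk the items
	}
	var failed []bulkItemResult
	for _, item := range resp.Items {
		for _, result := range item {
			// Assumption: any per-item status >= 400 counts as a failure.
			if result.Status >= 400 {
				failed = append(failed, result)
			}
		}
	}
	return failed, nil
}

func main() {
	// Two items copied from the quoted reply: one 503 and one 201.
	body := []byte(`{"took": 39911, "errors": true, "items": [
		{"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF",
			"_id": "AVRTguzBSCHabxnyv4qa", "status": 503,
			"error": {"type": "process_cluster_event_timeout_exception",
				"reason": "failed to process cluster event (put-mapping [WAF]) within 30s"}}},
		{"create": {"_index": "vdps-log-wf-2016.04.26", "_type": "WAF",
			"_id": "AVRTguzBSCHabxnyv4qc", "_version": 1,
			"_shards": {"total": 3, "successful": 3, "failed": 0}, "status": 201}}
	]}`)

	failed, err := failedBulkItems(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, f := range failed {
		fmt.Printf("%s/%s failed with status %d: %s\n", f.Index, f.ID, f.Status, f.Error)
	}
}
```

With such a check the output could at least log or count the rejected documents. Whether they should then be resent is a separate question, given the cost of a per-document retry mentioned above.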
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

