[
https://issues.apache.org/jira/browse/FLUME-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504578#comment-14504578
]
Benjamin Fiorini commented on FLUME-2390:
-----------------------------------------
Hi [~deepakas1] [~ejsarge]
I believe there are 2 problems here:
# The Flume Elasticsearch sink does not index events with a specific id =>
this duplicates the data; see [~rore]'s solution.
# The Flume Elasticsearch sink does not handle mapping discrepancies => a bad
message gets stuck and fills up your queue... That's fine from the Flume
point of view but bad for Elasticsearch: there is no easy way to fix this on
the ES side, and you'd need to empty the entire channel because of a single
bad message. Not ideal if you don't want to lose (too much) data.
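As a rough illustration of the first point (this is not [~rore]'s patch, just
a minimal sketch): if the document id is derived from the event body, a
replayed batch overwrites the same documents instead of creating new ones.
The EventIds class and deterministicId method are hypothetical names, and
hashing the body is an assumption that only holds if two identical bodies
really are the same event.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Flume: builds a stable id from the event body.
final class EventIds {
    private EventIds() {}

    // Returns the lowercase hex SHA-256 of the body, so the same body always
    // maps to the same document id across retries.
    static String deterministicId(byte[] body) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(body);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "{\"msg\":\"hello\"}".getBytes(StandardCharsets.UTF_8);
        // Same body, same id: indexing it twice would target one document.
        System.out.println(deterministicId(body).equals(deterministicId(body)));
    }
}
```

The trade-off is that two genuinely distinct events with identical bodies
would collapse into one document, so a real id would probably also need a
timestamp or a source header.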
One possible solution is to add an option to ignore
MapperParsingException. I can provide a patch if this sounds sensible.
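To make the idea concrete, here is a minimal sketch of the decision such an
option could drive. It does not use the real Flume or Elasticsearch classes:
BulkFailurePolicy, ItemResult, Decision and the ignoreMappingErrors flag are
all hypothetical, and it assumes the per-item failure message contains the
exception name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the actual sink code.
final class BulkFailurePolicy {
    // Stand-in for one item of a bulk response.
    static final class ItemResult {
        final String eventId;
        final String failureMessage; // null when the item was indexed

        ItemResult(String eventId, String failureMessage) {
            this.eventId = eventId;
            this.failureMessage = failureMessage;
        }
    }

    // Outcome for a whole batch.
    static final class Decision {
        final List<String> dropped = new ArrayList<>(); // bad events to skip
        final List<String> retry = new ArrayList<>();   // failures worth retrying

        // Roll the transaction back only if something can still succeed later.
        boolean shouldRollback() {
            return !retry.isEmpty();
        }
    }

    static Decision decide(List<ItemResult> items, boolean ignoreMappingErrors) {
        Decision decision = new Decision();
        for (ItemResult item : items) {
            if (item.failureMessage == null) {
                continue; // indexed successfully
            }
            // Assumption: mapping failures mention the exception name.
            boolean mappingError = item.failureMessage.contains("MapperParsingException");
            if (ignoreMappingErrors && mappingError) {
                decision.dropped.add(item.eventId);
            } else {
                decision.retry.add(item.eventId);
            }
        }
        return decision;
    }

    public static void main(String[] args) {
        List<ItemResult> items = Arrays.asList(
                new ItemResult("e1", null),
                new ItemResult("e2", "MapperParsingException[failed to parse]"));
        Decision decision = decide(items, true);
        System.out.println("dropped=" + decision.dropped
                + " rollback=" + decision.shouldRollback());
    }
}
```

With the flag off, every failure still lands in the retry list, so the
current behaviour would stay the default. Dropped events should at least be
logged so the bad messages can be inspected later.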
Cheers,
Benjamin
> Flume-ElasticSearch Data gets posted multiple times when one of the event
> fail validation at elastic search sink for JSON Data
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLUME-2390
> URL: https://issues.apache.org/jira/browse/FLUME-2390
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.4.0
> Environment: CDH4.5
> Reporter: Deepak Subhramanian
>
> Hi,
> I am using the Elasticsearch sink to post JSON data. I used the temporary
> fix mentioned in https://issues.apache.org/jira/browse/FLUME-2126 to get JSON
> data posted to Elasticsearch. When one of the messages fails validation
> against the Elasticsearch mapping for JSON data (for example, an empty
> message), Flume seems to post the entire batch again and again until I
> restart Flume. Because of that, the number of events went from an average of
> 100 to an average of 2000 per 10 minutes. As a temporary fix, I set a header
> in my Flume HTTP source for invalid JSON and used an interceptor to send data
> to multiple ES sinks that have different index names.