This happens when the type of the data changes and it fails validation on the Elasticsearch side. For example, when you send JSON data, a mapping is created for the Elasticsearch index. If a plain text string or an XML string is then submitted to the same index, it fails validation against that index's mapping, and Flume keeps trying to submit the same batch again and again. One option is to discard the failed messages or store them in a separate directory.
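
To illustrate the last option, here is a rough sketch in Python of splitting a batch before it is indexed. It is not Flume code and not how the Elasticsearch sink works today; the function names, the `failed_dir` argument and the file naming are all made up for the example, and it assumes the index mapping expects a JSON object per event.

```python
import json
import os
import tempfile


def is_json_object(body: bytes) -> bool:
    """Return True only if the body decodes to a JSON object (a dict)."""
    try:
        return isinstance(json.loads(body.decode("utf-8")), dict)
    except (UnicodeDecodeError, ValueError):
        # Empty bodies, plain text and XML all end up here.
        return False


def split_batch(batch, failed_dir):
    """Return the events that look safe to index; save the rest to failed_dir.

    Hypothetical helper: the file names are only unique within one batch,
    so a real version would need a better naming scheme.
    """
    os.makedirs(failed_dir, exist_ok=True)
    valid = []
    for i, body in enumerate(batch):
        if is_json_object(body):
            valid.append(body)
        else:
            path = os.path.join(failed_dir, f"event-{i}.bin")
            with open(path, "wb") as fh:
                fh.write(body)
    return valid


if __name__ == "__main__":
    failed = tempfile.mkdtemp()
    events = [b'{"msg": "ok"}', b"", b"<a>xml</a>", b"plain text"]
    print(len(split_batch(events, failed)), "valid;", os.listdir(failed))
```

With something like this in front of the sink, one bad event would no longer cause the whole batch to be resubmitted, and the rejected bodies stay on disk for inspection.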

On Wed, Oct 8, 2014 at 9:29 PM, Edward Sargisson (JIRA) <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/FLUME-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164105#comment-14164105 ]
>
> Edward Sargisson commented on FLUME-2390:
> -----------------------------------------
>
> [~deepakas1] would you be able to provide an example of the exact data
> that reproduces this problem for you?
>
> I'd like to make sure it's in a test.
>
> > Flume-ElasticSearch Data gets posted multiple times when one of the
> > event fail validation at elastic search sink for JSON Data
> > --------------------------------------------------------------------
> >
> >                 Key: FLUME-2390
> >                 URL: https://issues.apache.org/jira/browse/FLUME-2390
> >             Project: Flume
> >          Issue Type: Bug
> >          Components: Sinks+Sources
> >    Affects Versions: v1.4.0
> >         Environment: CDH4.5
> >            Reporter: Deepak Subhramanian
> >
> > Hi,
> > I am using Elastic Search Sink to post JSON data. I used the temporary
> > fix mentioned in https://issues.apache.org/jira/browse/FLUME-2126 to get
> > JSON data posted to elastic search. When one of the message fail validation
> > at ElasticSearch mapping for JSON data ( For example - getting empty
> > message) , Flume seems to post the entire batch again and again until I
> > restart Flume. Because of that no of events went from an avg of 100 to avg
> > of 2000 per 10 minutes. As a temporary fix I set a header in my FlumeHTTP
> > Source for non valid JSON and used a interceptor to send data to multiple
> > ESSINKS which has different index names.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>

-- 
Deepak Subhramanian
