I came across this issue while migrating our ETL flow from XML to JSON messages for a previous client. I am not sure whether it is a common issue.
The ideal solution would be this: if a batch fails because of a mapping exception, Flume validates each message in the batch against the mapping of the index, processes all the valid messages immediately, and writes the invalid messages to an error file for later investigation. I am not sure how easy it is to validate a message against the mapping. This would avoid the delay that valid messages incur while stuck in a dead letter queue, which is what happens if we implement only the dead letter queue solution.

On Tue, Apr 21, 2015 at 7:33 PM, Hari Shreedharan (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/FLUME-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505447#comment-14505447 ]
>
> Hari Shreedharan commented on FLUME-2390:
> -----------------------------------------
>
> If you are using the file channel, the file channel integrity tool will allow
> you to take a channel offline and validate the data and will remove invalid
> data. This is in trunk and is coming in 1.6
>
>> Flume-ElasticSearch Data gets posted multiple times when one of the event
>> fail validation at elastic search sink for JSON Data
>> ---------------------------------------------------------------------------
>>
>>                 Key: FLUME-2390
>>                 URL: https://issues.apache.org/jira/browse/FLUME-2390
>>             Project: Flume
>>          Issue Type: Bug
>>          Components: Sinks+Sources
>>    Affects Versions: v1.4.0
>>         Environment: CDH4.5
>>            Reporter: Deepak Subhramanian
>>
>> Hi,
>> I am using Elastic Search Sink to post JSON data. I used the temporary fix
>> mentioned in https://issues.apache.org/jira/browse/FLUME-2126 to get JSON
>> data posted to elastic search. When one of the message fail validation at
>> ElasticSearch mapping for JSON data ( For example - getting empty message) ,
>> Flume seems to post the entire batch again and again until I restart Flume.
>> Because of that no of events went from an avg of 100 to avg of 2000 per 10
>> minutes. As a temporary fix I set a header in my FlumeHTTP Source for non
>> valid JSON and used a interceptor to send data to multiple ESSINKS which has
>> different index names.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)

--
Deepak Subhramanian
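
P.S. To make the per-message fallback concrete, here is a minimal sketch of the idea in Python. It is not Flume or Elasticsearch code. MAPPING is a hypothetical, heavily simplified stand-in for an index mapping (field name to expected JSON type), and validation_error and split_batch are names made up for this illustration. Real Elasticsearch mapping validation is much richer (type coercion, dynamic fields, nested objects), so this only shows the shape of the split between valid and invalid messages.

```python
import json

# Hypothetical, simplified stand-in for an index mapping:
# field name -> expected JSON type. Not the real Elasticsearch format.
MAPPING = {"user": "string", "count": "long", "active": "boolean"}

_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validation_error(raw, mapping):
    """Return None if the message looks acceptable, else a short reason."""
    if not raw or not raw.strip():
        return "empty message"
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return "invalid JSON: %s" % exc
    if not isinstance(doc, dict):
        return "not a JSON object"
    # Only fields that are present are checked; missing fields are allowed.
    for field, expected in mapping.items():
        if field in doc and not _TYPE_CHECKS[expected](doc[field]):
            return "field %r is not of type %s" % (field, expected)
    return None


def split_batch(batch, mapping):
    """Split a failed batch into (valid, invalid) without retrying it whole.

    valid is a list of raw messages, invalid is a list of (raw, reason).
    """
    valid, invalid = [], []
    for raw in batch:
        reason = validation_error(raw, mapping)
        if reason is None:
            valid.append(raw)
        else:
            invalid.append((raw, reason))
    return valid, invalid
```

With something like this, the sink would re-send only the valid list after a mapping exception and append each (message, reason) pair from the invalid list to the error file, instead of posting the entire batch again.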
