[ 
https://issues.apache.org/jira/browse/FLUME-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504578#comment-14504578
 ] 

Benjamin Fiorini commented on FLUME-2390:
-----------------------------------------

Hi [~deepakas1] [~ejsarge]

I believe there are 2 problems here:
# The Flume Elasticsearch sink does not index events with a specific id => a 
retried batch duplicates the data; see [~rore]'s solution.
# The Flume Elasticsearch sink does not handle mapping discrepancies => a 
single bad message gets stuck and fills up your queue. That's fine from the 
Flume point of view but bad for Elasticsearch: there is no easy way to fix 
this on the ES side, and you'd need to empty the entire channel because of 
one bad message. Not ideal if you don't want to lose (too much) data.
Maybe a solution is to make it possible to ignore the 
MapperParsingException. I can provide a patch if this sounds sensible.
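
To make the two points concrete, here is a rough sketch in Python. It is only 
an illustration, not the sink's actual code: MapperParsingError, event_id, 
index_batch and fake_index are hypothetical stand-ins, and a dict plays the 
role of the Elasticsearch index. It assumes the id is derived from the event 
body, which is just one possible choice.

```python
import hashlib


class MapperParsingError(Exception):
    """Hypothetical stand-in for Elasticsearch's MapperParsingException."""


def event_id(body: bytes) -> str:
    # Deterministic id (assumption: hash of the body), so re-sending the
    # same event overwrites the document instead of creating a duplicate.
    return hashlib.sha1(body).hexdigest()


def index_batch(events, index_one, ignore_mapping_errors=True):
    """Index each event; return the bodies skipped due to mapping errors.

    index_one(doc_id, body) is a hypothetical callable that indexes one
    document and raises MapperParsingError on a mapping discrepancy.
    """
    skipped = []
    for body in events:
        try:
            index_one(event_id(body), body)
        except MapperParsingError:
            if not ignore_mapping_errors:
                raise  # the whole batch fails and would be retried
            skipped.append(body)  # drop only the bad event
    return skipped


# In-memory stand-in for an index, rejecting empty messages.
store = {}


def fake_index(doc_id, body):
    if not body:
        raise MapperParsingError("empty message")
    store[doc_id] = body


batch = [b'{"a": 1}', b"", b'{"a": 2}']
skipped = index_batch(batch, fake_index)
index_batch(batch, fake_index)  # simulated retry of the same batch
print(len(store), len(skipped))
```

With the deterministic id, the simulated retry leaves the store unchanged, and 
with ignore_mapping_errors=True only the empty message is dropped rather than 
blocking the rest of the batch.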

Cheers,
Benjamin

> Flume-ElasticSearch Data gets posted multiple times when one of the event 
> fail validation at elastic search sink for JSON Data
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2390
>                 URL: https://issues.apache.org/jira/browse/FLUME-2390
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: CDH4.5
>            Reporter: Deepak Subhramanian
>
> Hi,
> I am using the Elasticsearch sink to post JSON data. I used the temporary fix 
> mentioned in https://issues.apache.org/jira/browse/FLUME-2126 to get JSON 
> data posted to Elasticsearch. When one of the messages fails validation 
> against the Elasticsearch mapping for JSON data (for example, an empty 
> message), Flume seems to post the entire batch again and again until I 
> restart Flume. Because of that, the number of events went from an average of 
> 100 to an average of 2000 per 10 minutes. As a temporary fix, I set a header 
> in my Flume HTTP source for invalid JSON and used an interceptor to send the 
> data to multiple Elasticsearch sinks that have different index names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
