GitHub user milleka2 opened a pull request:
https://github.com/apache/flume/pull/28
allow poorly formatted events/data to be dropped
I ran into an issue where some of the raw data going into ElasticSearch was
malformed (fields didn't match the data mapping), and ES rejected it as part of
the bulk insert. The Flume ES sink currently handles this by sending the
record over and over (hoping that ES will eventually accept it).
Unfortunately, this creates a LOT of log traffic under ES's default log settings
AND it backed up our Flume channel, because the data doesn't get any better by
blindly retrying it.
This patch lets users choose one of 3 actions to take when bulk insert errors
occur:
1) retry until it somehow magically works (the current default in Apache Flume)
2) log the error message, then drop the data
3) drop the data silently
In our case, we just want to drop it, because losing a few records is worth
it to keep our data flows moving. However, it would be better to have a more
advanced option that still retries when the ES server is down, since in that
case the data itself is fine and dropping it would lose good records.
Unfortunately, the ES client API doesn't expose the type of error, so the sink
can't tell these two cases apart; this was the best option available at the time.
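To make the trade-off between the three actions concrete, here is a small self-contained Java simulation. It is a sketch under stated assumptions, not the patch itself: the enum `BulkErrorAction`, its constants, and the helpers `bulkInsert` and `drain` are hypothetical names, the Flume channel is modeled as an in-memory queue, and a whole batch is treated as failed when any record in it is malformed (a real ES bulk response reports failures per item). Whether the actual patch drops the whole batch or only the failed items is not stated here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BulkErrorActionDemo {

  // Hypothetical names: the constants actually added by the patch may differ.
  enum BulkErrorAction { RETRY, LOG_AND_DROP, DROP_SILENT }

  // Hypothetical stand-in for an ES bulk insert. A record that is not all digits
  // plays the role of a field that doesn't match the index mapping.
  // Simplification: one bad record fails the whole batch, and nothing from it is indexed.
  static boolean bulkInsert(List<String> batch, List<String> index) {
    for (String record : batch) {
      if (!record.matches("\\d+")) {
        return false;
      }
    }
    index.addAll(batch);
    return true;
  }

  // Simulates the sink taking batches from the head of the channel for at most
  // maxAttempts attempts. Returns the number of events still left in the channel.
  static int drain(Deque<String> channel, List<String> index,
                   BulkErrorAction action, int batchSize, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts && !channel.isEmpty(); attempt++) {
      // Peek at the next batch without removing it (like an open channel transaction).
      List<String> batch = new ArrayList<>();
      for (String record : channel) {
        if (batch.size() == batchSize) {
          break;
        }
        batch.add(record);
      }

      boolean ok = bulkInsert(batch, index);
      if (!ok && action == BulkErrorAction.RETRY) {
        // Roll back: the same batch stays at the head of the channel for the next attempt.
        continue;
      }
      if (!ok && action == BulkErrorAction.LOG_AND_DROP) {
        System.err.println("Dropping batch after bulk insert error: " + batch);
      }
      // Commit: remove the batch from the channel, whether it was indexed or dropped.
      for (int i = 0; i < batch.size(); i++) {
        channel.pollFirst();
      }
    }
    return channel.size();
  }

  public static void main(String[] args) {
    for (BulkErrorAction action : BulkErrorAction.values()) {
      Deque<String> channel = new ArrayDeque<>(List.of("1", "2", "oops", "3", "4"));
      List<String> index = new ArrayList<>();
      int left = drain(channel, index, action, 2, 10);
      System.out.println(action + ": indexed=" + index + ", still in channel=" + left);
    }
  }
}
```

With a batch size of 2, RETRY gets stuck on the batch `[oops, 3]` and leaves 3 events in the channel no matter how many attempts it makes, while both drop actions drain the channel and index `1`, `2` and `4`. The valid record `3` is lost along with `oops` only because this sketch drops whole batches; that is the "losing a few records" cost described above.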
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/milleka2/flume trunk
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flume/pull/28.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #28
----
commit 0db9862733cdca11d72b428011c17f44ffc4a6d8
Author: Kasey Miller
Date: 2015-10-15T20:39:38Z
Add bulk error action constants
commit 6dacca1df0c893a2c5f689893bc74bc57745ad07
Author: Kasey Miller
Date: 2015-10-15T21:09:25Z
Allow config to drop bad records passing through rather than backup the
flume channel
commit bf1190572febdea02ba37b84c1c02a97f899dc77
Author: Kasey Miller
Date: 2015-10-15T21:09:25Z
Allow config to drop bad records passing through rather than backup the
flume channel
update tests for the new change
commit 23a4f129c8ea9cb80585280ffbd04426bfba0d54
Author: Kasey Miller
Date: 2015-11-22T19:26:17Z
Merge branch 'trunk' of https://github.com/milleka2/flume into trunk
----