I believe, sir, there should be a Flume support group over at Cloudera. I'm guessing most of us here haven't used it and therefore won't be much help.
This is vanilla hadoop land. :)

Cheers and good luck!
James

On a side note, how much data are you pumping through it?

Sent from my mobile. Please excuse the typos.

On 2011-03-16, at 7:53 PM, Mark <[email protected]> wrote:

> Sorry if this is not the correct list to post this on, it was the closest I
> could find.
>
> We are using a taildir('/var/log/foo/') source on all of our agents. If this
> agent goes down and data can not be sent to the collector for some time, what
> happens when this agent becomes available again? Will the agent tail the
> whole directory starting from the beginning of all files thus adding
> duplicate data to our sink?
>
> I've read that I could set the startFromEnd parameter to true. In that case
> however if an agent goes down then we would lose any data that gets written
> to our file until the agent comes back up. How do people handle this? It
> seems like you either have to deal with the fact that you will have duplicate
> or missing data.
>
> Thanks
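
For what it's worth, the usual way around the "duplicates or missing data" choice in the question is to remember how far into each file you have already read, and resume from there after a restart. The sketch below is generic Python and is not Flume's own mechanism or API. The function names `read_new_lines` and `save_checkpoint` and the JSON checkpoint file are hypothetical, made up for illustration. It assumes a single append-only log file and treats a file that has shrunk as rotated.

```python
import json
import os


def read_new_lines(log_path, checkpoint_path):
    """Return (lines, new_offset) for data appended since the saved offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f).get("offset", 0)

    # Assumption: a file smaller than the saved offset was rotated or truncated.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume complete lines only; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def save_checkpoint(checkpoint_path, offset):
    """Persist the offset atomically. Call this only after delivery succeeded."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, checkpoint_path)
```

If the sender crashes after delivering lines but before `save_checkpoint` runs, those lines are sent again on restart, so this gives at-least-once delivery: no missing data, and duplicates limited to the last unacknowledged batch rather than the whole directory.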
