I believe, sir, there should be a Flume support group at Cloudera. I'm
guessing most of us here haven't used it and therefore aren't much
help.

This is vanilla Hadoop land. :)

Cheers and good luck!
James

On a side note, how much data are you pumping through it?


Sent from my mobile. Please excuse the typos.

On 2011-03-16, at 7:53 PM, Mark <[email protected]> wrote:

> Sorry if this is not the correct list to post this on; it was the closest I
> could find.
>
> We are using a taildir('/var/log/foo/') source on all of our agents. If an
> agent goes down and data cannot be sent to the collector for some time, what
> happens when that agent becomes available again? Will the agent tail the
> whole directory starting from the beginning of all files, thus adding
> duplicate data to our sink?
>
> I've read that I could set the startFromEnd parameter to true. In that case,
> however, if an agent goes down then we would lose any data that gets written
> to our files until the agent comes back up. How do people handle this? It
> seems like you either have to deal with the fact that you will have duplicate
> data or missing data.
>
> Thanks
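
For what it's worth, the usual way out of the "duplicates or missing data"
choice described above is to remember how far each file has been read and
resume from there after a restart. The sketch below is generic Python, not
Flume code, and says nothing about how taildir actually behaves. The function
names, the JSON checkpoint file, and the send callback are all hypothetical,
and it assumes files are only appended to (no rotation or truncation) and that
the checkpoint file lives outside the log directory.

```python
import json
import os


def load_offsets(checkpoint_path):
    """Return the saved {file path: byte offset} map, or {} on first run."""
    try:
        with open(checkpoint_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_offsets(checkpoint_path, offsets):
    """Write the map to a temp file, then rename it over the old checkpoint."""
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, checkpoint_path)


def ship_new_lines(log_dir, checkpoint_path, send):
    """Pass each complete line added since the last checkpoint to send().

    `send` is a hypothetical callback standing in for whatever delivers
    data to the collector. Returns the number of lines sent.
    """
    offsets = load_offsets(checkpoint_path)
    sent = 0
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            # Resume where the previous run stopped (0 for a new file).
            f.seek(offsets.get(path, 0))
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a line still being written
                send(line)
                sent += 1
                offsets[path] = f.tell()
    save_offsets(checkpoint_path, offsets)
    return sent
```

Calling ship_new_lines('/var/log/foo/', '/var/lib/shipper/offsets.json', send)
on a timer would pick up whatever was written while the process was down. If
the process dies after sending but before the checkpoint is saved, that batch
is sent again on restart, so this gives at-least-once delivery rather than
exactly-once.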
