Hi Simon,

The wiki page is dated, to say the least. At the moment there are many active deployments of Flume NG in staging, if not in production. I encourage you to look at the performance numbers that were recently published on the wiki [1].

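For a rough sense of scale when reading those numbers, the aggregate load in the quoted message below can be estimated with a quick back-of-the-envelope calculation. This is only a sketch: it assumes "1-2K" means 1024 to 2048 bytes per event and that all 15 servers hit the same rate at once, and the function name is purely illustrative.

```python
KIB = 1024  # assumption: "1-2K per event" read as 1024-2048 bytes


def aggregate_load(servers, events_per_sec, event_bytes):
    """Return (events/s, bytes/s) summed across all servers."""
    total_events = servers * events_per_sec
    return total_events, total_events * event_bytes


# Low end: 20 events/s per server at 1K; high end: 90 events/s at 2K.
low_events, low_bytes = aggregate_load(15, 20, 1 * KIB)
high_events, high_bytes = aggregate_load(15, 90, 2 * KIB)

print(f"events/s: {low_events} to {high_events}")
print(f"MiB/s:    {low_bytes / KIB**2:.2f} to {high_bytes / KIB**2:.2f}")
```

Under those assumptions the whole fleet produces roughly 300 to 1,350 events per second, or about 0.3 to 2.6 MiB/s in total.
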
The use case you have described seems like something that Flume should be able to handle very easily. I encourage you to look at the log4j appender, the Memory/File channels, and the HDFS event sink. Of course, you could plan on using other components as well if these do not fit well with your application.

[1] https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements

Thanks,
Arvind Prabhakar

On Fri, May 18, 2012 at 4:58 AM, Simon Kelly <[email protected]> wrote:
> Hi
>
> I'm interested in using Flume to store audit logs in HDFS which can then
> be queried with Hive. I see that the links on the Flume page point to Flume
> NG, which says it's not ready for production use yet. Is that still the case?
>
> Our use case would likely look something like this:
>
>    - 15 servers running a Java web server and logging audit data (1-2K
>      per event, 20-90 events per second per server)
>    - Hadoop running on 5 machine cluster (4x2.4GHz processors, 8GB RAM,
>      8TB total storage)
>
> It's important that all data makes it into HDFS.
>
> I'd appreciate any comments on how to proceed with this.
>
> Best regards
> Simon Kelly
>
