Yes, I agree fully. Tailing is a useful mechanism, but since we also have to deliver in time and reliably, the core team decided to remove that feature. In your case tailing makes sense; in a session application (banking, travel, car rental, pizza service and so on) it does not, because one missing token or session can do harm.
For Flume NG another source is implemented, the exec source. There you can easily plug in a tail command, but then you have to make sure yourself that everything keeps running well. New users I would rather point to Flume NG, because Flume and Flume NG are not compatible; Flume NG is a complete rewrite. I think once Flume NG releases its next milestone, support for the old Flume will slowly wind down.

best, and thanks for the discussion,
Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 7, 2012, at 1:16 PM, Michal Taborsky wrote:

> Hi Alex,
>
> Truth be told, I am quite satisfied with the file tailing, and I'll try to explain why I like it. The main reason is that, at least for us, the web application itself is business critical; the event collection is not. Writing to a plain file is something that can rarely fail, and if it fails, it fails quickly and in a controlled fashion. But piping to a Flume agent, for example? How sure can I be that the write will work all the time or fail immediately? That it will not wait for some timeout or other? Or throw some unexpected error and bring down the app?
>
> The other aspect is simple development and debugging. Any developer can read a plain file and check that the data he's writing is correct, but with any more sophisticated method you either need a more complicated testing environment or redirection switches that write to files in development and to Flume in testing and production, which complicates things.
>
> --
> Michal Táborský
> chief systems architect
> Netretail Holding, BV
> nrholding.com
>
>
> 2012/2/7 alo alt <[email protected]>
> Hi,
>
> Sorry for jumping in, but Flume NG will not support tailing sources, because we had a lot of problems with them. The first, and by far the worst, problem is the marker in a tailed file, the saved offset of how far we have read. If the agent crashes, or the server, or the collector, the marker is lost, so after a restart you get all the events again. Sure, you can use append instead, but then you lose events.
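To make the marker problem concrete, here is a minimal sketch in Python (not Flume code; the function name and the marker-file layout are invented for illustration). It checkpoints the read offset to a side file only after the lines are delivered, so losing the marker file replays every event, while checkpointing before delivery would instead drop events on a crash:

```python
import os

def read_new_lines(path, marker_path):
    """Deliver lines appended to `path` since the last saved marker.

    The "marker" is just the byte offset we stopped at last time,
    persisted in a side file. It is written only *after* the lines
    are read, so a crash that loses the marker replays every event;
    persisting it *before* delivery would drop events instead.
    """
    offset = 0
    if os.path.exists(marker_path):
        with open(marker_path) as m:
            offset = int(m.read() or 0)
    with open(path) as f:
        f.seek(offset)
        lines = f.readlines()
        new_offset = f.tell()
    with open(marker_path, "w") as m:  # checkpoint after delivery
        m.write(str(new_offset))
    return [line.rstrip("\n") for line in lines]
```

Either way you choose the checkpoint moment, a crash at the wrong instant gives duplicates or losses, which is exactly why a durable tailing source is hard to get right.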
>
> For an easy migration from Flume to Flume NG, use sources which are supported in NG, syslog for example; more sources can be found here:
> https://cwiki.apache.org/FLUME/flume-ng.html
>
> You could use Avro for the sessions, and you could pipe directly to a local Flume agent. Syslog with a buffering mode could also work. In Flume NG we now also have an HBase handler and Thrift.
> Another idea for collecting the sessions could be http://hadoop.apache.org/common/docs/r1.0.0/webhdfs.html, a REST API for HDFS.
>
> - Alex
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 7, 2012, at 11:14 AM, Alain RODRIGUEZ wrote:
>
> > Thank you for your answer; it helps me a lot to know I'm doing things the right way.
> >
> > I've got another question: how do you restart the service after a crash? I mean, you tail the log file, so if your server crashes or you stop the tail for any reason, how do you avoid tailing all the logs from the start? How do you manage restarting from the exact point where you left your tail process?
> >
> > Thanks again for your help, I really appreciate it :-).
> >
> > Alain
> >
> > 2012/2/2 Michal Taborsky <[email protected]>
> > Hello Alain,
> >
> > We are using Flume for probably the same purposes. We write JSON-encoded event data to a flat file on every application server. Since each application server writes only maybe tens of events per second, the performance hit of writing to disk is negligible (and the events are written to disk only after the content is generated and sent to the user, so there is no latency for the end user). This file is tailed by Flume and delivered through collectors to HDFS. The collectors fork the events to RabbitMQ as well. We have a Node.js application that picks up these events and does some real-time analytics on them. The delay between event origination and analytics is below 10 seconds, usually 1-3 seconds in total.
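As a sketch of the migration path Alex suggests, a Flume NG agent with a syslog TCP source feeding HDFS through a memory channel could be configured roughly like this. All names (agent1, sys-src, mem-ch, hdfs-sink), the port, and the HDFS path are made-up placeholders, and property names can differ between releases, so check the flume-ng wiki page above for your version:

```properties
# Hypothetical Flume NG agent: syslog TCP in, HDFS out.
agent1.sources = sys-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

ag1.placeholder = unused
agent1.sources.sys-src.type = syslogtcp
agent1.sources.sys-src.port = 5140
agent1.sources.sys-src.host = 0.0.0.0
agent1.sources.sys-src.channels = mem-ch

agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem-ch
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events
```

The application side then only needs to emit syslog messages, which nearly every logging library already supports, so no tailing is involved at all.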
> >
> > Hope this helps.
> >
> > --
> > Michal Táborský
> > chief systems architect
> > Netretail Holding, BV
> > nrholding.com
> >
> >
> > 2012/2/2 Alain RODRIGUEZ <[email protected]>
> > Hi,
> >
> > I'm new to Flume and I'd like to use it to get a stable flow of data into my database (to be able to handle rush hours by delaying the database writes, without introducing any timeout or latency for the user).
> >
> > My questions are:
> >
> > What is the best way to create the log file that will be used as a source for Flume?
> >
> > Our production environment runs Apache servers and PHP scripts.
> > I can't just use the access log, because some information is stored in the session, so I need to build a custom source.
> > Another point is that writing a file seems primitive and not really efficient, since it writes to disk instead of memory for every event I store (many events every second).
> >
> > How can I use this system (as Facebook does with Scribe) to do real-time analytics?
> >
> > I'm open to hearing about HDFS, HBase or whatever could help me reach my goals, which are a stable flow into the database and near-real-time analytics (seconds to minutes).
> >
> > Thanks for your help.
> >
> > Alain
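Michal's flat-file approach above can be sketched in a few lines. This Python version is illustrative only (the production code would be PHP, and the `log_event` name and `ts` field are invented); it appends one self-contained JSON object per line, so the file stays tailable and every line parses on its own:

```python
import json
import time

def log_event(path, event):
    """Append one JSON-encoded event as a single line ("JSON lines").

    A single short write() to a local file is cheap and, for small
    lines, effectively atomic, so several web workers can append to
    the same file and each line still parses on its own. `log_event`
    and the `ts` field are illustrative names, not any Flume API.
    """
    record = dict(event)
    record.setdefault("ts", time.time())  # event-origination timestamp
    with open(path, "a") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")
```

The same file can then be tailed (old Flume) or shipped via the exec source (Flume NG), and a developer can still read it in a plain text editor, which is exactly the debugging property Michal values.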
