We wrote some custom tools that poll for new data and launch jobs periodically.
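For context, the kind of custom tooling described here can be as simple as a cron-style polling loop. Below is a minimal sketch (our own illustration, not the actual tool) that watches an input directory for newly arrived files and launches a job command on each batch; the directory path and the job command line are placeholders, and `runner` is injectable so the loop can be exercised without a real cluster.

```python
import subprocess
import time
from pathlib import Path

def poll_and_launch(input_dir, job_cmd, interval=300, runner=subprocess.run):
    """Poll input_dir for not-yet-processed files and launch job_cmd on each batch.

    job_cmd is a list of command-line arguments (e.g. a "hadoop jar ..." invocation);
    the batch's file paths are appended to it. Yields each batch so a caller or
    test can stop the loop.
    """
    seen = set()
    while True:
        batch = sorted(p for p in Path(input_dir).iterdir()
                       if p.is_file() and p not in seen)
        if batch:
            # Launch one job covering all newly arrived files.
            runner(job_cmd + [str(p) for p in batch], check=True)
            seen.update(batch)
        yield batch
        time.sleep(interval)
```

A real tool would also need to handle files still being written (e.g. only pick up files older than a threshold) and persist `seen` across restarts, but the polling structure is the same.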
Matt

On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote:
> Matt,
>
> How do you manage your tasks? Do you launch them periodically, or keep
> them running somehow and feed them data?
>
> Vadim
>
> On Mon, Jun 23, 2008 at 21:54, Matt Kent <[EMAIL PROTECTED]> wrote:
> > We use Hadoop in a similar manner, to process batches of data in
> > real-time every few minutes. However, we do substantial amounts of
> > processing on that data, so we use Hadoop to distribute our computation.
> > Unless you have a significant amount of work to be done, I wouldn't
> > recommend using Hadoop, because it's not worth the overhead of
> > launching the jobs and moving the data around.
> >
> > Matt
> >
> > On Tue, 2008-06-24 at 13:34 +1000, Ian Holsman (Lists) wrote:
> >> Interesting.
> >> We are planning on using Hadoop to provide 'near' real-time log
> >> analysis. We plan on having files close every 5 minutes (1 per log
> >> machine, so 80 files every 5 minutes) and then have a map/reduce job
> >> merge them into a single file that will get processed by other jobs
> >> later on.
> >>
> >> Do you think the namespace will explode?
> >>
> >> I wasn't thinking of clouddb... it might be an interesting
> >> alternative once it is a bit more stable.
> >>
> >> regards
> >> Ian
> >>
> >> Stefan Groschupf wrote:
> >> > Hadoop might be the wrong technology for you.
> >> > MapReduce is a batch-processing mechanism. HDFS might also be a
> >> > problem, since you need to close a file before its data is
> >> > accessible. That means you may end up with many small files, a
> >> > situation where HDFS is not very strong (the namespace is held in
> >> > memory).
> >> > HBase might be an interesting tool for you, and so might ZooKeeper
> >> > if you want to do something home-grown...
> >> >
> >> > On Jun 23, 2008, at 11:31 PM, Vadim Zaliva wrote:
> >> >
> >> >> Hi!
> >> >>
> >> >> I am considering using Hadoop for (almost) real-time data
> >> >> processing. I have data coming in every second, and I would like
> >> >> to use a Hadoop cluster to process it as fast as possible. I need
> >> >> to be able to maintain some guaranteed maximum processing time,
> >> >> for example under 3 minutes.
> >> >>
> >> >> Does anybody have experience with using Hadoop in this manner? I
> >> >> would appreciate it if you could share your experience or give me
> >> >> pointers to some articles or pages on the subject.
> >> >>
> >> >> Vadim
> >> >
> >> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> > 101tec Inc.
> >> > Menlo Park, California, USA
> >> > http://www.101tec.com